Related Articles

1.  Natural selection governs local, but not global, evolutionary gene coexpression networks in Caenorhabditis elegans 
BMC Systems Biology  2008;2:96.
Large-scale evaluation of gene expression variation among Caenorhabditis elegans lines that have diverged from a common ancestor allows for the analysis of a novel class of biological networks – evolutionary gene coexpression networks. Comparative analysis of these evolutionary networks has the potential to uncover the effects of natural selection in shaping coexpression network topologies since C. elegans mutation accumulation (MA) lines evolve essentially free from the effects of natural selection, whereas natural isolate (NI) populations are subject to selective constraints.
We compared evolutionary gene coexpression networks for C. elegans MA lines versus NI populations to evaluate the role that natural selection plays in shaping the evolution of network topologies. MA and NI evolutionary gene coexpression networks were found to have very similar global topological properties as measured by a number of network topological parameters. Observed MA and NI networks show node degree distributions and average values for node degree, clustering coefficient, path length, eccentricity and betweenness that are statistically indistinguishable from one another yet highly distinct from randomly simulated networks. On the other hand, at the local level the MA and NI coexpression networks are highly divergent; pairs of genes coexpressed in the MA versus NI lines are almost entirely different as are the connectivity and clustering properties of individual genes.
It appears that selective forces shape how local patterns of coexpression change over time but do not control the global topology of C. elegans evolutionary gene coexpression networks. These results have implications for the evolutionary significance of global network topologies, which are known to be conserved across disparate complex systems.
PMCID: PMC2596099  PMID: 19014554
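The global parameters named in the abstract above (node degree, clustering coefficient, path length) can be illustrated with a minimal sketch. It assumes a network is built by linking two genes whenever the absolute Pearson correlation of their expression values across lines meets a cutoff; the cutoff of 0.9, the gene names, and the toy values are all hypothetical and are not taken from the paper.

```python
from collections import deque
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def build_network(expr, threshold):
    """Link two genes when |r| across lines meets the (assumed) threshold."""
    adj = {g: set() for g in expr}
    for g, h in combinations(expr, 2):
        if abs(pearson(expr[g], expr[h])) >= threshold:
            adj[g].add(h)
            adj[h].add(g)
    return adj


def clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


def mean_path_length(adj):
    """Average shortest-path length over connected node pairs (BFS)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("nan")


# Toy expression values for four hypothetical genes across five lines.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.0, 4.1, 6.0, 8.2, 10.0],
    "g3": [5.0, 4.0, 3.1, 2.0, 1.0],
    "g4": [1.0, 3.0, 1.0, 3.0, 1.0],
}
net = build_network(expr, threshold=0.9)
degrees = {g: len(nbrs) for g, nbrs in net.items()}
```

Comparing two networks globally then amounts to comparing the distributions of these per-node values, while a local comparison checks whether the same gene pairs are linked in both.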
2.  Human Gene Coexpression Landscape: Confident Network Derived from Tissue Transcriptomic Profiles 
PLoS ONE  2008;3(12):e3911.
Analysis of gene expression data using genome-wide microarrays is a technique often used in genomic studies to find coexpression patterns and locate groups of co-transcribed genes. However, most studies done at a global “omic” scale are not focused on human samples, and those that are very often include heterogeneous datasets, mixing normal with disease-altered samples. Moreover, the technical noise present in genome-wide expression microarrays is another well-reported problem that is often not addressed with robust statistical methods, and estimates of the errors in the data are not provided.
Methodology/Principal Findings
Human genome-wide expression data from a controlled set of normal-healthy tissues is used to build a confident human gene coexpression network avoiding both pathological and technical noise. To achieve this we describe a new method that combines several statistical and computational strategies: robust normalization and expression signal calculation; correlation coefficients obtained by parametric and non-parametric methods; random cross-validations; and estimation of the statistical accuracy and coverage of the data. All these methods provide a series of coexpression datasets where the level of error is measured and can be tuned. To define the errors, the rates of true positives are calculated by assignment to biological pathways. The results provide a confident human gene coexpression network that includes 3327 gene-nodes and 15841 coexpression-links and a comparative analysis shows good improvement over previously published datasets. Further functional analysis of a subset core network, validated by two independent methods, shows coherent biological modules that share common transcription factors. The network reveals a map of coexpression clusters organized in well defined functional constellations. Two major regions in this network correspond to genes involved in nuclear and mitochondrial metabolism and investigations on their functional assignment indicate that more than 60% are house-keeping and essential genes. The network displays new non-described gene associations and it allows the placement in a functional context of some unknown non-assigned genes based on their interactions with known gene families.
The identification of stable and reliable human gene to gene coexpression networks is essential to unravel the interactions and functional correlations between human genes at an omic scale. This work contributes to this aim, and we are making available for the scientific community the validated human gene coexpression networks obtained, to allow further analyses on the network or on some specific gene associations.
The data are available free online at
PMCID: PMC2597745  PMID: 19081792
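One ingredient of the method described above is the combination of parametric and non-parametric correlation coefficients. A minimal sketch of that single idea follows, assuming a link is accepted only when both the Pearson and the Spearman coefficient meet the same cutoff. The function names, the cutoff, and the toy data are hypothetical; the normalization, cross-validation, and error-estimation steps of the paper are not reproduced here.

```python
from math import sqrt


def pearson(x, y):
    """Parametric (Pearson) correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Non-parametric (Spearman) correlation: Pearson on the ranks."""
    return pearson(ranks(x), ranks(y))


def confident_link(x, y, cutoff):
    """Accept a coexpression link only if both coefficients pass (assumed rule)."""
    return pearson(x, y) >= cutoff and spearman(x, y) >= cutoff


# A single outlying tissue drives Pearson up while the ranks disagree.
outlier_x = [1.0, 2.0, 3.0, 4.0, 50.0]
outlier_y = [4.0, 3.0, 2.0, 1.0, 50.0]
# A consistently increasing pair passes both tests.
steady_x = [1.0, 2.0, 3.0, 4.0, 5.0]
steady_y = [2.0, 4.0, 6.0, 8.0, 11.0]
```

Requiring agreement between the two coefficients is one way to keep a single extreme sample from creating a link on its own.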
3.  Geometric Interpretation of Gene Coexpression Network Analysis 
PLoS Computational Biology  2008;4(8):e1000117.
The merging of network theory and microarray data analysis techniques has spawned a new field: gene coexpression network analysis. While network methods are increasingly used in biology, the network vocabulary of computational biologists tends to be far more limited than that of, say, social network theorists. Here we review and propose several potentially useful network concepts. We take advantage of the relationship between network theory and the field of microarray data analysis to clarify the meaning of and the relationship among network concepts in gene coexpression networks. Network theory offers a wealth of intuitive concepts for describing the pairwise relationships among genes, which are depicted in cluster trees and heat maps. Conversely, microarray data analysis techniques (singular value decomposition, tests of differential expression) can also be used to address difficult problems in network theory. We describe conditions when a close relationship exists between network analysis and microarray data analysis techniques, and provide a rough dictionary for translating between the two fields. Using the angular interpretation of correlations, we provide a geometric interpretation of network theoretic concepts and derive unexpected relationships among them. We use the singular value decomposition of module expression data to characterize approximately factorizable gene coexpression networks, i.e., adjacency matrices that factor into node specific contributions. High and low level views of coexpression networks allow us to study the relationships among modules and among module genes, respectively. We characterize coexpression networks where hub genes are significant with respect to a microarray sample trait and show that the network concept of intramodular connectivity can be interpreted as a fuzzy measure of module membership. We illustrate our results using human, mouse, and yeast microarray gene expression data. 
The unification of coexpression network methods with traditional data mining methods can inform the application and development of systems biologic methods.
Author Summary
Similar to natural languages, network language is ever evolving. While some network terms (concepts) are widely used in gene coexpression network analysis, others still need to be developed to meet the ever increasing demand for describing the system of gene transcripts. There is a need to provide an intuitive geometric explanation of network concepts and to study their relationships. For example, we show that certain seemingly disparate network concepts turn out to be synonyms in the context of coexpression modules. We show how coexpression network language affects our understanding of biology. For example, there are geometric reasons why highly connected hub genes in important coexpression modules tend to be important, and why hub genes in one module cannot be hubs in another distinct module. We provide a short dictionary for translating between microarray data analysis language and network theory language to facilitate communication between the two fields. We describe several examples that illustrate how the two data analysis fields can inform each other.
PMCID: PMC2446438  PMID: 18704157
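The angular interpretation and the notion of a factorizable network mentioned above can be written out as a short sketch. The symbols are assumptions chosen for illustration: x_i is the expression profile of gene i, the tilde marks the mean-centered profile, theta_ij is the angle between two centered profiles, a_ij is an adjacency, and c_i is a node-specific contribution.

```latex
% Correlation as the cosine of the angle between centered profiles
\operatorname{cor}(x_i, x_j)
  = \frac{\langle \tilde{x}_i, \tilde{x}_j \rangle}
         {\lVert \tilde{x}_i \rVert \, \lVert \tilde{x}_j \rVert}
  = \cos\theta_{ij},
\qquad \tilde{x}_i = x_i - \bar{x}_i \mathbf{1}

% Approximately factorizable network: adjacency as a product of node terms
a_{ij} \approx c_i \, c_j \qquad (i \neq j)
```

Under this reading, two genes are strongly coexpressed when their centered profiles point in nearly the same direction, and a module is approximately factorizable when each adjacency is explained by the two genes' individual contributions.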
4.  ZIPK: A Unique Case of Murine-Specific Divergence of a Conserved Vertebrate Gene 
PLoS Genetics  2007;3(10):e180.
Zipper interacting protein kinase (ZIPK, also known as death-associated protein kinase 3 [DAPK3]) is a Ser/Thr kinase that functions in programmed cell death. Since its identification eight years ago, contradictory findings regarding its intracellular localization and molecular mode of action have been reported, which may be attributed to unpredicted differences among the human and rodent orthologs. By aligning the sequences of all available ZIPK orthologs, from fish to human, we discovered that rat and mouse sequences are more diverged from the human ortholog relative to other, more distant, vertebrates. To test experimentally the outcome of this sequence divergence, we compared rat ZIPK to human ZIPK in the same cellular settings. We found that while ectopically expressed human ZIPK localized to the cytoplasm and induced membrane blebbing, rat ZIPK localized exclusively within nuclei, mainly to promyelocytic leukemia oncogenic bodies, and induced significantly lower levels of membrane blebbing. Among the unique murine (rat and mouse) sequence features, we found that a highly conserved phosphorylation site, previously shown to have an effect on the cellular localization of human ZIPK, is absent in murines but not in earlier diverging organisms. Recreating this phosphorylation site in rat ZIPK led to a significant reduction in its promyelocytic leukemia oncogenic body localization, yet did not confer full cytoplasmic localization. Additionally, we found that while rat ZIPK interacts with PAR-4 (also known as PAWR) very efficiently, human ZIPK fails to do so. This interaction has clear functional implications, as coexpression of PAR-4 with rat ZIPK caused nuclear to cytoplasm translocation and induced strong membrane blebbing, thus providing the murine protein a possible adaptive mechanism to compensate for its sequence divergence. 
We have also cloned zebrafish ZIPK and found that, like the human and unlike the murine orthologs, it localizes to the cytoplasm, and fails to bind the highly conserved PAR-4 protein. This further supports the hypothesis that murine ZIPK underwent specific divergence from a conserved consensus. In conclusion, we present a case of species-specific divergence occurring in a specific branch of the evolutionary tree, accompanied by the acquisition of a unique protein–protein interaction that enables conservation of cellular function.
Author Summary
Mammals are a fairly young class of animals, first appearing about 70 million years ago. Such recent common descent does not allow the evolutionary process to create much diversity within the class, and indeed, the physiology among different mammals is remarkably similar. This similarity enables the use of various small mammals, especially rats and mice, as model systems for the study of biological phenomena and disease. Experiments unfeasible or unethical to perform on humans are conducted on these model animals, on the assumption that insights gained from them are applicable to the human system. In this article, we present an exception to this rule. We bring evidence that ZIPK, a gene with important roles in programmed cell death, has undergone accelerated evolution in the rat and mouse, thus diverging considerably from a well-conserved consensus in all vertebrates, from fish to man. We also show that this sequence divergence caused changes in the protein's properties, including its localization within the cell, and the proteins with which it interacts. Still, the basic biologic function of ZIPK is conserved in both systems, and we propose an adaptive mechanism that compensates for the sequence divergence in rodents.
PMCID: PMC2041995  PMID: 17953487
5.  Differentially Expressed Genes in Major Depression Reside on the Periphery of Resilient Gene Coexpression Networks 
The structure of gene coexpression networks reflects the activation and interaction of multiple cellular systems. Since the pathology of neuropsychiatric disorders is influenced by diverse cellular systems and pathways, we investigated gene coexpression networks in major depression, and searched for putative unifying themes in network connectivity across neuropsychiatric disorders. Specifically, based on the prevalence of the lethality–centrality relationship in disease-related networks, we hypothesized that network changes between control and major depression-related networks would be centered around coexpression hubs, and secondly, that differentially expressed (DE) genes would have a characteristic position and connectivity level in those networks. Mathematically, the first hypothesis tests the relationship of differential coexpression to network connectivity, while the second “hybrid” expression-and-network hypothesis tests the relationship of differential expression to network connectivity. To answer these questions about the potential interaction of coexpression network structure with differential expression, we utilized all available human post-mortem depression-related datasets appropriate for coexpression analysis, which spanned different microarray platforms, cohorts, and brain regions. Similar studies were also performed in an animal model of depression and in schizophrenia and bipolar disorder microarray datasets. We now provide results which consistently support (1) that genes assemble into small-world and scale-free networks in control subjects, (2) that this efficient network topology is largely resilient to changes in depressed subjects, and (3) that DE genes are positioned on the periphery of coexpression networks. Similar results were observed in a mouse model of depression, and in selected bipolar- and schizophrenia-related networks. 
Finally, we show that baseline expression variability contributes to the propensity of genes to be network hubs and/or to be DE in disease. In summary, our results suggest that the small-world and scale-free properties of gene networks are resilient to pathological changes in major depression, and that the network structure may constrain the extent to which a gene may be DE in the illness, hence informing further gene-network-based mechanistic studies of neuropsychiatric disorders.
PMCID: PMC3166821  PMID: 21922000
major depression; small-world; scale-free; coexpression; microarray; psychiatry; human post-mortem; graph theory
6.  Rapid and asymmetric divergence of duplicate genes in the human gene coexpression network 
BMC Bioinformatics  2006;7:46.
While gene duplication is known to be one of the most common mechanisms of genome evolution, the fates of genes after duplication are still being debated. In particular, it is presently unknown whether most duplicate genes preserve (or subdivide) the functions of the parental gene or acquire new functions. One aspect of gene function, namely the expression profile in a gene coexpression network, has been largely unexplored for duplicate genes.
Here we build a human gene coexpression network using human tissue-specific microarray data and investigate the divergence of duplicate genes in it. The topology of this network is scale-free. Interestingly, our analysis indicates that duplicate genes rapidly lose shared coexpressed partners: approximately 50 million years after duplication, the two duplicate genes in a pair have only a slightly higher number of shared partners than two random singletons. We also show that duplicate gene pairs quickly acquire new coexpressed partners: the average number of partners for a duplicate gene pair is significantly greater than that for a singleton (the latter number can be used as a proxy of the number of partners for a parental singleton gene before duplication). The divergence in gene expression between two duplicates in a pair occurs asymmetrically: one gene usually has more partners than the other one. The network is resilient to both random and degree-based in silico removal of either singletons or duplicate genes. In contrast, the network is especially vulnerable to the removal of highly connected genes when duplicate genes and singletons are considered together.
Duplicate genes rapidly diverge in their expression profiles in the network and play a role in maintaining network robustness similar to that of singletons.
Supplementary information: Please see additional files.
PMCID: PMC1403810  PMID: 16441884
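The two quantities tracked above for a duplicate pair, shared coexpressed partners and the imbalance in partner counts, can be sketched with plain set operations. The adjacency dictionary and the gene names below are hypothetical, and the sketch assumes the network is stored as a mapping from each gene to the set of its coexpressed partners.

```python
def shared_partners(adj, a, b):
    """Partners coexpressed with both duplicates, excluding the pair itself."""
    return (adj[a] & adj[b]) - {a, b}


def partner_asymmetry(adj, a, b):
    """Partner counts of the pair as (larger, smaller), ignoring each other."""
    k_a = len(adj[a] - {b})
    k_b = len(adj[b] - {a})
    return max(k_a, k_b), min(k_a, k_b)


# Toy neighbourhoods for one hypothetical duplicate pair.
adj = {
    "dupA": {"p1", "p2", "p3", "p4", "dupB"},
    "dupB": {"p1", "p5", "dupA"},
}
shared = shared_partners(adj, "dupA", "dupB")
larger, smaller = partner_asymmetry(adj, "dupA", "dupB")
```

Repeating the same count for randomly drawn singleton pairs would give the baseline against which the abstract compares duplicate pairs.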
7.  COXPRESdb: a database of coexpressed gene networks in mammals 
Nucleic Acids Research  2007;36(Database issue):D77-D82.
A database of coexpressed gene sets can provide valuable information for a wide variety of experimental designs, such as targeting of genes for functional identification, gene regulation and/or protein–protein interactions. Coexpressed gene databases derived from publicly available GeneChip data are widely used in Arabidopsis research, but platforms that examine coexpression for higher mammals are rather limited. Therefore, we have constructed a new database, COXPRESdb (coexpressed gene database), for coexpressed gene lists and networks in human and mouse. Coexpression data could be calculated for 19 777 and 21 036 genes in human and mouse, respectively, by using the GeneChip data in NCBI GEO. COXPRESdb enables analysis of the four types of coexpression networks: (i) highly coexpressed genes for every gene, (ii) genes with the same GO annotation, (iii) genes expressed in the same tissue and (iv) user-defined gene sets. When the networks became too big for the static picture on the web in GO networks or in tissue networks, we used Google Maps API to visualize them interactively. COXPRESdb also provides a view to compare the human and mouse coexpression patterns to estimate the conservation between the two species.
PMCID: PMC2238883  PMID: 17932064
8.  A Genomewide Functional Network for the Laboratory Mouse 
PLoS Computational Biology  2008;4(9):e1000165.
Establishing a functional network is invaluable to our understanding of gene function, pathways, and systems-level properties of an organism and can be a powerful resource in directing targeted experiments. In this study, we present a functional network for the laboratory mouse based on a Bayesian integration of diverse genetic and functional genomic data. The resulting network includes probabilistic functional linkages among 20,581 protein-coding genes. We show that this network can accurately predict novel functional assignments and network components and present experimental evidence for predictions related to Nanog homeobox (Nanog), a critical gene in mouse embryonic stem cell pluripotency. An analysis of the global topology of the mouse functional network reveals multiple biologically relevant systems-level features of the mouse proteome. Specifically, we identify the clustering coefficient as a critical characteristic of central modulators that affect diverse pathways as well as genes associated with different phenotype traits and diseases. In addition, a cross-species comparison of functional interactomes on a genomic scale revealed distinct functional characteristics of conserved neighborhoods as compared to subnetworks specific to higher organisms. Thus, our global functional network for the laboratory mouse provides the community with a key resource for discovering protein functions and novel pathway components as well as a tool for exploring systems-level topological and evolutionary features of cellular interactomes. To facilitate exploration of this network by the biomedical research community, we illustrate its application in function and disease gene discovery through an interactive, Web-based, publicly available interface at
Author Summary
Functionally related proteins interact in diverse ways to carry out biological processes, and each protein often participates in multiple pathways. Proteins are therefore organized into a complex network through which different functions of the cell are carried out. An accurate description of such a network is invaluable to our understanding of both the system-level features of a cell and those of an individual biological process. In this study, we used a probabilistic model to combine information from diverse genome-scale studies as well as individual investigations to generate a global functional network for mouse. Our analysis of the global topology of this network reveals biologically relevant systems-level characteristics of the mouse proteome, including conservation of functional neighborhoods and network features characteristic of known disease genes and key transcriptional regulators. We have made this network publicly available for search and dynamic exploration by researchers in the community. Our Web interface enables users to easily generate hypotheses regarding potential functional roles of uncharacterized proteins, investigate possible links between their proteins of interest and disease, and identify new players in specific biological processes.
PMCID: PMC2527685  PMID: 18818725
9.  Genetic interactions reveal the evolutionary trajectories of duplicate genes 
Duplicate genes show significantly fewer interactions than singleton genes, and functionally similar duplicates can exhibit dissimilar profiles because common interactions are 'hidden' due to buffering.
Genetic interaction profiles provide insights into evolutionary mechanisms of duplicate retention by distinguishing duplicates under dosage selection from those retained because of some divergence in function.
The genetic interactions of duplicate genes evolve in an extremely asymmetric way and the directionality of this asymmetry correlates well with other evolutionary properties of duplicate genes.
Genetic interaction profiles can be used to elucidate the divergent function of specific duplicate pairs.
Gene duplication and divergence serves as a primary source for new genes and new functions, and as such has broad implications on the evolutionary process. Duplicate genes within S. cerevisiae have been shown to retain a high degree of similarity with regard to many of their functional properties (Papp et al, 2004; Guan et al, 2007; Wapinski et al, 2007; Musso et al, 2008), and perturbation of duplicate genes has been shown to result in smaller fitness defects than singleton genes (Gu et al, 2003; DeLuna et al, 2008; Dean et al, 2008; Musso et al, 2008). Individual genetic interactions between pairs of genes and profiles of such interactions across the entire genome provide a new context in which to examine the properties of duplicate compensation.
In this study we use the most recent and comprehensive set of genetic interactions in yeast produced to date (Costanzo et al, 2010) to address questions of duplicate retention and redundancy. We show that the ability for duplicate genes to buffer the deletion of a partner has three main consequences. First it agrees with previous work demonstrating that a high proportion of duplicate pairs are synthetic lethal, a classic indication of the ability to buffer one another functionally (DeLuna et al, 2008; Dean et al, 2008; Musso et al, 2008). Second, it reduces the number of genetic interactions observed between duplicate genes and the rest of the genome by masking interactions relating to common function from experimental detection. Third, this buffering of common interactions serves to reduce profile similarity in spite of common function (Figure 1). The compensatory ability of functionally similar duplicates buffers genetic interactions related to their common function (reducing the number of genetic interactions overall), while allowing the measurement of interactions related to any divergent function. Thus, even functionally similar duplicates may have dissimilar genetic interaction profiles. As previously surmised (Ihmels et al, 2007), duplicate genes under selection for dosage amplification have differing profile characteristics. We show that dosage-mediated duplicates have much higher genetic interaction profile similarity than do other duplicate pairs. Furthermore, we show in a comparison with local neighbors on a protein–protein interaction (PPI) network, that although dosage-mediated duplicates more often have higher similarity to each other than they do to their neighbors, the reverse is true for duplicates in general. 
That is, slightly divergent duplicate genes more often exhibit a higher similarity with a common neighbor on the PPI network than they do with each other, and that observation is consistent with the idea that common interactions are buffered while interactions corresponding to divergent functions are observed.
We then asked whether duplicates' genetic interactions that are not buffered appear in a symmetric or an asymmetric fashion. Previous work has established asymmetric patterns with regard to PPI degree (Wagner, 2002; He and Zhang, 2005), sequence divergence (Conant and Wagner, 2003; Zhang et al, 2003; Kellis et al, 2004; Scannell and Wolfe, 2008) and expression patterns (Gu et al, 2002b; Tirosh and Barkai, 2007). Although genetic interactions are further removed from mechanism than protein–protein interactions, for example, they do offer a more direct measurement of functional consequence and, thus, may give a better indication of the functional differences between a duplicate pair. We found that duplicates exhibit a strikingly asymmetric pattern of genetic interactions, with the ratio of interactions between sisters commonly exceeding 7:1 (Figure 4A). The observations differ significantly from random simulations in which genetic interactions were redistributed between sisters with equal probability (Figure 4A). Moreover, the directionality of this interaction asymmetry agrees with other physiological properties of duplicate pairs. For example, the sister with more genetic interactions also tends to have more protein–protein interactions and also tends to evolve at a slower rate (Figure 4B).
Genetic interaction degree and profiles can be used to understand the functional divergence of particular duplicate pairs. As a case example, we consider the whole-genome-duplication pair CIK1–VIK1. Each of these genes encodes a protein that forms a distinct heterodimeric complex with the microtubule motor protein Kar3 (Manning et al, 1999). Although each of these proteins depends on a direct physical interaction with Kar3, Cik1 has a much higher profile similarity to Kar3 than does Vik1 (r=0.5 and r=0.3, respectively). Consistent with its higher similarity, Δcik1 and Δkar3 exhibit several similar phenotypes, including abnormally short spindles, chromosome loss and delayed cell cycle progression (Page et al, 1994; Manning et al, 1999). In contrast, a Δvik1 mutant strain exhibits no overt phenotype (Manning et al, 1999).
The characterization of functional redundancy and divergence between duplicate genes is an important step in understanding the evolution of genetic systems. Large-scale genetic network analysis in Saccharomyces cerevisiae provides a powerful perspective for addressing these questions through quantitative measurements of genetic interactions between pairs of duplicated genes, and more generally, through the study of genome-wide genetic interaction profiles associated with duplicated genes. We show that duplicate genes exhibit fewer genetic interactions than other genes because they tend to buffer one another functionally, whereas observed interactions are non-overlapping and reflect their divergent roles. We also show that duplicate gene pairs are highly imbalanced in their number of genetic interactions with other genes, a pattern that appears to result from asymmetric evolution, such that one duplicate evolves or degrades faster than the other and often becomes functionally or conditionally specialized. The differences in genetic interactions are predictive of differences in several other evolutionary and physiological properties of duplicate pairs.
PMCID: PMC3010121  PMID: 21081923
duplicate genes; functional divergence; genetic interactions; paralogs; Saccharomyces cerevisiae
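The comparison described above, between observed interaction asymmetry and random simulations in which interactions are redistributed between sisters with equal probability, can be sketched as a small Monte Carlo test. The function name, the number of simulations, the seed, and the example counts are hypothetical; the sketch assumes each interaction is assigned independently to either sister with probability 0.5.

```python
import random


def asymmetry_p_value(n_a, n_b, n_sim=2000, seed=0):
    """Share of simulated splits at least as lopsided as the observed pair."""
    rng = random.Random(seed)
    total = n_a + n_b
    observed = max(n_a, n_b)
    hits = 0
    for _ in range(n_sim):
        # Hand each interaction to sister A with probability 0.5.
        k = sum(1 for i in range(total) if rng.random() < 0.5)
        if max(k, total - k) >= observed:
            hits += 1
    return hits / n_sim


lopsided = asymmetry_p_value(35, 5)   # a 7:1 split of 40 interactions
balanced = asymmetry_p_value(20, 20)  # an even split of 40 interactions
```

A 7:1 split of 40 interactions is almost never reached by chance under this null model, whereas an even split is matched or exceeded by every simulated redistribution.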
10.  Meta-analysis of Inter-species Liver Co-expression Networks Elucidates Traits Associated with Common Human Diseases 
PLoS Computational Biology  2009;5(12):e1000616.
Co-expression networks are routinely used to study human diseases like obesity and diabetes. Systematic comparison of these networks between species has the potential to elucidate common mechanisms that are conserved between human and rodent species, as well as those that are species-specific characterizing evolutionary plasticity. We developed a semi-parametric meta-analysis approach for combining gene-gene co-expression relationships across expression profile datasets from multiple species. The simulation results showed that the semi-parametric method is robust against noise. When applied to human, mouse, and rat liver co-expression networks, our method out-performed existing methods in identifying gene pairs with coherent biological functions. We identified a network conserved across species that highlighted cell-cell signaling, cell-adhesion and sterol biosynthesis as main biological processes represented in genome-wide association study candidate gene sets for blood lipid levels. We further developed a heterogeneity statistic to test for network differences among multiple datasets, and demonstrated that genes with species-specific interactions tend to be under positive selection throughout evolution. Finally, we identified a human-specific sub-network regulated by RXRG, which has been validated to play a different role in hyperlipidemia and Type 2 diabetes between human and mouse. Taken together, our approach represents a novel step forward in integrating gene co-expression networks from multiple large scale datasets to leverage not only common information but also differences that are dataset-specific.
Author Summary
Two important aspects of drug development are drug target identification and biomarker discovery for early disease detection, disease progression, drug efficacy and drug toxicity, etc. Recently, many single nucleotide polymorphisms (SNPs) associated with human diseases have been discovered through large genome-wide association studies (GWAS). However, it is still largely unclear how these candidate SNPs may cause human diseases. The ultimate aim of this paper is to put these GWAS candidate SNPs and their associated genes into a network context to understand their mechanism of action in human diseases. In addition to large-scale human data sets that are often heterogeneous in terms of genetic and environmental factors, many high quality data sets in rodents exist and are frequently used to model human diseases. To leverage such information, we developed a method for combining and contrasting gene networks between human and rodents, specifically to elucidate how GWAS candidate SNPs may contribute to human diseases. By identifying mechanisms that are conserved or divergent between human and rodents, we can also predict which disease causal genes can be studied using rodent models and which ones may not.
PMCID: PMC2787626  PMID: 20019805
11.  Is My Network Module Preserved and Reproducible? 
PLoS Computational Biology  2011;7(1):e1001057.
In many applications, one is interested in determining which of the properties of a network module change across conditions. For example, to validate the existence of a module, it is desirable to show that it is reproducible (or preserved) in an independent test network. Here we study several types of network preservation statistics that do not require a module assignment in the test network. We distinguish network preservation statistics by the type of the underlying network. Some preservation statistics are defined for a general network (defined by an adjacency matrix) while others are only defined for a correlation network (constructed on the basis of pairwise correlations between numeric variables). Our applications show that the correlation structure facilitates the definition of particularly powerful module preservation statistics. We illustrate that evaluating module preservation is in general different from evaluating cluster preservation. We find that it is advantageous to aggregate multiple preservation statistics into summary preservation statistics. We illustrate the use of these methods in six gene co-expression network applications including 1) preservation of cholesterol biosynthesis pathway in mouse tissues, 2) comparison of human and chimpanzee brain networks, 3) preservation of selected KEGG pathways between human and chimpanzee brain networks, 4) sex differences in human cortical networks, 5) sex differences in mouse liver networks. While we find no evidence for sex specific modules in human cortical networks, we find that several human cortical modules are less preserved in chimpanzees. In particular, apoptosis genes are differentially co-expressed between humans and chimpanzees. Our simulation studies and applications show that module preservation statistics are useful for studying differences between the modular structure of networks. Data, R software and accompanying tutorials can be downloaded from the following webpage:
Author Summary
In network applications, one is often interested in studying whether modules are preserved across multiple networks. For example, to determine whether a pathway of genes is perturbed in a certain condition, one can study whether its connectivity pattern is no longer preserved. Non-preserved modules can either be biologically uninteresting (e.g., reflecting data outliers) or interesting (e.g., reflecting sex specific modules). An intuitive approach for studying module preservation is to cross-tabulate module membership. But this approach often cannot address questions about the preservation of connectivity patterns between nodes. Thus, cross-tabulation based approaches often fail to recognize that important aspects of a network module are preserved. Cross-tabulation methods make it difficult to argue that a module is not preserved. The weak statement (“the reference module does not overlap with any of the identified test set modules”) is less relevant in practice than the strong statement (“the module cannot be found in the test network irrespective of the parameter settings of the module detection procedure”). Module preservation statistics have important applications, e.g. we show that the wiring of apoptosis genes in a human cortical network differs from that in chimpanzees.
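As a rough illustration of what a connectivity-based preservation measure for correlation networks might look like, the sketch below correlates the pairwise correlations among a module's genes in a reference data set with the same pairwise correlations in a test data set. The function names and the toy expression values are assumptions made for illustration; this is not a reproduction of the statistics defined in the paper.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def module_correlations(expr, module):
    """Pairwise correlations among module genes; expr maps gene -> expression values."""
    return [pearson(expr[g], expr[h]) for g, h in combinations(module, 2)]

def correlation_preservation(ref_expr, test_expr, module):
    """Correlate the module's correlation pattern in the reference with that in the test set."""
    return pearson(module_correlations(ref_expr, module),
                   module_correlations(test_expr, module))

# Hypothetical toy data: four genes measured in five samples per data set.
ref = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 5, 4, 6],
       "c": [5, 3, 4, 1, 2], "d": [1, 3, 2, 5, 4]}
test = {"a": [2, 1, 4, 3, 6], "b": [1, 3, 6, 5, 7],
        "c": [6, 5, 3, 2, 1], "d": [2, 2, 3, 6, 5]}
score = correlation_preservation(ref, test, ["a", "b", "c", "d"])
```

A value near 1 would suggest that the module's connectivity pattern is similar in both data sets. Note that no module assignment in the test data is needed, only the reference module's gene list.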
PMCID: PMC3024255  PMID: 21283776
12.  Construction and use of gene expression covariation matrix 
BMC Bioinformatics  2009;10:214.
One essential step in the massive analysis of transcriptomic profiles is the calculation of the correlation coefficient, a value used to select pairs of genes with similar or inverse transcriptional profiles across a large fraction of the biological conditions examined. Until now, the choice between the two available methods for calculating the coefficient has been dictated mainly by technological considerations. Specifically, in analyses based on double-channel techniques, researchers have been required to use covariation correlation, i.e. the correlation between gene expression changes measured between several pairs of biological conditions, expressed for example as fold-change. In contrast, in analyses of single-channel techniques scientists have been restricted to the use of coexpression correlation, i.e. correlation between gene expression levels. To our knowledge, nobody has ever examined the possible benefits of using covariation instead of coexpression in massive analyses of single channel microarray results.
We describe here how single-channel techniques can be treated like double-channel techniques and used to generate both gene expression changes and covariation measures. We also present a new method that allows the calculation of both positive and negative correlation coefficients between genes. First, we perform systematic comparisons between two given biological conditions and classify, for each comparison, genes as increased (I), decreased (D), or not changed (N). As a result, the original series of n gene expression level measures assigned to each gene is replaced by an ordered string of n(n-1)/2 symbols, e.g. IDDNNIDID....DNNNNNNID, with the length of the string corresponding to the number of comparisons. In a second step, positive and negative covariation matrices (CVM) are constructed by calculating statistically significant positive or negative correlation scores for any pair of genes by comparing their strings of symbols.
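The two steps described above can be sketched roughly as follows. The fixed difference threshold used to call a gene increased, decreased, or not changed, and the simple agreement counts used in place of a significance-tested correlation score, are assumptions for illustration only; the abstract does not specify how either is computed.

```python
from itertools import combinations

def change_string(levels, threshold=1.0):
    """Encode all n(n-1)/2 pairwise condition comparisons as I, D or N.

    The threshold rule is a hypothetical stand-in for a statistical test.
    """
    symbols = []
    for i, j in combinations(range(len(levels)), 2):
        diff = levels[j] - levels[i]
        if diff > threshold:
            symbols.append("I")
        elif diff < -threshold:
            symbols.append("D")
        else:
            symbols.append("N")
    return "".join(symbols)

def covariation_counts(s1, s2):
    """Count comparisons where two genes change in the same or in opposite directions."""
    same = sum(1 for a, b in zip(s1, s2) if a == b and a != "N")
    opposite = sum(1 for a, b in zip(s1, s2) if {a, b} == {"I", "D"})
    return same, opposite

# Two hypothetical genes measured in four conditions (six comparisons each).
gene_a = change_string([1.0, 3.0, 3.5, 0.5])
gene_b = change_string([5.0, 2.0, 1.0, 6.0])
same, opposite = covariation_counts(gene_a, gene_b)
```

In this toy case the two genes change in opposite directions whenever both change, which is the kind of pattern a negative covariation matrix would record.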
This new method, applied to four different large data sets, has allowed us to construct distinct covariation matrices with similar properties. We have also developed a technique to translate these covariation networks into graphical 3D representations and found that the local assignation of the probe sets was conserved across the four chip set models used which encompass three different species (humans, mice, and rats). The application of adapted clustering methods succeeded in delineating six conserved functional regions that we characterized using Gene Ontology information.
PMCID: PMC2720390  PMID: 19594909
13.  Transcriptional dynamics of a conserved gene expression network associated with craniofacial divergence in Arctic charr 
EvoDevo  2014;5(1):40.
Understanding the molecular basis of craniofacial variation can provide insights into key developmental mechanisms of adaptive changes and their role in trophic divergence and speciation. Arctic charr (Salvelinus alpinus) is a polymorphic fish species, and, in Lake Thingvallavatn in Iceland, four sympatric morphs have evolved distinct craniofacial structures. We conducted a gene expression study on candidates from a conserved gene coexpression network, focusing on the development of craniofacial elements in embryos of two contrasting Arctic charr morphotypes (benthic and limnetic).
Four Arctic charr morphs were studied: one limnetic and two benthic morphs from Lake Thingvallavatn and a limnetic reference aquaculture morph. The presence of morphological differences at developmental stages before the onset of feeding was verified by morphometric analysis. Following up on our previous findings that Mmp2 and Sparc were differentially expressed between morphotypes, we identified a network of genes with conserved coexpression across diverse vertebrate species. A comparative expression study of candidates from this network in developing heads of the four Arctic charr morphs verified the coexpression relationship of these genes and revealed distinct transcriptional dynamics strongly correlated with contrasting craniofacial morphologies (benthic versus limnetic). A literature review and Gene Ontology analysis indicated that a significant proportion of the network genes play a role in extracellular matrix organization and skeletogenesis, and motif enrichment analysis of conserved noncoding regions of network candidates predicted a handful of transcription factors, including Ap1 and Ets2, as potential regulators of the gene network. The expression of Ets2 itself was also found to associate with network gene expression. Genes linked to glucocorticoid signalling were also studied, as both Mmp2 and Sparc are responsive to this pathway. Among those, several transcriptional targets and upstream regulators showed differential expression between the contrasting morphotypes. Interestingly, although selected network genes showed overlapping expression patterns in situ and no morph differences, Timp2 expression patterns differed between morphs.
Our comparative study of transcriptional dynamics in divergent craniofacial morphologies of Arctic charr revealed a conserved network of coexpressed genes sharing functional roles in structural morphogenesis. We also implicate transcriptional regulators of the network as targets for future functional studies.
Electronic supplementary material
The online version of this article (doi:10.1186/2041-9139-5-40) contains supplementary material, which is available to authorized users.
PMCID: PMC4240837  PMID: 25419450
Arctic charr; Coexpression; Craniofacial development; Divergent evolution; Gene network; Morphogenesis; Salvelinus alpinus
14.  Connectivity in the Yeast Cell Cycle Transcription Network: Inferences from Neural Networks 
PLoS Computational Biology  2006;2(12):e169.
A current challenge is to develop computational approaches to infer gene network regulatory relationships based on multiple types of large-scale functional genomic data. We find that single-layer feed-forward artificial neural network (ANN) models can effectively discover gene network structure by integrating global in vivo protein:DNA interaction data (ChIP/Array) with genome-wide microarray RNA data. We test this on the yeast cell cycle transcription network, which is composed of several hundred genes with phase-specific RNA outputs. These ANNs were robust to noise in data and to a variety of perturbations. They reliably identified and ranked 10 of 12 known major cell cycle factors at the top of a set of 204, based on a sum-of-squared weights metric. Comparative analysis of motif occurrences among multiple yeast species independently confirmed relationships inferred from ANN weights analysis. ANN models can capitalize on properties of biological gene networks that other kinds of models do not. ANNs naturally take advantage of patterns of absence, as well as presence, of factor binding associated with specific expression output; they are easily subjected to in silico “mutation” to uncover biological redundancies; and they can use the full range of factor binding values. A prominent feature of cell cycle ANNs suggested an analogous property might exist in the biological network. This postulated that “network-local discrimination” occurs when regulatory connections (here between MBF and target genes) are explicitly disfavored in one network module (G2), relative to others and to the class of genes outside the mitotic network. If correct, this predicts that MBF motifs will be significantly depleted from the discriminated class and that the discrimination will persist through evolution. 
Analysis of distantly related Schizosaccharomyces pombe confirmed this, suggesting that network-local discrimination is real and complements well-known enrichment of MBF sites in G1 class genes.
A current challenge is to develop computational approaches to infer gene network regulatory relationships by integrating multiple types of large-scale functional genomic data. This paper shows that simple artificial neural networks (ANNs) employed in a new way do this very well. The ANN models are well-suited to capitalize on natural properties of gene networks in ways that many previous methods do not. Resulting gene network connections inferred between transcription factors and RNA output patterns are robust to noise in large-scale input datasets and to differences in RNA clustering class inputs. This was shown by using the yeast cell cycle gene network as a test case. The cycle has multiple classes of oscillatory RNAs, and Hart, Mjolsness, and Wold show that the ANNs identify key connections that associate genes from each cell cycle phase group with known and candidate regulators. Comparative analysis of network connectivity across multiple genomes showed strong conservation of basic factor-to-output relationships, although at the greatest evolutionary distances the specific target genes have mainly changed identity.
PMCID: PMC1761652  PMID: 17194216
15.  Empirical Multiscale Networks of Cellular Regulation 
PLoS Computational Biology  2007;3(10):e207.
Grouping genes by similarity of expression across multiple cellular conditions enables the identification of cellular modules. The known functions of genes enable the characterization of the aggregate biological functions of these modules. In this paper, we use a high-throughput approach to identify the effective mutual regulatory interactions between modules composed of mouse genes from the Alliance for Cell Signaling (AfCS) murine B-lymphocyte database which tracks the response of ∼15,000 genes following chemokine perturbation. This analysis reveals principles of cellular organization that we discuss along four conceptual axes. (1) Regulatory implications: the derived collection of influences between any two modules quantifies intuitive as well as unexpected regulatory interactions. (2) Behavior across scales: trends across global networks of varying resolution (composed of various numbers of modules) reveal principles of assembly of high-level behaviors from smaller components. (3) Temporal behavior: tracking the mutual module influences over different time intervals provides features of regulation dynamics such as duration, persistence, and periodicity. (4) Gene Ontology correspondence: the association of modules to known biological roles of individual genes describes the organization of functions within coexpressed modules of various sizes. We present key specific results in each of these four areas, as well as derive general principles of cellular organization. At the coarsest scale, the entire transcriptional network contains five divisions: two divisions devoted to ATP production/biosynthesis and DNA replication that activate all other divisions, an “extracellular interaction” division that represses all other divisions, and two divisions (proliferation/differentiation and membrane infrastructure) that activate and repress other divisions in specific ways consistent with cell cycle control.
Author Summary
In a eukaryotic organism such as the mouse, the complete transcriptional network contains ∼15,000 genes and up to 225 million regulatory relationships between pairs of genes. Determining all of these relationships is currently intractable using traditional experimental techniques, and, thus, a comprehensive description of the entire mouse transcriptional network is elusive. Alternatively, one can apply the limited amount of experimental data to determine the entire transcriptional network at a less detailed, higher level. This is analogous to considering a map of the world resolved to the kilometer rather than to the millimeter. Here, we derive from mouse microarray data several high-scale transcriptional networks by determining the mutual effective regulatory influences of large modules of genes. In particular, global transcriptional networks containing 12 to 72 modules are derived, and analysis of these multiscale networks reveals properties of the transcriptional network that are universal at all scales (e.g., maintenance of homeostasis) and properties that vary as a function of scale (e.g., the fractions of module pairs that exert mutual regulation). In addition, we describe how cellular functions associated with large modules (those containing many genes) are composed of more specific functions associated with smaller modules.
PMCID: PMC2041980  PMID: 17953478
16.  Meta-analysis of gene coexpression networks in the post-mortem prefrontal cortex of patients with schizophrenia and unaffected controls 
BMC Neuroscience  2013;14:105.
Gene expression profiling of the postmortem human brain is part of the effort to understand the neuropathological underpinnings of schizophrenia. Existing microarray studies have identified a large number of genes as candidates, but efforts to generate an integrated view of molecular and cellular changes underlying the illness are few. Here, we have applied a novel approach to combining coexpression data across seven postmortem human brain studies of schizophrenia.
We generated separate coexpression networks for the control and schizophrenia prefrontal cortex and found that differences in global network properties were small. We analyzed gene coexpression relationships of previously identified differentially expressed ‘schizophrenia genes’. Evaluation of network properties revealed differences for the up- and down-regulated ‘schizophrenia genes’, with clustering coefficient displaying particularly interesting trends. We identified modules of coexpressed genes in each network and characterized them according to disease association and cell type specificity. Functional enrichment analysis of modules in each network revealed that genes with altered expression in schizophrenia associate with modules representing biological processes such as oxidative phosphorylation, myelination, synaptic transmission and immune function. Although an immune-function-enriched module was found in both networks, many of the genes in the modules were different. Specifically, a decrease in clustering of immune activation genes in the schizophrenia network was coupled with the loss of various astrocyte marker genes and the schizophrenia candidate genes.
Our novel network-based approach for evaluating gene coexpression provides results that converge with existing evidence from genetic and genomic studies to support an immunological link to the pathophysiology of schizophrenia.
PMCID: PMC3849476  PMID: 24070017
Schizophrenia; Microarray; Gene coexpression network; Postmortem brain
17.  Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks 
BMC Bioinformatics  2005;6:227.
Biological processes are carried out by coordinated modules of interacting molecules. As clustering methods demonstrate that genes with similar expression display increased likelihood of being associated with a common functional module, networks of coexpressed genes provide one framework for assigning gene function. This has informed the guilt-by-association (GBA) heuristic, widely invoked in functional genomics. Yet although the idea of GBA is accepted, the breadth of GBA applicability is uncertain.
We developed methods to systematically explore the breadth of GBA across a large and varied corpus of expression data to answer the following question: To what extent is the GBA heuristic broadly applicable to the transcriptome and, conversely, how broadly is GBA captured by a priori knowledge represented in the Gene Ontology (GO)? Our study provides an investigation of the functional organization of five coexpression networks using data from three mammalian organisms. Our method calculates a probabilistic score between each gene and each Gene Ontology category that reflects coexpression enrichment of a GO module. For each GO category we use receiver operating characteristic (ROC) curves to assess whether these probabilistic scores reflect GBA. This methodology applied to five different coexpression networks demonstrates that the signature of guilt-by-association is ubiquitous and reproducible and that the GBA heuristic is broadly applicable across the population of nine hundred Gene Ontology categories. We also demonstrate the existence of highly reproducible patterns of coexpression between some pairs of GO categories.
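To make the ROC-based assessment concrete, the following sketch computes the area under the ROC curve for one category from per-gene scores and known category membership. The scores and labels are hypothetical, and the paper's probabilistic coexpression score is not reproduced here; any score that ranks genes could be plugged in.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen category member
    outscores a randomly chosen non-member (ties count as one half).
    """
    pos = [s for s, member in zip(scores, labels) if member]
    neg = [s for s, member in zip(scores, labels) if not member]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical scores for five genes; 1 marks annotated members of the category.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0]
auc = roc_auc(scores, labels)
```

An area near 0.5 would indicate that the score carries no information about membership, while values approaching 1 would be consistent with guilt-by-association for that category.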
We conclude that GBA has universal value and that transcriptional control may be more modular than previously realized. Our analyses also suggest that methodologies combining coexpression measurements across multiple genes in a biologically-defined module can aid in characterizing gene function or in characterizing whether pairs of functions operate together.
PMCID: PMC1239911  PMID: 16162296
18.  A methodology for the analysis of differential coexpression across the human lifespan 
BMC Bioinformatics  2009;10:306.
Differential coexpression is a change in coexpression between genes that may reflect 'rewiring' of transcriptional networks. It has previously been hypothesized that such changes might be occurring over time in the lifespan of an organism. While both coexpression and differential expression of genes have been previously studied in life stage change or aging, differential coexpression has not. Generalizing differential coexpression analysis to many time points presents a methodological challenge. Here we introduce a method for analyzing changes in coexpression across multiple ordered groups (e.g., over time) and extensively test its validity and usefulness.
Our method is based on the use of the Haar basis set to efficiently represent changes in coexpression at multiple time scales, and thus represents a principled and generalizable extension of the idea of differential coexpression to life stage data. We used published microarray studies categorized by age to test the methodology. We validated the methodology by testing our ability to reconstruct Gene Ontology (GO) categories using our measure of differential coexpression and compared this result to using coexpression alone. Our method allows significant improvement in characterizing these groups of genes. Further, we examine the statistical properties of our measure of differential coexpression and establish that the results are significant both statistically and by an improvement in semantic similarity. In addition, we found that our method finds more significant changes in gene relationships compared to several other methods of expressing temporal relationships between genes, such as coexpression over time.
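As a minimal sketch of how a Haar basis can represent change at multiple time scales, the code below decomposes a sequence of coexpression values for one gene pair, ordered by age group, into an overall mean plus difference coefficients. The unnormalized averaging convention and the example values are assumptions; the authors' exact formulation may differ.

```python
def haar_coefficients(values):
    """Unnormalized Haar decomposition of a sequence whose length is a power of two.

    Returns the overall mean followed by difference coefficients,
    coarsest scale first.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    details = []
    current = list(values)
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        averages = [(a + b) / 2 for a, b in pairs]
        differences = [(a - b) / 2 for a, b in pairs]
        details = differences + details  # coarser scales end up in front
        current = averages
    return current + details

# Hypothetical correlations of one gene pair in four ordered age groups.
coeffs = haar_coefficients([0.75, 0.25, 0.5, 0.0])
```

Here the second coefficient contrasts the two younger groups with the two older groups, and the last two contrast adjacent groups. With only two groups the decomposition reduces to a mean and a single difference, which matches the idea of two-category differential coexpression.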
Differential coexpression over age generates significant and biologically relevant information about the genes producing it. Our Haar basis methodology for determining age-related differential coexpression performs better than other tested methods. The Haar basis set also lends itself to ready interpretation in terms of both evolutionary and physiological mechanisms of aging and can be seen as a natural generalization of two-category differential coexpression.
PMCID: PMC2761903  PMID: 19772654
19.  Of Mice and Men: Divergence of Gene Expression Patterns in Kidney 
PLoS ONE  2012;7(10):e46876.
Since the development of methods for homologous gene recombination, mouse models have played a central role in research in renal pathophysiology. However, many published and unpublished results show that mice with genetic changes mimicking human pathogenic mutations do not display the human phenotype. These functional differences may stem from differences in gene expression between mouse and human kidneys. However, large scale comparison of gene expression networks revealed conservation of gene expression among a large panel of human and mouse tissues including kidneys. Because renal functions result from the spatial integration of elementary processes originating in the glomerulus and the successive segments constituting the nephron, we hypothesized that differences in gene expression profiles along the human and mouse nephron might account for different behaviors. Analysis of SAGE libraries generated from the glomerulus and seven anatomically defined nephron segments from human and mouse kidneys allowed us to identify 4644 pairs of gene orthologs expressed in either one or both species. Quantitative analysis shows that many transcripts are present at different levels in the two species. It also shows poor conservation of gene expression profiles, with less than 10% of the 4644 gene orthologs displaying a higher conservation of expression profiles than the neutral expectation (p<0.05). Accordingly, hierarchical clustering reveals a higher degree of conservation of gene expression patterns between functionally unrelated kidney structures within a given species than between cognate structures from the two species. Similar findings were obtained for sub-groups of genes with either kidney-specific or housekeeping functions. 
Conservation of gene expression at the scale of the whole organ and divergence at the level of its constituting sub-structures likely account for the fact that although kidneys assume the same global function in the two species, many mouse “models” of human pathologies do not display the expected phenotype.
PMCID: PMC3463552  PMID: 23056504
20.  Altered Chromatin Occupancy of Master Regulators Underlies Evolutionary Divergence in the Transcriptional Landscape of Erythroid Differentiation 
PLoS Genetics  2014;10(12):e1004890.
Erythropoiesis is one of the best understood examples of cellular differentiation. Morphologically, erythroid differentiation proceeds in a nearly identical fashion between humans and mice, but recent evidence has shown that networks of gene expression governing this process are divergent between species. We undertook a systematic comparative analysis of six histone modifications and four transcriptional master regulators in primary proerythroblasts and erythroid cell lines to better understand the underlying basis of these transcriptional differences. Our analyses suggest that while chromatin structure across orthologous promoters is strongly conserved, subtle differences are associated with transcriptional divergence between species. Many transcription factor (TF) occupancy sites were poorly conserved across species (∼25% for GATA1, TAL1, and NFE2) but were more conserved between proerythroblasts and cell lines derived from the same species. We found that certain cis-regulatory modules co-occupied by GATA1, TAL1, and KLF1 are under strict evolutionary constraint and localize to genes necessary for erythroid cell identity. More generally, we show that conserved TF occupancy sites are indicative of active regulatory regions and strong gene expression that is sustained during maturation. Our results suggest that evolutionary turnover of TF binding sites associates with changes in the underlying chromatin structure, driving transcriptional divergence. We provide examples of how this framework can be applied to understand epigenomic variation in specific regulatory regions, such as the β-globin gene locus. Our findings have important implications for understanding epigenomic changes that mediate variation in cellular differentiation across species, while also providing a valuable resource for studies of hematopoiesis.
Author Summary
The process whereby blood progenitor cells differentiate into red blood cells, known as erythropoiesis, is very similar between mice and humans. Yet, while studies of this process in mouse have substantially improved our knowledge of human erythropoiesis, recent work has shown a significant divergence in global gene expression across species, suggesting that extrapolation from mouse models to human is not always straightforward. In order to better understand these differences, we have performed a comparative epigenomic analysis of six histone modifications and four master transcription factors. By globally comparing chromatin structure across primary cells and model cell lines in both species, we discovered that while chromatin structure is well conserved at orthologous promoters, subtle changes are predictive of species-specific gene expression. Furthermore, we discovered that the genomic localizations of master transcription factors are poorly conserved, and species-specific losses or gains are associated with changes to the underlying chromatin structure and concomitant gene expression. By using our comparative epigenomics framework, we identified a putative human-specific cis-regulatory module that drives expression of human, but not mouse, GDF15, a gene implicated in iron homeostasis. Our results provide a resource to aid researchers in interpreting genetic and epigenetic differences between species.
PMCID: PMC4270484  PMID: 25521328
21.  Using Likelihood-Free Inference to Compare Evolutionary Dynamics of the Protein Networks of H. pylori and P. falciparum 
PLoS Computational Biology  2007;3(11):e230.
Gene duplication with subsequent interaction divergence is one of the primary driving forces in the evolution of genetic systems. Yet little is known about the precise mechanisms and the role of duplication divergence in the evolution of protein networks from the prokaryote and eukaryote domains. We developed a novel, model-based approach for Bayesian inference on biological network data that centres on approximate Bayesian computation, or likelihood-free inference. Instead of computing the intractable likelihood of the protein network topology, our method summarizes key features of the network and, based on these, uses an MCMC algorithm to approximate the posterior distribution of the model parameters. This allowed us to reliably fit a flexible mixture model that captures hallmarks of evolution by gene duplication and subfunctionalization to protein interaction network data of Helicobacter pylori and Plasmodium falciparum. The 80% credible intervals for the duplication–divergence component are [0.64, 0.98] for H. pylori and [0.87, 0.99] for P. falciparum. The remaining parameter estimates are not inconsistent with sequence data. An extensive sensitivity analysis showed that incompleteness of PIN data does not largely affect the analysis of models of protein network evolution, and that the degree sequence alone barely captures the evolutionary footprints of protein networks relative to other statistics. Our likelihood-free inference approach enables a fully Bayesian analysis of a complex and highly stochastic system that is otherwise intractable at present. Modelling the evolutionary history of PIN data, it transpires that only the simultaneous analysis of several global aspects of protein networks enables credible and consistent inference to be made from available datasets.
Our results indicate that gene duplication has played a larger part in the network evolution of the eukaryote than in the prokaryote, and suggest that single gene duplications with immediate divergence alone may explain more than 60% of biological network data in both domains.
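The general idea of likelihood-free inference can be sketched with a much simpler scheme than the one described above. The example below uses plain rejection sampling rather than MCMC, a toy duplication-divergence growth model with a single retention parameter, and a single summary statistic (the edge count). All of these are simplifying assumptions for illustration; the abstract itself stresses that several summaries need to be analysed jointly.

```python
import random

def grow_network(n_nodes, retain_prob, rng):
    """Toy duplication-divergence growth.

    Each new node copies a random existing node and keeps each
    inherited edge with probability retain_prob.
    """
    neighbors = {0: {1}, 1: {0}}
    for new in range(2, n_nodes):
        parent = rng.randrange(new)
        kept = {v for v in neighbors[parent] if rng.random() < retain_prob}
        neighbors[new] = kept
        for v in kept:
            neighbors[v].add(new)
    return neighbors

def edge_count(neighbors):
    """Number of undirected edges in an adjacency-set representation."""
    return sum(len(vs) for vs in neighbors.values()) // 2

def abc_rejection(observed_edges, n_nodes, n_draws, tolerance, seed=0):
    """Keep prior draws whose simulated edge count is within tolerance of the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        p = rng.random()  # uniform prior on the retention probability
        simulated = edge_count(grow_network(n_nodes, p, rng))
        if abs(simulated - observed_edges) <= tolerance:
            accepted.append(p)
    return accepted

# Hypothetical observation: a 40-node network with 30 edges.
posterior_sample = abc_rejection(observed_edges=30, n_nodes=40,
                                 n_draws=500, tolerance=3)
```

The accepted values approximate the posterior distribution of the retention probability under this toy model. No likelihood is evaluated at any point; only simulated networks are compared with the observation.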
Author Summary
The importance of gene duplication to biological evolution has been recognized since the 1930s. For more than a decade, substantial evidence has been collected from genomic sequence data in order to elucidate the importance and the mechanisms of gene duplication; however, most biological characteristics arise from complex interactions between the cell's numerous constituents. Recently, preliminary descriptions of the protein interaction networks have become available for species of different domains. Adapting novel techniques in stochastic simulation, the authors demonstrate that evolutionary inferences can be drawn from large-scale, incomplete network data by fitting a stochastic model of network growth that captures hallmarks of evolution by duplication and divergence. They have also analyzed the effect of summarizing protein networks in different ways, and show that a reliable and consistent analysis requires many aspects of network data to be considered jointly; in contrast to what is commonly done in practice. Their results indicate that duplication and divergence has played a larger role in the network evolution of the eukaryote P. falciparum than in the prokaryote H. pylori, and emphasize at least for the eukaryote the potential importance of subfunctionalization in network evolution.
PMCID: PMC2098858  PMID: 18052538
22.  Seed selection strategy in global network alignment without destroying the entire structures of functional modules 
Proteome Science  2012;10(Suppl 1):S16.
Network alignment is one of the most common methods for comparing biological networks. Aligning protein-protein interaction (PPI) networks of different species is of great importance for detecting evolutionarily conserved pathways or protein complexes across species through the identification of conserved interactions, and for improving our insight into biological systems. The global network alignment (GNA) problem is NP-complete, and only heuristic methods have been proposed for it so far. Most current GNA methods are global heuristic seed-and-extend approaches. Because of the bias introduced by the local seed, these methods cannot obtain the best overall consistent alignment between networks. Furthermore, they concentrate on maximizing the number of aligned edges between two networks without considering the original structures of functional modules.
We present a novel seed selection strategy for global network alignment that builds multiple seeds from pairs of hub nodes of the networks to be aligned. Starting from each hub seed, we use the membership similarity of nodes to quantify, topologically, the extent to which nodes participate in the functional modules associated with the current seed, and we align the networks module by module. In this way, functional modules are not damaged during the heuristic alignment process. Our method also addresses a critical problem of most conventional algorithms, namely that the initially selected seeds directly influence the alignment result. The similarity measures between network nodes (e.g., proteins) include sequence similarity, centrality similarity, and dynamic membership similarity. We call our algorithm Multiple Hubs-based Alignment (MHA).
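A very reduced sketch of the seed-building step might look like the following. It pairs the highest-degree nodes of the two networks by degree rank. This pairing rule is an assumption for illustration only; the method described above also uses sequence, centrality, and membership similarity, none of which is modelled here.

```python
def hub_seed_pairs(net_a, net_b, k):
    """Pair the k highest-degree nodes of each network by degree rank.

    Networks map node -> set of neighbors; ties are broken by node name.
    """
    def hubs(net):
        return sorted(net, key=lambda node: (-len(net[node]), node))[:k]
    return list(zip(hubs(net_a), hubs(net_b)))

# Two hypothetical four-node networks.
net_a = {"a1": {"a2", "a3", "a4"}, "a2": {"a1", "a3"},
         "a3": {"a1", "a2"}, "a4": {"a1"}}
net_b = {"b1": {"b2"}, "b2": {"b1", "b3", "b4"},
         "b3": {"b2", "b4"}, "b4": {"b2", "b3"}}
seeds = hub_seed_pairs(net_a, net_b, k=2)
```

Starting the extension from several hub pairs rather than one is what would reduce the dependence of the final alignment on any single initial seed.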
When applying our seed selection strategy to several pairs of real PPI networks, it is observed that our method is working to strike a balance, extending the conserved interactions while maintaining the functional modules unchanged. In the case study, we assess the effectiveness of MHA on the alignment of the yeast and fly PPI networks. Our method outperforms state-of-the-art algorithms at detecting conserved functional modules and retrieves in particular 86% more conserved interactions than IsoRank.
We believe that our seed selection strategy will yield alignment results that are more similar both topologically and biologically. It can also serve as a reference for, and a complement to, other heuristic methods in the search for more meaningful alignment results.
PMCID: PMC3380727  PMID: 22759574
23.  GraphCrunch 2: Software tool for network modeling, alignment and clustering 
BMC Bioinformatics  2011;12:24.
Recent advancements in experimental biotechnology have produced large amounts of protein-protein interaction (PPI) data. The topology of PPI networks is believed to have a strong link to their function. Hence, the abundance of PPI data for many organisms stimulates the development of computational techniques for the modeling, comparison, alignment, and clustering of networks. In addition, finding representative models for PPI networks will improve our understanding of the cell just as a model of gravity has helped us understand planetary motion. To decide if a model is representative, we need quantitative comparisons of model networks to real ones. However, exact network comparison is computationally intractable and therefore several heuristics have been used instead. Some of these heuristics are easily computable "network properties," such as the degree distribution, or the clustering coefficient. An important special case of network comparison is the network alignment problem. Analogous to sequence alignment, this problem asks to find the "best" mapping between regions in two networks. It is expected that network alignment might have as strong an impact on our understanding of biology as sequence alignment has had. Topology-based clustering of nodes in PPI networks is another example of an important network analysis problem that can uncover relationships between interaction patterns and phenotype.
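Two of the "network properties" named above, the degree distribution and the clustering coefficient, can be computed as in the following minimal sketch. It uses plain Python and standard textbook definitions; it is not GraphCrunch code, and the function names and toy graph are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Build undirected adjacency sets from an edge list, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def degree_distribution(adj):
    """Map each degree k to the fraction of nodes that have degree k."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}


def clustering_coefficient(adj, node):
    """Fraction of pairs of the node's neighbours that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention: undefined cases count as zero
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the per-node clustering coefficients."""
    return sum(clustering_coefficient(adj, n) for n in adj) / len(adj)


# Toy graph: a triangle A-B-C with a pendant node D attached to C.
graph = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
print(degree_distribution(graph))
print(average_clustering(graph))
```

Comparing such summary statistics between a data network and networks drawn from a random model is one inexpensive way to judge how well the model fits.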
We introduce the GraphCrunch 2 software tool, which addresses these problems. It is a significant extension of GraphCrunch, which implements the most popular random network models and compares them with the data networks with respect to many network properties. Also, GraphCrunch 2 implements the GRAph ALigner algorithm ("GRAAL") for purely topological network alignment. GRAAL can align any pair of networks and exposes large, dense, contiguous regions of topological and functional similarities far larger than those found by any other existing tool. Finally, GraphCrunch 2 implements an algorithm for clustering nodes within a network based solely on their topological similarities. Using GraphCrunch 2, we demonstrate that eukaryotic and viral PPI networks may belong to different graph model families and show that topology-based clustering can reveal important functional similarities between proteins within yeast and human PPI networks.
GraphCrunch 2 is a software tool that implements the latest research on biological network analysis. It parallelizes computationally intensive tasks to fully utilize the potential of modern multi-core CPUs. It is open-source and freely available for research use. It runs under the Windows and Linux platforms.
PMCID: PMC3036622  PMID: 21244715
24.  Global network alignment using multiscale spectral signatures 
Bioinformatics  2012;28(23):3105-3114.
Motivation: Protein interaction networks provide an important system-level view of biological processes. One of the fundamental problems in biological network analysis is the global alignment of a pair of networks, which puts the proteins of one network into correspondence with the proteins of another network in a manner that conserves their interactions while respecting other evidence of their homology. By providing a mapping between the networks of different species, alignments can be used to inform hypotheses about the functions of unannotated proteins, the existence of unobserved interactions, the evolutionary divergence between the two species and the evolution of complexes and pathways.
Results: We introduce GHOST, a global pairwise network aligner that uses a novel spectral signature to measure topological similarity between subnetworks. It combines a seed-and-extend global alignment phase with a local search procedure and exceeds state-of-the-art performance on several network alignment tasks. We show that the spectral signature used by GHOST is highly discriminative, whereas the alignments it produces are also robust to experimental noise. When compared with other recent approaches, we find that GHOST is able to recover larger and more biologically significant, shared subnetworks between species.
Availability: An efficient and parallelized implementation of GHOST, released under the Apache 2.0 license, is available at
PMCID: PMC3509496  PMID: 23047556
25.  Gene Coexpression Network Topology of Cardiac Development, Hypertrophy, and Failure 
Network analysis techniques allow a more accurate reflection of underlying systems biology to be realized than traditional unidimensional molecular biology approaches. Here, using gene coexpression network analysis, we define the gene expression network topology of cardiac hypertrophy and failure and the extent of recapitulation of fetal gene expression programs in failing and hypertrophied adult myocardium.
Methods and Results
We assembled all myocardial transcript data in the Gene Expression Omnibus (n = 1617). Since hierarchical analysis revealed species had primacy over disease clustering, we focused this analysis on the most complete (murine) dataset (n = 478). Using gene coexpression network analysis, we derived functional modules, regulatory mediators and higher order topological relationships between genes and identified 50 gene co-expression modules in developing myocardium that were not present in normal adult tissue. We found that known gene expression markers of myocardial adaptation were members of upregulated modules but not hub genes. We identified ZIC2 as a novel transcription factor associated with coexpression modules common to developing and failing myocardium. Of 50 fetal gene co-expression modules, three (6%) were reproduced in hypertrophied myocardium and seven (14%) were reproduced in failing myocardium. One fetal module was common to both failing and hypertrophied myocardium.
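The general idea of deriving modules from a coexpression network can be illustrated with a minimal sketch. It assumes a simple stand-in procedure, thresholded absolute Pearson correlation followed by connected components, which is not necessarily the method used in the study. The gene names, expression values, and threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def coexpression_edges(expr, threshold):
    """Link two genes when |r| meets the threshold (an unsigned network)."""
    return [
        (g, h)
        for g, h in combinations(sorted(expr), 2)
        if abs(pearson(expr[g], expr[h])) >= threshold
    ]


def modules(genes, edges):
    """Crude modules: connected components of the thresholded network."""
    adj = {g: set() for g in genes}
    for g, h in edges:
        adj[g].add(h)
        adj[h].add(g)
    seen, components = set(), []
    for start in sorted(genes):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            current = stack.pop()
            component.append(current)
            for neighbour in adj[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


# Hypothetical expression profiles across four samples.
expr = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [4, 3, 2, 1],
    "g4": [1, 3, 1, 3],
    "g5": [3, 1, 3, 1],
}
edges = coexpression_edges(expr, threshold=0.9)
print(modules(expr, edges))
```

In this toy case the genes split into two modules. Comparing the module sets obtained from different tissues or conditions is the kind of analysis that lets one ask which fetal modules reappear in diseased tissue.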
Network modeling allows systems analysis of cardiovascular development and disease. While we did not find evidence for a global coordinated program of fetal gene expression in adult myocardial adaptation, our analysis revealed specific gene expression modules active during both development and disease and specific candidates for their regulation.
PMCID: PMC3324316  PMID: 21127201
fetal; gene expression; heart failure; hypertrophy; myocardium
