1.  Natural selection governs local, but not global, evolutionary gene coexpression networks in Caenorhabditis elegans 
BMC Systems Biology  2008;2:96.
Large-scale evaluation of gene expression variation among Caenorhabditis elegans lines that have diverged from a common ancestor allows for the analysis of a novel class of biological networks – evolutionary gene coexpression networks. Comparative analysis of these evolutionary networks has the potential to uncover the effects of natural selection in shaping coexpression network topologies since C. elegans mutation accumulation (MA) lines evolve essentially free from the effects of natural selection, whereas natural isolate (NI) populations are subject to selective constraints.
We compared evolutionary gene coexpression networks for C. elegans MA lines versus NI populations to evaluate the role that natural selection plays in shaping the evolution of network topologies. MA and NI evolutionary gene coexpression networks were found to have very similar global topological properties as measured by a number of network topological parameters. Observed MA and NI networks show node degree distributions and average values for node degree, clustering coefficient, path length, eccentricity and betweenness that are statistically indistinguishable from one another yet highly distinct from randomly simulated networks. At the local level, on the other hand, the MA and NI coexpression networks are highly divergent: the pairs of genes coexpressed in the MA lines are almost entirely different from those coexpressed in the NI lines, as are the connectivity and clustering properties of individual genes.
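As a rough illustration of the global parameters listed above, the following Python sketch computes average node degree and average clustering coefficient for an undirected network given as an edge list, and builds a random G(n, m) network with the same numbers of nodes and edges for comparison. The toy edge list, the choice of null model and all function names are illustrative assumptions, not the authors' pipeline; path length, eccentricity and betweenness would be computed analogously with breadth-first search.

```python
import random
from itertools import combinations

def build_adjacency(edges, nodes=()):
    """Map each node to the set of its neighbours (undirected, no self-loops)."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def average_degree(adj):
    """Mean number of links per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

def average_clustering(adj):
    return sum(clustering_coefficient(adj, n) for n in adj) / len(adj)

def random_graph(nodes, n_edges, seed=0):
    """G(n, m) null model: n_edges distinct node pairs drawn uniformly at random."""
    rng = random.Random(seed)
    pairs = list(combinations(sorted(nodes), 2))
    return build_adjacency(rng.sample(pairs, n_edges), nodes)

if __name__ == "__main__":
    # Hypothetical toy network; real inputs would be thresholded coexpression links.
    edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
    observed = build_adjacency(edges)
    null = random_graph(list(observed), len(edges), seed=1)
    print("observed:", average_degree(observed), average_clustering(observed))
    print("random:  ", average_degree(null), average_clustering(null))
```

Because the null model keeps the node and edge counts fixed, average degree matches by construction; the informative comparison is in parameters such as clustering and path length.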
It appears that selective forces shape how local patterns of coexpression change over time but do not control the global topology of C. elegans evolutionary gene coexpression networks. These results have implications for the evolutionary significance of global network topologies, which are known to be conserved across disparate complex systems.
PMCID: PMC2596099  PMID: 19014554
2.  Human Gene Coexpression Landscape: Confident Network Derived from Tissue Transcriptomic Profiles 
PLoS ONE  2008;3(12):e3911.
Analysis of gene expression data using genome-wide microarrays is a technique often used in genomic studies to find coexpression patterns and locate groups of co-transcribed genes. However, most studies done at a global “omic” scale do not focus on human samples, and those that do often include heterogeneous datasets that mix normal and disease-altered samples. Moreover, the technical noise present in genome-wide expression microarrays is another well-reported problem that is often not addressed with robust statistical methods, and estimates of the errors in the data are not provided.
Methodology/Principal Findings
Human genome-wide expression data from a controlled set of normal, healthy tissues are used to build a confident human gene coexpression network that avoids both pathological and technical noise. To achieve this, we describe a new method that combines several statistical and computational strategies: robust normalization and expression signal calculation; correlation coefficients obtained by parametric and non-parametric methods; random cross-validations; and estimation of the statistical accuracy and coverage of the data. Together these methods yield a series of coexpression datasets in which the level of error is measured and can be tuned. To define the errors, true-positive rates are calculated by assignment to biological pathways. The result is a confident human gene coexpression network of 3327 gene nodes and 15841 coexpression links, and a comparative analysis shows an improvement over previously published datasets. Further functional analysis of a core subnetwork, validated by two independent methods, shows coherent biological modules that share common transcription factors. The network reveals a map of coexpression clusters organized in well-defined functional constellations. Two major regions of this network correspond to genes involved in nuclear and mitochondrial metabolism, and investigation of their functional assignment indicates that more than 60% are housekeeping and essential genes. The network also reveals previously undescribed gene associations and allows some unassigned genes of unknown function to be placed in a functional context based on their interactions with known gene families.
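The combination of parametric and non-parametric correlation can be sketched as a consensus rule: a gene pair becomes a coexpression link only if both its Pearson and its Spearman coefficients reach a cutoff. The cutoff of 0.8, the toy profiles and the function names are assumptions for illustration; the method described above also involves robust normalization, random cross-validation and pathway-based error estimation, none of which is shown here.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Parametric (Pearson) correlation; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Non-parametric (Spearman) correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def consensus_links(expr, cutoff=0.8):
    """expr: dict gene -> expression values over the same tissues (hypothetical input)."""
    links = []
    for g1, g2 in combinations(sorted(expr), 2):
        if pearson(expr[g1], expr[g2]) >= cutoff and spearman(expr[g1], expr[g2]) >= cutoff:
            links.append((g1, g2))
    return links

if __name__ == "__main__":
    expr = {"A": [1, 2, 3, 4, 5], "B": [2, 4, 6, 8, 10], "C": [5, 3, 4, 1, 2]}
    print(consensus_links(expr))
```

Requiring agreement between the two coefficients trades some coverage for robustness: a link driven by a single outlying sample tends to inflate Pearson correlation more than Spearman correlation.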
The identification of stable and reliable human gene-to-gene coexpression networks is essential to unravel the interactions and functional correlations between human genes at an omic scale. This work contributes to this aim, and we make the validated human gene coexpression networks available to the scientific community to allow further analyses of the network or of specific gene associations.
The data are available free online at
PMCID: PMC2597745  PMID: 19081792
3.  ZIPK: A Unique Case of Murine-Specific Divergence of a Conserved Vertebrate Gene 
PLoS Genetics  2007;3(10):e180.
Zipper interacting protein kinase (ZIPK, also known as death-associated protein kinase 3 [DAPK3]) is a Ser/Thr kinase that functions in programmed cell death. Since its identification eight years ago, contradictory findings regarding its intracellular localization and molecular mode of action have been reported, which may be attributed to unpredicted differences among the human and rodent orthologs. By aligning the sequences of all available ZIPK orthologs, from fish to human, we discovered that rat and mouse sequences are more diverged from the human ortholog relative to other, more distant, vertebrates. To test experimentally the outcome of this sequence divergence, we compared rat ZIPK to human ZIPK in the same cellular settings. We found that while ectopically expressed human ZIPK localized to the cytoplasm and induced membrane blebbing, rat ZIPK localized exclusively within nuclei, mainly to promyelocytic leukemia oncogenic bodies, and induced significantly lower levels of membrane blebbing. Among the unique murine (rat and mouse) sequence features, we found that a highly conserved phosphorylation site, previously shown to have an effect on the cellular localization of human ZIPK, is absent in murines but not in earlier diverging organisms. Recreating this phosphorylation site in rat ZIPK led to a significant reduction in its promyelocytic leukemia oncogenic body localization, yet did not confer full cytoplasmic localization. Additionally, we found that while rat ZIPK interacts with PAR-4 (also known as PAWR) very efficiently, human ZIPK fails to do so. This interaction has clear functional implications, as coexpression of PAR-4 with rat ZIPK caused nuclear to cytoplasm translocation and induced strong membrane blebbing, thus providing the murine protein a possible adaptive mechanism to compensate for its sequence divergence. 
We have also cloned zebrafish ZIPK and found that, like the human and unlike the murine orthologs, it localizes to the cytoplasm, and fails to bind the highly conserved PAR-4 protein. This further supports the hypothesis that murine ZIPK underwent specific divergence from a conserved consensus. In conclusion, we present a case of species-specific divergence occurring in a specific branch of the evolutionary tree, accompanied by the acquisition of a unique protein–protein interaction that enables conservation of cellular function.
Author Summary
Mammals are a fairly young class of animals, first appearing about 70 million years ago. Such recent common descent does not allow the evolutionary process to create much diversity within the class, and indeed, the physiology of different mammals is remarkably similar. This similarity enables the use of various small mammals, especially rats and mice, as model systems for the study of biological phenomena and disease. Experiments that are unfeasible or unethical to perform on humans are conducted on these model animals, on the premise that insights gained from them are applicable to the human system. In this article, we present an exception to this rule. We bring evidence that ZIPK, a gene with important roles in programmed cell death, has undergone accelerated evolution in the rat and mouse, thus diverging considerably from a well-conserved consensus in all vertebrates, from fish to man. We also show that this sequence divergence caused changes in the protein's properties, including its localization within the cell and the proteins with which it interacts. Still, the basic biological function of ZIPK is conserved in both systems, and we propose an adaptive mechanism that compensates for the sequence divergence in rodents.
PMCID: PMC2041995  PMID: 17953487
4.  Geometric Interpretation of Gene Coexpression Network Analysis 
PLoS Computational Biology  2008;4(8):e1000117.
The merging of network theory and microarray data analysis techniques has spawned a new field: gene coexpression network analysis. While network methods are increasingly used in biology, the network vocabulary of computational biologists tends to be far more limited than that of, say, social network theorists. Here we review and propose several potentially useful network concepts. We take advantage of the relationship between network theory and the field of microarray data analysis to clarify the meaning of and the relationship among network concepts in gene coexpression networks. Network theory offers a wealth of intuitive concepts for describing the pairwise relationships among genes, which are depicted in cluster trees and heat maps. Conversely, microarray data analysis techniques (singular value decomposition, tests of differential expression) can also be used to address difficult problems in network theory. We describe conditions under which a close relationship exists between network analysis and microarray data analysis techniques, and provide a rough dictionary for translating between the two fields. Using the angular interpretation of correlations, we provide a geometric interpretation of network theoretic concepts and derive unexpected relationships among them. We use the singular value decomposition of module expression data to characterize approximately factorizable gene coexpression networks, i.e., adjacency matrices that factor into node-specific contributions. High- and low-level views of coexpression networks allow us to study the relationships among modules and among module genes, respectively. We characterize coexpression networks where hub genes are significant with respect to a microarray sample trait and show that the network concept of intramodular connectivity can be interpreted as a fuzzy measure of module membership. We illustrate our results using human, mouse, and yeast microarray gene expression data.
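The angular interpretation of correlation can be checked numerically: the Pearson correlation of two expression profiles equals the cosine of the angle between the mean-centred profiles. The sketch below also builds a weighted, soft-thresholded adjacency of the form |cor|^β and sums it into a per-gene connectivity. The value β = 6, the toy profiles and the function names are assumptions for illustration, not parameters taken from the paper.

```python
from math import sqrt, acos, degrees

def center(x):
    """Subtract the mean so that correlation becomes a pure angle."""
    m = sum(x) / len(x)
    return [v - m for v in x]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def correlation(x, y):
    """Pearson correlation, computed as the cosine between centred profiles."""
    return cosine(center(x), center(y))

def angle_degrees(x, y):
    r = max(-1.0, min(1.0, correlation(x, y)))  # guard against rounding outside [-1, 1]
    return degrees(acos(r))

def soft_adjacency(profiles, beta=6):
    """Weighted adjacency a_ij = |cor(x_i, x_j)| ** beta for every ordered gene pair."""
    return {(g, h): abs(correlation(profiles[g], profiles[h])) ** beta
            for g in profiles for h in profiles if g != h}

def connectivity(adj, gene):
    """Sum of a gene's adjacencies to all other genes."""
    return sum(w for (g, _), w in adj.items() if g == gene)

if __name__ == "__main__":
    profiles = {"A": [1, 2, 3], "B": [2, 4, 6], "C": [1, 3, 2]}
    print(angle_degrees(profiles["A"], profiles["C"]))  # centred vectors meet at 60 degrees
    adj = soft_adjacency(profiles, beta=1)
    print({g: connectivity(adj, g) for g in profiles})
```

In this geometric view, perfectly correlated profiles point in the same direction (angle 0), uncorrelated profiles are orthogonal (90 degrees), and raising |cor| to a power β suppresses weak, near-orthogonal links.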
The unification of coexpression network methods with traditional data mining methods can inform the application and development of systems biologic methods.
Author Summary
Similar to natural languages, network language is ever evolving. While some network terms (concepts) are widely used in gene coexpression network analysis, others still need to be developed to meet the ever increasing demand for describing the system of gene transcripts. There is a need to provide an intuitive geometric explanation of network concepts and to study their relationships. For example, we show that certain seemingly disparate network concepts turn out to be synonyms in the context of coexpression modules. We show how coexpression network language affects our understanding of biology. For example, there are geometric reasons why highly connected hub genes in important coexpression modules tend to be important, and why hub genes in one module cannot be hubs in another distinct module. We provide a short dictionary for translating between microarray data analysis language and network theory language to facilitate communication between the two fields. We describe several examples that illustrate how the two data analysis fields can inform each other.
PMCID: PMC2446438  PMID: 18704157
5.  Differentially Expressed Genes in Major Depression Reside on the Periphery of Resilient Gene Coexpression Networks 
The structure of gene coexpression networks reflects the activation and interaction of multiple cellular systems. Since the pathology of neuropsychiatric disorders is influenced by diverse cellular systems and pathways, we investigated gene coexpression networks in major depression, and searched for putative unifying themes in network connectivity across neuropsychiatric disorders. Specifically, based on the prevalence of the lethality–centrality relationship in disease-related networks, we hypothesized that network changes between control and major depression-related networks would be centered around coexpression hubs, and secondly, that differentially expressed (DE) genes would have a characteristic position and connectivity level in those networks. Mathematically, the first hypothesis tests the relationship of differential coexpression to network connectivity, while the second “hybrid” expression-and-network hypothesis tests the relationship of differential expression to network connectivity. To answer these questions about the potential interaction of coexpression network structure with differential expression, we utilized all available human post-mortem depression-related datasets appropriate for coexpression analysis, which spanned different microarray platforms, cohorts, and brain regions. Similar studies were also performed in an animal model of depression and in schizophrenia and bipolar disorder microarray datasets. We now provide results which consistently support (1) that genes assemble into small-world and scale-free networks in control subjects, (2) that this efficient network topology is largely resilient to changes in depressed subjects, and (3) that DE genes are positioned on the periphery of coexpression networks. Similar results were observed in a mouse model of depression, and in selected bipolar- and schizophrenia-related networks. 
Finally, we show that baseline expression variability contributes to the propensity of genes to be network hubs and/or to be DE in disease. In summary, our results suggest that the small-world and scale-free properties of gene networks are resilient to pathological changes in major depression, and that the network structure may constrain the extent to which a gene may be DE in the illness, hence informing further gene-network-based mechanistic studies of neuropsychiatric disorders.
PMCID: PMC3166821  PMID: 21922000
major depression; small-world; scale-free; coexpression; microarray; psychiatry; human post-mortem; graph theory
6.  COXPRESdb: a database of coexpressed gene networks in mammals 
Nucleic Acids Research  2007;36(Database issue):D77-D82.
A database of coexpressed gene sets can provide valuable information for a wide variety of experimental designs, such as targeting of genes for functional identification, gene regulation and/or protein–protein interactions. Coexpressed gene databases derived from publicly available GeneChip data are widely used in Arabidopsis research, but platforms that examine coexpression for higher mammals are rather limited. Therefore, we have constructed a new database, COXPRESdb (coexpressed gene database) (, for coexpressed gene lists and networks in human and mouse. Coexpression data could be calculated for 19 777 and 21 036 genes in human and mouse, respectively, by using the GeneChip data in NCBI GEO. COXPRESdb enables analysis of four types of coexpression networks: (i) highly coexpressed genes for every gene, (ii) genes with the same GO annotation, (iii) genes expressed in the same tissue and (iv) user-defined gene sets. When GO or tissue networks became too large to show as a static image on the web, we used the Google Maps API to visualize them interactively. COXPRESdb also provides a view that compares human and mouse coexpression patterns to estimate the conservation between the two species.
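Network type (i), highly coexpressed genes for every gene, amounts to ranking all other genes by their coexpression with a guide gene. The sketch below uses Pearson correlation and a fixed list length `top_k`; both are assumptions, since the abstract does not state which coexpression measure or cutoff COXPRESdb uses, and the gene names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y)))

def coexpressed_list(expr, guide, top_k=3):
    """Return the top_k genes most positively correlated with the guide gene,
    breaking ties by gene name so the output is deterministic."""
    scores = [(pearson(expr[guide], expr[g]), g) for g in expr if g != guide]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [g for _, g in scores[:top_k]]

if __name__ == "__main__":
    # Hypothetical expression values across four GEO-style samples.
    expr = {"G": [1, 2, 3, 4], "A": [2, 4, 6, 8], "B": [4, 3, 2, 1], "C": [1, 2, 4, 3]}
    print(coexpressed_list(expr, "G", top_k=2))
```

Precomputing such a list for every gene is what makes per-gene lookups cheap; the GO, tissue and user-defined networks would then be drawn from subsets of these pairwise scores.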
PMCID: PMC2238883  PMID: 17932064
7.  Rapid and asymmetric divergence of duplicate genes in the human gene coexpression network 
BMC Bioinformatics  2006;7:46.
While gene duplication is known to be one of the most common mechanisms of genome evolution, the fates of genes after duplication are still being debated. In particular, it is presently unknown whether most duplicate genes preserve (or subdivide) the functions of the parental gene or acquire new functions. One aspect of gene function, namely a gene's expression profile within the gene coexpression network, has been largely unexplored for duplicate genes.
Here we build a human gene coexpression network using human tissue-specific microarray data and investigate the divergence of duplicate genes in it. The topology of this network is scale-free. Interestingly, our analysis indicates that duplicate genes rapidly lose shared coexpressed partners: approximately 50 million years after duplication, the two duplicate genes in a pair have only a slightly higher number of shared partners than two random singletons. We also show that duplicate gene pairs quickly acquire new coexpressed partners: the average number of partners for a duplicate gene pair is significantly greater than that for a singleton (the latter number can be used as a proxy for the number of partners a parental singleton gene had before duplication). The divergence in gene expression between the two duplicates in a pair occurs asymmetrically: one gene usually has more partners than the other. The network is resilient to both random and degree-based in silico removal of either singletons or duplicate genes. In contrast, the network is especially vulnerable to the removal of highly connected genes when duplicate genes and singletons are considered together.
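Two of the quantities in this paragraph, the number of coexpressed partners shared by a duplicate pair and the network's response to random versus degree-based node removal, can be sketched on an adjacency dictionary. Using the size of the largest connected component as the robustness measure is an assumption for illustration; the abstract does not say which measure was used, and the toy network is hypothetical.

```python
import random

def shared_partners(adj, g1, g2):
    """Number of coexpression partners common to g1 and g2 (excluding each other)."""
    return len((adj[g1] & adj[g2]) - {g1, g2})

def largest_component(adj, removed=frozenset()):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen, best = set(), 0
    for start in adj:
        if start in removed or start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:  # iterative depth-first search
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in removed and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best

def nodes_to_remove(adj, fraction, by_degree, seed=0):
    """Choose nodes to delete: highest-degree first, or uniformly at random."""
    n = int(round(fraction * len(adj)))
    if by_degree:
        ranked = sorted(adj, key=lambda g: len(adj[g]), reverse=True)
        return set(ranked[:n])
    return set(random.Random(seed).sample(sorted(adj), n))

if __name__ == "__main__":
    adj = {"hub": {"a", "b", "c"}, "a": {"hub", "b"}, "b": {"hub", "a"}, "c": {"hub"}}
    print(shared_partners(adj, "a", "b"))
    for by_degree in (False, True):
        gone = nodes_to_remove(adj, 0.25, by_degree)
        print(by_degree, largest_component(adj, gone))
```

Comparing the two removal strategies on the same network is what distinguishes resilience to random failure from vulnerability to targeted loss of hubs.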
Duplicate genes rapidly diverge in their expression profiles in the network and play a role in maintaining network robustness similar to that of singletons.
Supplementary information: Please see additional files.
PMCID: PMC1403810  PMID: 16441884
8.  Cosplicing network analysis of mammalian brain RNA-Seq data utilizing WGCNA and Mantel correlations 
Frontiers in Genetics  2015;6:174.
Across species and tissues, and especially in the mammalian brain, production of gene isoforms is widespread. While gene expression coordination has been previously described as a scale-free coexpression network, the properties of transcriptome-wide isoform production coordination have been less studied. Here we evaluate the system-level properties of cosplicing in mouse, macaque, and human brain gene expression data using a novel network inference procedure. Genes are represented as vectors/lists of exon counts, and distance measures sensitive to exon inclusion rates quantify differences across samples. For all gene pairs, distance matrices are correlated across samples, resulting in cosplicing or cotranscriptional network matrices. We show that networks including cosplicing information are scale-free and distinct from coexpression. In the networks capturing cosplicing, we find a set of novel hubs with unique characteristics distinguishing them from coexpression hubs: heavy representation in neurobiological functional pathways, strong overlap with markers of neurons and neuroglia, long coding lengths, and high numbers of both exons and annotated transcripts. Further, the cosplicing hubs are enriched in genes associated with autism spectrum disorders. Cosplicing hub homologs across eukaryotes show dramatically increasing intronic lengths but stable coding region lengths. Shared transcription factor binding sites increase coexpression but not cosplicing; the reverse is true for splicing-factor binding sites. Genes with protein-protein interactions have strong coexpression and cosplicing. Additional factors affecting the networks include shared microRNA binding sites, spatial colocalization within the striatum, and sharing a chromosomal folding domain. Cosplicing network patterns remain relatively stable across species.
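The cosplicing construction can be sketched for a single gene pair: each gene's exon counts per sample are converted to exon-usage proportions, a sample-by-sample distance matrix is computed for each gene, and the two genes' matrices are correlated over their upper triangles (a Mantel-style correlation). Euclidean distance on proportions is an assumption, since the abstract says only that the distance is sensitive to exon inclusion rates; a full Mantel test would also add a permutation p-value, omitted here. The toy counts are hypothetical.

```python
from itertools import combinations
from math import sqrt

def proportions(counts):
    """Convert exon counts to exon-usage fractions (all zeros if the gene is silent)."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)

def distance_matrix(samples):
    """samples: one exon-count vector per sample for a single gene.
    Returns upper-triangle distances in a fixed (i, j), i < j, order."""
    props = [proportions(s) for s in samples]
    return [sqrt(sum((a - b) ** 2 for a, b in zip(props[i], props[j])))
            for i, j in combinations(range(len(props)), 2)]

def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y)))

def mantel_r(gene1_samples, gene2_samples):
    """Correlation between two genes' sample-distance matrices."""
    return pearson(distance_matrix(gene1_samples), distance_matrix(gene2_samples))

if __name__ == "__main__":
    gene1 = [[10, 0], [5, 5], [0, 10]]   # exon usage shifts across three samples
    gene2 = [[20, 0], [10, 10], [0, 20]]  # same shift at twice the depth
    print(mantel_r(gene1, gene2))
```

Working on proportions rather than raw counts is what makes the score reflect exon inclusion rather than overall expression level: in the example, gene2 has twice the depth of gene1 but the same splicing pattern.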
PMCID: PMC4429622  PMID: 26029240
gene cosplicing; scale-free gene networks; brain transcriptome; alternative splicing; gene coexpression
9.  A Genomewide Functional Network for the Laboratory Mouse 
PLoS Computational Biology  2008;4(9):e1000165.
Establishing a functional network is invaluable to our understanding of gene function, pathways, and systems-level properties of an organism and can be a powerful resource in directing targeted experiments. In this study, we present a functional network for the laboratory mouse based on a Bayesian integration of diverse genetic and functional genomic data. The resulting network includes probabilistic functional linkages among 20,581 protein-coding genes. We show that this network can accurately predict novel functional assignments and network components and present experimental evidence for predictions related to Nanog homeobox (Nanog), a critical gene in mouse embryonic stem cell pluripotency. An analysis of the global topology of the mouse functional network reveals multiple biologically relevant systems-level features of the mouse proteome. Specifically, we identify the clustering coefficient as a critical characteristic of central modulators that affect diverse pathways as well as genes associated with different phenotype traits and diseases. In addition, a cross-species comparison of functional interactomes on a genomic scale revealed distinct functional characteristics of conserved neighborhoods as compared to subnetworks specific to higher organisms. Thus, our global functional network for the laboratory mouse provides the community with a key resource for discovering protein functions and novel pathway components as well as a tool for exploring systems-level topological and evolutionary features of cellular interactomes. To facilitate exploration of this network by the biomedical research community, we illustrate its application in function and disease gene discovery through an interactive, Web-based, publicly available interface at
Author Summary
Functionally related proteins interact in diverse ways to carry out biological processes, and each protein often participates in multiple pathways. Proteins are therefore organized into a complex network through which different functions of the cell are carried out. An accurate description of such a network is invaluable to our understanding of both the system-level features of a cell and those of an individual biological process. In this study, we used a probabilistic model to combine information from diverse genome-scale studies as well as individual investigations to generate a global functional network for mouse. Our analysis of the global topology of this network reveals biologically relevant systems-level characteristics of the mouse proteome, including conservation of functional neighborhoods and network features characteristic of known disease genes and key transcriptional regulators. We have made this network publicly available for search and dynamic exploration by researchers in the community. Our Web interface enables users to easily generate hypotheses regarding potential functional roles of uncharacterized proteins, investigate possible links between their proteins of interest and disease, and identify new players in specific biological processes.
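A common way to turn heterogeneous evidence into a probabilistic functional linkage is a naive Bayes model that adds per-dataset log-likelihood ratios to the prior log-odds. The abstract does not specify the structure of the Bayesian integration, so the sketch below is a generic illustration that assumes conditional independence between datasets; the prior and likelihood ratios in the usage line are hypothetical.

```python
from math import log, exp

def posterior_linkage(prior, likelihood_ratios):
    """Posterior probability that two genes are functionally linked.

    prior: prior probability of a linkage (0 < prior < 1).
    likelihood_ratios: P(evidence | linked) / P(evidence | not linked),
    one per dataset, assumed conditionally independent (naive Bayes).
    """
    log_odds = log(prior / (1 - prior)) + sum(log(lr) for lr in likelihood_ratios)
    return 1 / (1 + exp(-log_odds))

if __name__ == "__main__":
    # Hypothetical: a low prior and two moderately supportive datasets.
    print(posterior_linkage(0.01, [10.0, 10.0]))
```

Working in log-odds keeps the combination additive, so a dataset with no information (ratio 1) leaves the posterior unchanged; the independence assumption, however, overcounts evidence when datasets are correlated.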
PMCID: PMC2527685  PMID: 18818725
10.  Comparison of Gene Coexpression Profiles and Construction of Conserved Gene Networks to Find Functional Modules 
PLoS ONE  2015;10(7):e0132039.
Computational approaches toward gene annotation are a formidable challenge, now that many genome sequences have been determined. Each gene has its own function, but complicated cellular functions are achieved by sets of genes. Therefore, sets of genes with strong functional relationships must be identified. For this purpose, the similarities of gene expression patterns and gene sequences have been separately utilized, although the combined information will provide a better solution.
Result & Discussion
We propose a new method to find functional modules by comparing gene coexpression profiles among species. A coexpression pattern is represented as a list of the genes coexpressed with each guide gene. We compared two coexpression lists, one from a human guide gene and the other from a homologous mouse gene, and defined a measure to evaluate the similarity between the lists. Based on this coexpression similarity, we detected the highly conserved genes and constructed human gene networks with conserved coexpression between human and mouse. Some of the tightly coupled genes (modules) showed clear functional enrichment, such as immune system and cell cycle, indicating that our method could identify functionally related genes without any prior knowledge. We also found a few functional modules without any annotations, which may be good candidates for novel functional modules. All of the comparisons are available at the web database.
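The abstract does not define its list-similarity measure, so the sketch below uses one plausible stand-in: map the mouse coexpression list to human genes through a homolog table and report what fraction of the human top-k list is recovered. The homolog mapping, `top_k`, the overlap score and the gene identifiers are all hypothetical.

```python
def conserved_overlap(human_list, mouse_list, mouse_to_human, top_k=50):
    """Fraction of the human top_k coexpressed genes whose homologs also appear
    in the mouse top_k list for the homologous guide gene.

    human_list, mouse_list: genes ranked by coexpression with the guide gene.
    mouse_to_human: homolog table; unmapped mouse genes are ignored.
    """
    human_top = set(human_list[:top_k])
    mapped = {mouse_to_human[m] for m in mouse_list[:top_k] if m in mouse_to_human}
    return len(human_top & mapped) / len(human_top) if human_top else 0.0

if __name__ == "__main__":
    human = ["H1", "H2", "H3", "H4"]
    mouse = ["m1", "m3", "mX", "m2"]
    homologs = {"m1": "H1", "m2": "H2", "m3": "H3"}
    print(conserved_overlap(human, mouse, homologs, top_k=3))
```

A rank-aware score (for example, one that weights agreement near the top of the lists more heavily) would be a natural refinement; a simple overlap is used here only because it is easy to verify.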
PMCID: PMC4493118  PMID: 26147120
11.  Genetic interactions reveal the evolutionary trajectories of duplicate genes 
Duplicate genes show significantly fewer interactions than singleton genes, and functionally similar duplicates can exhibit dissimilar profiles because common interactions are ‘hidden’ due to buffering.
Genetic interaction profiles provide insights into evolutionary mechanisms of duplicate retention by distinguishing duplicates under dosage selection from those retained because of some divergence in function.
The genetic interactions of duplicate genes evolve in an extremely asymmetric way, and the directionality of this asymmetry correlates well with other evolutionary properties of duplicate genes.
Genetic interaction profiles can be used to elucidate the divergent function of specific duplicate pairs.
Gene duplication and divergence serves as a primary source of new genes and new functions, and as such has broad implications for the evolutionary process. Duplicate genes within S. cerevisiae have been shown to retain a high degree of similarity with regard to many of their functional properties (Papp et al, 2004; Guan et al, 2007; Wapinski et al, 2007; Musso et al, 2008), and perturbation of duplicate genes has been shown to result in smaller fitness defects than perturbation of singleton genes (Gu et al, 2003; DeLuna et al, 2008; Dean et al, 2008; Musso et al, 2008). Individual genetic interactions between pairs of genes, and profiles of such interactions across the entire genome, provide a new context in which to examine the properties of duplicate compensation.
In this study, we use the most recent and comprehensive set of genetic interactions in yeast produced to date (Costanzo et al, 2010) to address questions of duplicate retention and redundancy. We show that the ability of duplicate genes to buffer the deletion of a partner has three main consequences. First, it agrees with previous work demonstrating that a high proportion of duplicate pairs are synthetic lethal, a classic indication of the ability to buffer one another functionally (DeLuna et al, 2008; Dean et al, 2008; Musso et al, 2008). Second, it reduces the number of genetic interactions observed between duplicate genes and the rest of the genome by masking interactions related to their common function from experimental detection. Third, this buffering of common interactions reduces profile similarity in spite of common function (Figure 1). The compensatory ability of functionally similar duplicates buffers genetic interactions related to their common function (reducing the number of genetic interactions overall), while allowing the measurement of interactions related to any divergent function. Thus, even functionally similar duplicates may have dissimilar genetic interaction profiles. As previously surmised (Ihmels et al, 2007), duplicate genes under selection for dosage amplification have differing profile characteristics. We show that dosage-mediated duplicates have much higher genetic interaction profile similarity than do other duplicate pairs. Furthermore, we show, in a comparison with local neighbors on a protein–protein interaction (PPI) network, that although dosage-mediated duplicates more often have higher similarity to each other than to their neighbors, the reverse is true for duplicates in general.
That is, slightly divergent duplicate genes more often exhibit a higher similarity with a common neighbor on the PPI network than they do with each other, and that observation is consistent with the idea that common interactions are buffered while interactions corresponding to divergent functions are observed.
We then asked whether duplicates' genetic interactions that are not buffered appear in a symmetric or an asymmetric fashion. Previous work has established asymmetric patterns with regard to PPI degree (Wagner, 2002; He and Zhang, 2005), sequence divergence (Conant and Wagner, 2003; Zhang et al, 2003; Kellis et al, 2004; Scannell and Wolfe, 2008) and expression patterns (Gu et al, 2002b; Tirosh and Barkai, 2007). Although genetic interactions are further removed from mechanism than protein–protein interactions, for example, they do offer a more direct measurement of functional consequence and, thus, may give a better indication of the functional differences between a duplicate pair. We found that duplicates exhibit a strikingly asymmetric pattern of genetic interactions, with the ratio of interactions between sisters commonly exceeding 7:1 (Figure 4A). The observations differ significantly from random simulations in which genetic interactions were redistributed between sisters with equal probability (Figure 4A). Moreover, the directionality of this interaction asymmetry agrees with other physiological properties of duplicate pairs. For example, the sister with more genetic interactions also tends to have more protein–protein interactions and also tends to evolve at a slower rate (Figure 4B).
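The comparison with redistributing interactions between sisters "with equal probability" can be expressed as a binomial null model: if a pair has n non-buffered interactions and each is assigned to either sister with probability 1/2, the chance of a split at least as lopsided as the observed one is a two-sided binomial tail. This exact-test formulation is an assumption standing in for the random simulations described in the text, not the authors' procedure.

```python
from math import comb

def asymmetry_ratio(k1, k2):
    """Interactions of the more-connected sister divided by those of the other."""
    hi, lo = max(k1, k2), min(k1, k2)
    return float("inf") if lo == 0 else hi / lo

def equal_split_p_value(k1, k2):
    """Two-sided probability, under Binomial(n, 1/2) with n = k1 + k2,
    of a split at least as uneven as the observed (k1, k2)."""
    n = k1 + k2
    hi = max(k1, k2)
    tail = sum(comb(n, k) for k in range(hi, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

if __name__ == "__main__":
    # A 7:1 split over 8 interactions.
    print(asymmetry_ratio(7, 1), equal_split_p_value(7, 1))
```

For a 7:1 split over 8 interactions this gives p = 0.0703125; larger pairs with the same ratio yield much smaller p-values, which is why aggregate comparisons across many duplicate pairs carry more weight than any single pair.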
Genetic interaction degree and profiles can be used to understand the functional divergence of particular duplicate pairs. As a case example, we consider the whole-genome-duplication pair CIK1–VIK1. Each of these genes encodes a protein that forms a distinct heterodimeric complex with the microtubule motor protein Kar3 (Manning et al, 1999). Although each of these proteins depends on a direct physical interaction with Kar3, Cik1 has a much higher profile similarity to Kar3 than does Vik1 (r=0.5 and r=0.3, respectively). Consistent with its higher similarity, Δcik1 and Δkar3 exhibit several similar phenotypes, including abnormally short spindles, chromosome loss and delayed cell cycle progression (Page et al, 1994; Manning et al, 1999). In contrast, a Δvik1 mutant strain exhibits no overt phenotype (Manning et al, 1999).
The characterization of functional redundancy and divergence between duplicate genes is an important step in understanding the evolution of genetic systems. Large-scale genetic network analysis in Saccharomyces cerevisiae provides a powerful perspective for addressing these questions through quantitative measurements of genetic interactions between pairs of duplicated genes, and more generally, through the study of genome-wide genetic interaction profiles associated with duplicated genes. We show that duplicate genes exhibit fewer genetic interactions than other genes because they tend to buffer one another functionally, whereas observed interactions are non-overlapping and reflect their divergent roles. We also show that duplicate gene pairs are highly imbalanced in their number of genetic interactions with other genes, a pattern that appears to result from asymmetric evolution, such that one duplicate evolves or degrades faster than the other and often becomes functionally or conditionally specialized. The differences in genetic interactions are predictive of differences in several other evolutionary and physiological properties of duplicate pairs.
PMCID: PMC3010121  PMID: 21081923
duplicate genes; functional divergence; genetic interactions; paralogs; Saccharomyces cerevisiae
12.  Construction and use of gene expression covariation matrix 
BMC Bioinformatics  2009;10:214.
One essential step in the massive analysis of transcriptomic profiles is the calculation of the correlation coefficient, a value used to select pairs of genes with similar or inverse transcriptional profiles across a large fraction of the biological conditions examined. Until now, the choice between the two available methods for calculating the coefficient has been dictated mainly by technological considerations. Specifically, in analyses based on double-channel techniques, researchers have been required to use covariation correlation, i.e. the correlation between gene expression changes measured between several pairs of biological conditions, expressed for example as fold-change. In contrast, in analyses of single-channel techniques scientists have been restricted to the use of coexpression correlation, i.e. correlation between gene expression levels. To our knowledge, nobody has ever examined the possible benefits of using covariation instead of coexpression in massive analyses of single channel microarray results.
We describe here how single-channel techniques can be treated like double-channel techniques and used to generate both gene expression changes and covariation measures. We also present a new method that allows the calculation of both positive and negative correlation coefficients between genes. First, we perform systematic comparisons between two given biological conditions and classify, for each comparison, genes as increased (I), decreased (D), or not changed (N). As a result, the original series of n gene expression level measures assigned to each gene is replaced by an ordered string of n(n-1)/2 symbols, e.g. IDDNNIDID....DNNNNNNID, with the length of the string corresponding to the number of comparisons. In a second step, positive and negative covariation matrices (CVM) are constructed by calculating statistically significant positive or negative correlation scores for any pair of genes by comparing their strings of symbols.
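The symbol-string construction described above can be sketched as follows. This is a minimal illustration: the 2-fold change threshold used to call I/D/N and the function names are assumptions, and the statistical significance scoring that the authors apply to build the positive and negative CVMs is deliberately left out; the sketch only counts concordant and opposite changes between two genes.

```python
from itertools import combinations

def symbol_string(levels, fold=2.0):
    """Compare every pair of conditions (i < j) and emit I, D or N.
    `levels` must be positive expression values; `fold` is an assumed cutoff."""
    symbols = []
    for i, j in combinations(range(len(levels)), 2):
        ratio = levels[j] / levels[i]
        if ratio >= fold:
            symbols.append("I")      # increased from condition i to j
        elif ratio <= 1.0 / fold:
            symbols.append("D")      # decreased from condition i to j
        else:
            symbols.append("N")      # not changed
    return "".join(symbols)          # length n(n-1)/2 for n conditions

def covariation_counts(s1, s2):
    """Count comparisons where two genes change in the same direction
    (positive covariation) or in opposite directions (negative covariation)."""
    same = sum(1 for a, b in zip(s1, s2) if a != "N" and a == b)
    opposite = sum(1 for a, b in zip(s1, s2) if {a, b} == {"I", "D"})
    return same, opposite
```

For example, a gene measured at `[1, 4, 1, 8]` across four conditions yields the six-symbol string `INIDII`, and a gene following the opposite pattern `[8, 1, 8, 1]` yields `DNDIND`, giving four opposite changes and no concordant ones.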
This new method, applied to four different large data sets, allowed us to construct distinct covariation matrices with similar properties. We also developed a technique to translate these covariation networks into graphical 3D representations and found that the local placement of the probe sets was conserved across the four chip set models used, which encompass three species (humans, mice, and rats). Adapted clustering methods delineated six conserved functional regions, which we characterized using Gene Ontology information.
PMCID: PMC2720390  PMID: 19594909
13.  Meta-analysis of gene coexpression networks in the post-mortem prefrontal cortex of patients with schizophrenia and unaffected controls 
BMC Neuroscience  2013;14:105.
Gene expression profiling of the postmortem human brain is part of the effort to understand the neuropathological underpinnings of schizophrenia. Existing microarray studies have identified a large number of genes as candidates, but efforts to generate an integrated view of molecular and cellular changes underlying the illness are few. Here, we have applied a novel approach to combining coexpression data across seven postmortem human brain studies of schizophrenia.
We generated separate coexpression networks for the control and schizophrenia prefrontal cortex and found that differences in global network properties were small. We analyzed gene coexpression relationships of previously identified differentially expressed ‘schizophrenia genes’. Evaluation of network properties revealed differences for the up- and down-regulated ‘schizophrenia genes’, with the clustering coefficient displaying particularly interesting trends. We identified modules of coexpressed genes in each network and characterized them according to disease association and cell type specificity. Functional enrichment analysis of modules in each network revealed that genes with altered expression in schizophrenia associate with modules representing biological processes such as oxidative phosphorylation, myelination, synaptic transmission and immune function. Although an immune-function-enriched module was found in both networks, many of the genes in the modules were different. Specifically, a decrease in clustering of immune activation genes in the schizophrenia network was coupled with the loss of various astrocyte marker genes and the schizophrenia candidate genes.
Our novel network-based approach for evaluating gene coexpression provides results that converge with existing evidence from genetic and genomic studies to support an immunological link to the pathophysiology of schizophrenia.
PMCID: PMC3849476  PMID: 24070017
Schizophrenia; Microarray; Gene coexpression network; Postmortem brain
14.  Meta-analysis of Inter-species Liver Co-expression Networks Elucidates Traits Associated with Common Human Diseases 
PLoS Computational Biology  2009;5(12):e1000616.
Co-expression networks are routinely used to study human diseases like obesity and diabetes. Systematic comparison of these networks between species has the potential to elucidate common mechanisms that are conserved between human and rodent species, as well as species-specific mechanisms that reflect evolutionary plasticity. We developed a semi-parametric meta-analysis approach for combining gene-gene co-expression relationships across expression profile datasets from multiple species. Simulations showed that the semi-parametric method is robust against noise. When applied to human, mouse, and rat liver co-expression networks, our method outperformed existing methods in identifying gene pairs with coherent biological functions. We identified a network conserved across species that highlighted cell-cell signaling, cell adhesion and sterol biosynthesis as the main biological processes represented in genome-wide association study candidate gene sets for blood lipid levels. We further developed a heterogeneity statistic to test for network differences among multiple datasets, and demonstrated that genes with species-specific interactions tend to be under positive selection throughout evolution. Finally, we identified a human-specific sub-network regulated by RXRG, which has been validated to play a different role in hyperlipidemia and Type 2 diabetes between human and mouse. Taken together, our approach represents a novel step forward in integrating gene co-expression networks from multiple large-scale datasets to leverage not only common information but also differences that are dataset-specific.
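The general idea of combining one gene pair's co-expression across datasets and testing for heterogeneity can be illustrated with a standard technique. Note the swap: this sketch uses a fixed-effect, inverse-variance meta-analysis of Fisher z-transformed Pearson correlations with Cochran's Q as the heterogeneity statistic. It is not the semi-parametric method or the heterogeneity statistic developed in the paper, and the function name is hypothetical.

```python
import math

def combine_correlations(rs, ns):
    """Fixed-effect meta-analysis of one gene pair's correlation.
    rs: Pearson correlations (|r| < 1), one per dataset; ns: sample sizes.
    Returns (combined r, z-score of the combined effect, Cochran's Q)."""
    zs = [math.atanh(r) for r in rs]          # Fisher z-transform
    ws = [n - 3 for n in ns]                  # var(z) is about 1 / (n - 3)
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    se = 1.0 / math.sqrt(sum(ws))
    # Cochran's Q: weighted squared deviations, ~ chi-square with k - 1 df
    q = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))
    return math.tanh(z_bar), z_bar / se, q
```

A gene pair correlated at 0.6 in every species gives Q close to 0, whereas one correlated at 0.8 in one dataset and at -0.2 in another gives a large Q, flagging a dataset-specific (potentially species-specific) interaction.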
Author Summary
Two important aspects of drug development are drug target identification and biomarker discovery for early disease detection, disease progression, drug efficacy and drug toxicity. Recently, many single nucleotide polymorphisms (SNPs) associated with human diseases have been discovered through large genome-wide association studies (GWAS). However, it is still largely unclear how these candidate SNPs may cause human diseases. The ultimate aim of this paper is to put these GWAS candidate SNPs and their associated genes into a network context to understand their mechanism of action in human diseases. In addition to large-scale human data sets that are often heterogeneous in terms of genetic and environmental factors, many high-quality data sets in rodents exist and are frequently used to model human diseases. To leverage such information, we developed a method for combining and contrasting gene networks between human and rodents, specifically to elucidate how GWAS candidate SNPs may contribute to human diseases. By identifying mechanisms that are conserved or divergent between human and rodents, we can also predict which disease causal genes can be studied using rodent models and which ones may not.
PMCID: PMC2787626  PMID: 20019805
15.  Transcriptional dynamics of a conserved gene expression network associated with craniofacial divergence in Arctic charr 
EvoDevo  2014;5:40.
Understanding the molecular basis of craniofacial variation can provide insights into key developmental mechanisms of adaptive changes and their role in trophic divergence and speciation. Arctic charr (Salvelinus alpinus) is a polymorphic fish species, and, in Lake Thingvallavatn in Iceland, four sympatric morphs have evolved distinct craniofacial structures. We conducted a gene expression study on candidates from a conserved gene coexpression network, focusing on the development of craniofacial elements in embryos of two contrasting Arctic charr morphotypes (benthic and limnetic).
Four Arctic charr morphs were studied: one limnetic and two benthic morphs from Lake Thingvallavatn and a limnetic reference aquaculture morph. The presence of morphological differences at developmental stages before the onset of feeding was verified by morphometric analysis. Following up on our previous findings that Mmp2 and Sparc were differentially expressed between morphotypes, we identified a network of genes with conserved coexpression across diverse vertebrate species. A comparative expression study of candidates from this network in developing heads of the four Arctic charr morphs verified the coexpression relationship of these genes and revealed distinct transcriptional dynamics strongly correlated with contrasting craniofacial morphologies (benthic versus limnetic). A literature review and Gene Ontology analysis indicated that a significant proportion of the network genes play a role in extracellular matrix organization and skeletogenesis, and motif enrichment analysis of conserved noncoding regions of network candidates predicted a handful of transcription factors, including Ap1 and Ets2, as potential regulators of the gene network. The expression of Ets2 itself was also found to associate with network gene expression. Genes linked to glucocorticoid signalling were also studied, as both Mmp2 and Sparc are responsive to this pathway. Among those, several transcriptional targets and upstream regulators showed differential expression between the contrasting morphotypes. Interestingly, although selected network genes showed overlapping expression patterns in situ and no morph differences, Timp2 expression patterns differed between morphs.
Our comparative study of transcriptional dynamics in divergent craniofacial morphologies of Arctic charr revealed a conserved network of coexpressed genes sharing functional roles in structural morphogenesis. We also implicate transcriptional regulators of the network as targets for future functional studies.
Electronic supplementary material
The online version of this article (doi:10.1186/2041-9139-5-40) contains supplementary material, which is available to authorized users.
PMCID: PMC4240837  PMID: 25419450
Arctic charr; Coexpression; Craniofacial development; Divergent evolution; Gene network; Morphogenesis; Salvelinus alpinus
16.  Is My Network Module Preserved and Reproducible? 
PLoS Computational Biology  2011;7(1):e1001057.
In many applications, one is interested in determining which of the properties of a network module change across conditions. For example, to validate the existence of a module, it is desirable to show that it is reproducible (or preserved) in an independent test network. Here we study several types of network preservation statistics that do not require a module assignment in the test network. We distinguish network preservation statistics by the type of the underlying network. Some preservation statistics are defined for a general network (defined by an adjacency matrix) while others are only defined for a correlation network (constructed on the basis of pairwise correlations between numeric variables). Our applications show that the correlation structure facilitates the definition of particularly powerful module preservation statistics. We illustrate that evaluating module preservation is in general different from evaluating cluster preservation. We find that it is advantageous to aggregate multiple preservation statistics into summary preservation statistics. We illustrate the use of these methods in six gene co-expression network applications including 1) preservation of cholesterol biosynthesis pathway in mouse tissues, 2) comparison of human and chimpanzee brain networks, 3) preservation of selected KEGG pathways between human and chimpanzee brain networks, 4) sex differences in human cortical networks, 5) sex differences in mouse liver networks. While we find no evidence for sex specific modules in human cortical networks, we find that several human cortical modules are less preserved in chimpanzees. In particular, apoptosis genes are differentially co-expressed between humans and chimpanzees. Our simulation studies and applications show that module preservation statistics are useful for studying differences between the modular structure of networks. Data, R software and accompanying tutorials can be downloaded from the following webpage:
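One connectivity-based preservation statistic of the kind discussed above can be sketched as the correlation of intramodular connectivity between a reference and a test correlation network. This is a minimal sketch, not the authors' R software: it assumes a soft-thresholded adjacency |cor|^beta with beta = 6, uses hypothetical function names, and omits the permutation-based Z statistics and the aggregation into summary statistics.

```python
import numpy as np

def intramodular_connectivity(expr, module, beta=6):
    """expr: samples x genes array; module: list of gene column indices.
    Returns each module gene's summed adjacency to the other module genes."""
    sub = expr[:, module]
    adj = np.abs(np.corrcoef(sub, rowvar=False)) ** beta  # soft threshold (assumed)
    np.fill_diagonal(adj, 0.0)                            # ignore self-connections
    return adj.sum(axis=1)

def kim_preservation(ref_expr, test_expr, module, beta=6):
    """Correlation of intramodular connectivity between two networks.
    High values mean hub genes in the reference remain hubs in the test data;
    no module assignment in the test network is required."""
    k_ref = intramodular_connectivity(ref_expr, module, beta)
    k_test = intramodular_connectivity(test_expr, module, beta)
    return np.corrcoef(k_ref, k_test)[0, 1]
```

Because the statistic is computed from the test network's correlations directly, it asks whether the reference module's wiring survives, rather than whether module detection in the test data happens to reproduce it.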
Author Summary
In network applications, one is often interested in studying whether modules are preserved across multiple networks. For example, to determine whether a pathway of genes is perturbed in a certain condition, one can study whether its connectivity pattern is no longer preserved. Non-preserved modules can either be biologically uninteresting (e.g., reflecting data outliers) or interesting (e.g., reflecting sex specific modules). An intuitive approach for studying module preservation is to cross-tabulate module membership. But this approach often cannot address questions about the preservation of connectivity patterns between nodes. Thus, cross-tabulation based approaches often fail to recognize that important aspects of a network module are preserved. Cross-tabulation methods make it difficult to argue that a module is not preserved. The weak statement (“the reference module does not overlap with any of the identified test set modules”) is less relevant in practice than the strong statement (“the module cannot be found in the test network irrespective of the parameter settings of the module detection procedure”). Module preservation statistics have important applications, e.g. we show that the wiring of apoptosis genes in a human cortical network differs from that in chimpanzees.
PMCID: PMC3024255  PMID: 21283776
17.  Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks 
BMC Bioinformatics  2005;6:227.
Biological processes are carried out by coordinated modules of interacting molecules. As clustering methods demonstrate that genes with similar expression display increased likelihood of being associated with a common functional module, networks of coexpressed genes provide one framework for assigning gene function. This has informed the guilt-by-association (GBA) heuristic, widely invoked in functional genomics. Yet although the idea of GBA is accepted, the breadth of GBA applicability is uncertain.
We developed methods to systematically explore the breadth of GBA across a large and varied corpus of expression data to answer the following question: To what extent is the GBA heuristic broadly applicable to the transcriptome, and conversely, how broadly is GBA captured by a priori knowledge represented in the Gene Ontology (GO)? Our study investigates the functional organization of five coexpression networks using data from three mammalian organisms. Our method calculates a probabilistic score between each gene and each Gene Ontology category that reflects coexpression enrichment of a GO module. For each GO category we use Receiver Operating Characteristic (ROC) curves to assess whether these probabilistic scores reflect GBA. This methodology applied to five different coexpression networks demonstrates that the signature of guilt-by-association is ubiquitous and reproducible and that the GBA heuristic is broadly applicable across the population of nine hundred Gene Ontology categories. We also demonstrate the existence of highly reproducible patterns of coexpression between some pairs of GO categories.
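The evaluation pattern described above (score each gene for a GO category, then summarize with an ROC curve) can be sketched with a common GBA score. Note the swap: this sketch uses neighbor voting, where a gene's score is the fraction of its coexpression weight that falls on known category members, not the paper's probabilistic score. Function names and the toy network are hypothetical.

```python
import numpy as np

def neighbor_voting_scores(adj, labels):
    """adj: symmetric gene x gene coexpression weights; labels: 0/1 membership
    in one GO category. Zeroing the diagonal means a gene's own label never
    contributes to its own score."""
    w = np.array(adj, dtype=float)
    np.fill_diagonal(w, 0.0)
    y = np.asarray(labels, dtype=float)
    degree = w.sum(axis=1)
    degree[degree == 0] = 1.0          # isolated genes get a score of 0
    return (w @ y) / degree

def roc_auc(scores, labels):
    """Mann-Whitney estimate of the area under the ROC curve (ties count 1/2)."""
    y = np.asarray(labels, dtype=bool)
    pos, neg = scores[y], scores[~y]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

An AUC near 0.5 means coexpression carries no information about the category; values approaching 1 are the GBA signature. In a toy network where four category genes are coexpressed at 0.9 and everything else at 0.1, every member outscores every non-member and the AUC is 1.0.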
We conclude that GBA has universal value and that transcriptional control may be more modular than previously realized. Our analyses also suggest that methodologies combining coexpression measurements across multiple genes in a biologically-defined module can aid in characterizing gene function or in characterizing whether pairs of functions operate together.
PMCID: PMC1239911  PMID: 16162296
18.  A methodology for the analysis of differential coexpression across the human lifespan 
BMC Bioinformatics  2009;10:306.
Differential coexpression is a change in coexpression between genes that may reflect 'rewiring' of transcriptional networks. It has previously been hypothesized that such changes might be occurring over time in the lifespan of an organism. While both coexpression and differential expression of genes have been previously studied in life stage change or aging, differential coexpression has not. Generalizing differential coexpression analysis to many time points presents a methodological challenge. Here we introduce a method for analyzing changes in coexpression across multiple ordered groups (e.g., over time) and extensively test its validity and usefulness.
Our method is based on the use of the Haar basis set to efficiently represent changes in coexpression at multiple time scales, and thus represents a principled and generalizable extension of the idea of differential coexpression to life stage data. We used published microarray studies categorized by age to test the methodology. We validated the methodology by testing our ability to reconstruct Gene Ontology (GO) categories using our measure of differential coexpression and compared this result to using coexpression alone. Our method allows significant improvement in characterizing these groups of genes. Further, we examine the statistical properties of our measure of differential coexpression and establish that the results are significant both statistically and by an improvement in semantic similarity. In addition, we found that our method finds more significant changes in gene relationships compared to several other methods of expressing temporal relationships between genes, such as coexpression over time.
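The Haar idea can be illustrated by computing a gene pair's correlation in each age group and decomposing that ordered sequence of correlations into the Haar basis. This is a minimal sketch under assumptions: it requires a power-of-two number of ordered groups, uses the orthonormal Haar transform, and its function names are hypothetical; the paper's exact measure and significance testing are not reproduced.

```python
import numpy as np

def group_correlations(groups, i, j):
    """groups: list of (samples x genes) arrays ordered by age.
    Returns the Pearson correlation of genes i and j within each group."""
    return np.array([np.corrcoef(g[:, i], g[:, j])[0, 1] for g in groups])

def haar_transform(values):
    """Orthonormal Haar decomposition of a length-2^k sequence.
    Returns (overall scaling coefficient, detail coefficients coarsest first)."""
    v = np.asarray(values, dtype=float)
    n = len(v)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    details = []
    while len(v) > 1:
        pairs = v.reshape(-1, 2)
        diff = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)  # change at this scale
        v = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)     # smoothed sequence
        details.insert(0, diff)
    return v, details

def haar_coexpression(groups, i, j):
    """Multi-scale description of how coexpression of genes i and j changes."""
    return haar_transform(group_correlations(groups, i, j))
```

Constant coexpression across four age groups, e.g. correlations `[1, 1, 1, 1]`, puts all the signal in the scaling coefficient and leaves every detail coefficient at zero; a switch from `[1, 1]` in young groups to `[-1, -1]` in old groups shows up as a single coarse-scale detail coefficient of 2. With two groups the coarse detail reduces to a scaled difference in correlation, which is why the approach generalizes two-category differential coexpression.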
Differential coexpression over age generates significant and biologically relevant information about the genes producing it. Our Haar basis methodology for determining age-related differential coexpression performs better than other tested methods. The Haar basis set also lends itself to ready interpretation in terms of both evolutionary and physiological mechanisms of aging and can be seen as a natural generalization of two-category differential coexpression.
PMCID: PMC2761903  PMID: 19772654
19.  COXPRESdb: a database of comparative gene coexpression networks of eleven species for mammals 
Nucleic Acids Research  2012;41(Database issue):D1014-D1020.
Coexpressed gene databases are valuable resources for identifying new gene functions or functional modules in metabolic pathways and signaling pathways. Although coexpressed gene databases are a fundamental platform in the field of plant biology, their use in animal studies is relatively limited. The COXPRESdb ( provides coexpression relationships for multiple animal species, as comparisons of coexpressed gene lists can enhance the reliability of gene coexpression determinations. Here, we report the updates of the database, mainly focusing on the following two points. First, we updated our coexpression data by including recent microarray data for the previous seven species (human, mouse, rat, chicken, fly, zebrafish and nematode) and adding four new species (monkey, dog, budding yeast and fission yeast), along with a new human microarray platform. A reliability scoring function was also implemented, based on coexpression conservation to filter out coexpression with low reliability. Second, the network drawing function was updated, to implement automatic cluster analyses with enrichment analyses in Gene Ontology and in cis elements, along with interactive network analyses with Cytoscape Web. With these updates, COXPRESdb will become a more powerful tool for analyses of functional and regulatory networks of genes in a variety of animal species.
PMCID: PMC3531062  PMID: 23203868
20.  Empirical Multiscale Networks of Cellular Regulation 
PLoS Computational Biology  2007;3(10):e207.
Grouping genes by similarity of expression across multiple cellular conditions enables the identification of cellular modules. The known functions of genes enable the characterization of the aggregate biological functions of these modules. In this paper, we use a high-throughput approach to identify the effective mutual regulatory interactions between modules composed of mouse genes from the Alliance for Cell Signaling (AfCS) murine B-lymphocyte database which tracks the response of ∼15,000 genes following chemokine perturbation. This analysis reveals principles of cellular organization that we discuss along four conceptual axes. (1) Regulatory implications: the derived collection of influences between any two modules quantifies intuitive as well as unexpected regulatory interactions. (2) Behavior across scales: trends across global networks of varying resolution (composed of various numbers of modules) reveal principles of assembly of high-level behaviors from smaller components. (3) Temporal behavior: tracking the mutual module influences over different time intervals provides features of regulation dynamics such as duration, persistence, and periodicity. (4) Gene Ontology correspondence: the association of modules to known biological roles of individual genes describes the organization of functions within coexpressed modules of various sizes. We present key specific results in each of these four areas, as well as derive general principles of cellular organization. At the coarsest scale, the entire transcriptional network contains five divisions: two divisions devoted to ATP production/biosynthesis and DNA replication that activate all other divisions, an “extracellular interaction” division that represses all other divisions, and two divisions (proliferation/differentiation and membrane infrastructure) that activate and repress other divisions in specific ways consistent with cell cycle control.
Author Summary
In a eukaryotic organism such as the mouse, the complete transcriptional network contains ∼15,000 genes and up to 225 million regulatory relationships between pairs of genes. Determining all of these relationships is currently intractable using traditional experimental techniques, and, thus, a comprehensive description of the entire mouse transcriptional network is elusive. Alternatively, one can apply the limited amount of experimental data to determine the entire transcriptional network at a less detailed, higher level. This is analogous to considering a map of the world resolved to the kilometer rather than to the millimeter. Here, we derive from mouse microarray data several high-scale transcriptional networks by determining the mutual effective regulatory influences of large modules of genes. In particular, global transcriptional networks containing 12 to 72 modules are derived, and analysis of these multiscale networks reveals properties of the transcriptional network that are universal at all scales (e.g., maintenance of homeostasis) and properties that vary as a function of scale (e.g., the fractions of module pairs that exert mutual regulation). In addition, we describe how cellular functions associated with large modules (those containing many genes) are composed of more specific functions associated with smaller modules.
PMCID: PMC2041980  PMID: 17953478
21.  Connectivity in the Yeast Cell Cycle Transcription Network: Inferences from Neural Networks 
PLoS Computational Biology  2006;2(12):e169.
A current challenge is to develop computational approaches to infer gene network regulatory relationships based on multiple types of large-scale functional genomic data. We find that single-layer feed-forward artificial neural network (ANN) models can effectively discover gene network structure by integrating global in vivo protein:DNA interaction data (ChIP/Array) with genome-wide microarray RNA data. We test this on the yeast cell cycle transcription network, which is composed of several hundred genes with phase-specific RNA outputs. These ANNs were robust to noise in data and to a variety of perturbations. They reliably identified and ranked 10 of 12 known major cell cycle factors at the top of a set of 204, based on a sum-of-squared weights metric. Comparative analysis of motif occurrences among multiple yeast species independently confirmed relationships inferred from ANN weights analysis. ANN models can capitalize on properties of biological gene networks that other kinds of models do not. ANNs naturally take advantage of patterns of absence, as well as presence, of factor binding associated with specific expression output; they are easily subjected to in silico “mutation” to uncover biological redundancies; and they can use the full range of factor binding values. A prominent feature of cell cycle ANNs suggested that an analogous property might exist in the biological network. It postulated that “network-local discrimination” occurs when regulatory connections (here between MBF and target genes) are explicitly disfavored in one network module (G2), relative to others and to the class of genes outside the mitotic network. If correct, this predicts that MBF motifs will be significantly depleted from the discriminated class and that the discrimination will persist through evolution. Analysis of distantly related Schizosaccharomyces pombe confirmed this, suggesting that network-local discrimination is real and complements the well-known enrichment of MBF sites in G1 class genes.
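The sum-of-squared-weights ranking can be sketched with a single-layer network that maps each gene's factor-binding values to its expression class. This is a minimal sketch under assumptions: the softmax output layer, the cross-entropy loss, plain full-batch gradient descent, the learning rate and epoch count, and the function names are all illustrative choices, not the published model.

```python
import numpy as np

def train_single_layer(X, Y, lr=0.5, epochs=2000, seed=0):
    """X: genes x factors binding matrix; Y: genes x classes one-hot labels.
    Trains a single-layer softmax network by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(X.shape[1], Y.shape[1]))
    b = np.zeros(Y.shape[1])
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)             # softmax class probabilities
        grad = (P - Y) / len(X)                       # d(cross-entropy)/d(logits)
        W -= lr * X.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def rank_factors(W):
    """Rank factors by the sum of their squared weights over all output classes."""
    ssw = (W ** 2).sum(axis=1)
    return np.argsort(-ssw), ssw
```

Because the ranking sums squared weights over classes, a factor scores highly whether its binding pushes genes toward a class (positive weight) or away from it (negative weight), which is how such a model can exploit the absence as well as the presence of binding.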
A current challenge is to develop computational approaches to infer gene network regulatory relationships by integrating multiple types of large-scale functional genomic data. This paper shows that simple artificial neural networks (ANNs) employed in a new way do this very well. The ANN models are well-suited to capitalize on natural properties of gene networks in ways that many previous methods do not. Resulting gene network connections inferred between transcription factors and RNA output patterns are robust to noise in large-scale input datasets and to differences in RNA clustering class inputs. This was shown by using the yeast cell cycle gene network as a test case. The cycle has multiple classes of oscillatory RNAs, and Hart, Mjolsness, and Wold show that the ANNs identify key connections that associate genes from each cell cycle phase group with known and candidate regulators. Comparative analysis of network connectivity across multiple genomes showed strong conservation of basic factor-to-output relationships, although at the greatest evolutionary distances the specific target genes have mainly changed identity.
PMCID: PMC1761652  PMID: 17194216
22.  COXPRESdb: a database to compare gene coexpression in seven model animals 
Nucleic Acids Research  2010;39(Database issue):D1016-D1022.
Publicly available databases of coexpressed gene sets are a valuable resource for a wide variety of experimental studies, including gene targeting for functional identification, and for investigations of regulatory mechanisms or protein–protein interaction networks. Although coexpressed gene databases are becoming more and more popular in the field of plant biology, those with animal data are rather limited, possibly due to the lower reliability of the coexpression data. The original COXPRESdb (coexpressed gene database) ( represented the coexpression relationship for human and mouse. Here, we report updates of this database that especially focus on enhancing the reliability of gene coexpression data in animals. For this purpose, we implemented a new comparable coexpression measure, Mutual Rank, included five other animal species (rat, chicken, zebrafish, fly and nematode) to assess the conservation of coexpression, and added different layers of omics data into the integrated network of genes. Comparison of coexpression is a key concept for enhancing the reliability of gene coexpression, and the integration of different types of information can reduce the noise inherent in each. With its functions for gene network representation, COXPRESdb can help researchers to clarify the functional and regulatory networks of genes in a broad array of animal species.
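Mutual Rank is commonly defined as the geometric mean of two ranks: the rank of gene B in gene A's list of coexpressed genes and the rank of A in B's list, so smaller values mean stronger, reciprocal coexpression. The sketch below computes it from any gene x gene correlation matrix; the function name is hypothetical, ties are broken by gene order, and this is not COXPRESdb's own code.

```python
import numpy as np

def mutual_rank(corr):
    """corr: square gene x gene correlation matrix.
    Returns MR[a, b] = sqrt(rank of b in a's list * rank of a in b's list),
    where rank 1 is the most correlated partner. Each gene ranks itself
    last, so diagonal entries equal the number of genes and are ignored."""
    c = np.array(corr, dtype=float)
    np.fill_diagonal(c, -np.inf)                # push self to the bottom
    order = np.argsort(-c, axis=1, kind="stable")
    n = c.shape[0]
    ranks = np.empty_like(order)
    rows = np.arange(n)[:, None]
    ranks[rows, order] = np.arange(1, n + 1)    # ranks[a, b] = rank of b for a
    return np.sqrt(ranks * ranks.T)
```

Because it uses ranks instead of raw correlation values, the measure can be compared across platforms and species whose correlation distributions differ, which fits the database's stated goal of comparing coexpression across animals.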
PMCID: PMC3013720  PMID: 21081562
23.  Of Mice and Men: Divergence of Gene Expression Patterns in Kidney 
PLoS ONE  2012;7(10):e46876.
Since the development of methods for homologous gene recombination, mouse models have played a central role in research in renal pathophysiology. However, many published and unpublished results show that mice with genetic changes mimicking human pathogenic mutations do not display the human phenotype. These functional differences may stem from differences in gene expression between mouse and human kidneys. However, large scale comparison of gene expression networks revealed conservation of gene expression among a large panel of human and mouse tissues including kidneys. Because renal functions result from the spatial integration of elementary processes originating in the glomerulus and the successive segments constituting the nephron, we hypothesized that differences in gene expression profiles along the human and mouse nephron might account for different behaviors. Analysis of SAGE libraries generated from the glomerulus and seven anatomically defined nephron segments from human and mouse kidneys allowed us to identify 4644 pairs of gene orthologs expressed in either one or both species. Quantitative analysis shows that many transcripts are present at different levels in the two species. It also shows poor conservation of gene expression profiles, with less than 10% of the 4644 gene orthologs displaying a higher conservation of expression profiles than the neutral expectation (p<0.05). Accordingly, hierarchical clustering reveals a higher degree of conservation of gene expression patterns between functionally unrelated kidney structures within a given species than between cognate structures from the two species. Similar findings were obtained for sub-groups of genes with either kidney-specific or housekeeping functions. 
Conservation of gene expression at the scale of the whole organ and divergence at the level of its constituting sub-structures likely account for the fact that although kidneys assume the same global function in the two species, many mouse “models” of human pathologies do not display the expected phenotype.
PMCID: PMC3463552  PMID: 23056504
24.  Expression dynamics of a cellular metabolic network 
Molecular Systems Biology  2005;1:2005.0016.
Toward the goal of understanding system properties of biological networks, we investigate the global and local regulation of gene expression in the Saccharomyces cerevisiae metabolic network. Our results demonstrate predominance of local gene regulation in metabolism. Metabolic genes display significant coexpression on distances smaller than the average network distance, a behavior supported by the distribution of transcription factor binding sites in the metabolic network and genome context associations. Positive gene coexpression decreases monotonically with distance in the network, while negative coexpression is strongest at intermediate network distances. We show that basic topological motifs of the metabolic network exhibit statistically significant differences in coexpression behavior.
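The distance-dependence analysis described above can be sketched as follows: compute shortest-path distances in the metabolic network by breadth-first search, then average the expression correlation of gene pairs at each distance. This is a minimal sketch; the assumption that two genes are adjacent when their enzymes share a metabolite, the toy data and the function names are illustrative, and the paper's statistical tests against randomized networks are omitted.

```python
from collections import defaultdict, deque

import numpy as np

def bfs_distances(adjacency, source):
    """Unweighted shortest-path distances from `source`.
    adjacency: dict gene -> iterable of neighbouring genes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adjacency.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def coexpression_by_distance(adjacency, expr):
    """expr: dict gene -> expression vector.
    Returns {network distance: mean Pearson correlation of gene pairs};
    disconnected pairs are skipped."""
    genes = sorted(expr)
    by_distance = defaultdict(list)
    for i, a in enumerate(genes):
        dist = bfs_distances(adjacency, a)
        for b in genes[i + 1:]:
            if b in dist:
                r = np.corrcoef(expr[a], expr[b])[0, 1]
                by_distance[dist[b]].append(r)
    return {d: float(np.mean(rs)) for d, rs in sorted(by_distance.items())}
```

Applied to a real network, a curve of mean positive correlation that falls with distance would correspond to the predominantly local regulation reported above; negative correlations can be averaged separately to look for the peak at intermediate distances.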
PMCID: PMC1681454  PMID: 16729051
expression; genome context; metabolism; motifs; network
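The relation between coexpression and network distance described in this abstract can be illustrated with a minimal sketch: compute unweighted shortest-path distances by breadth-first search, then average the pairwise Pearson correlation of expression profiles within each distance bin. The adjacency representation (gene to set of neighbouring genes), the function names, and the toy data are assumptions for illustration; the abstract does not specify how the network or the coexpression measure was built.

```python
import math
from collections import defaultdict, deque


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source over an adjacency dict."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def coexpression_by_distance(adj, expr):
    """Mean pairwise Pearson correlation, binned by network distance.

    adj:  hypothetical undirected gene network, gene -> set of neighbour genes
    expr: gene -> expression values over the same list of conditions
    Pairs of genes in different connected components are skipped.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    genes = sorted(expr)
    for i, g in enumerate(genes):
        dist = bfs_distances(adj, g)
        for h in genes[i + 1:]:
            if h in dist:
                d = dist[h]
                sums[d] += pearson(expr[g], expr[h])
                counts[d] += 1
    return {d: sums[d] / counts[d] for d in sorted(counts)}


# Toy path network g1 - g2 - g3 with hypothetical expression over three conditions.
adj = {"g1": {"g2"}, "g2": {"g1", "g3"}, "g3": {"g2"}}
expr = {"g1": [1, 2, 3], "g2": [1, 2, 3], "g3": [3, 2, 1]}
print(coexpression_by_distance(adj, expr))
```

This sketch pools positive and negative correlations in each bin; to mirror the abstract's separate observations on positive and negative coexpression, the two signs could be accumulated in separate bins.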
25.  Altered Chromatin Occupancy of Master Regulators Underlies Evolutionary Divergence in the Transcriptional Landscape of Erythroid Differentiation 
PLoS Genetics  2014;10(12):e1004890.
Erythropoiesis is one of the best understood examples of cellular differentiation. Morphologically, erythroid differentiation proceeds in a nearly identical fashion between humans and mice, but recent evidence has shown that networks of gene expression governing this process are divergent between species. We undertook a systematic comparative analysis of six histone modifications and four transcriptional master regulators in primary proerythroblasts and erythroid cell lines to better understand the underlying basis of these transcriptional differences. Our analyses suggest that while chromatin structure across orthologous promoters is strongly conserved, subtle differences are associated with transcriptional divergence between species. Many transcription factor (TF) occupancy sites were poorly conserved across species (∼25% for GATA1, TAL1, and NFE2) but were more conserved between proerythroblasts and cell lines derived from the same species. We found that certain cis-regulatory modules co-occupied by GATA1, TAL1, and KLF1 are under strict evolutionary constraint and localize to genes necessary for erythroid cell identity. More generally, we show that conserved TF occupancy sites are indicative of active regulatory regions and strong gene expression that is sustained during maturation. Our results suggest that evolutionary turnover of TF binding sites associates with changes in the underlying chromatin structure, driving transcriptional divergence. We provide examples of how this framework can be applied to understand epigenomic variation in specific regulatory regions, such as the β-globin gene locus. Our findings have important implications for understanding epigenomic changes that mediate variation in cellular differentiation across species, while also providing a valuable resource for studies of hematopoiesis.
Author Summary
The process whereby blood progenitor cells differentiate into red blood cells, known as erythropoiesis, is very similar between mice and humans. Yet, while studies of this process in mouse have substantially improved our knowledge of human erythropoiesis, recent work has shown a significant divergence in global gene expression across species, suggesting that extrapolation from mouse models to human is not always straightforward. In order to better understand these differences, we have performed a comparative epigenomic analysis of six histone modifications and four master transcription factors. By globally comparing chromatin structure across primary cells and model cell lines in both species, we discovered that while chromatin structure is well conserved at orthologous promoters, subtle changes are predictive of species-specific gene expression. Furthermore, we discovered that the genomic localizations of master transcription factors are poorly conserved, and species-specific losses or gains are associated with changes to the underlying chromatin structure and concomitant gene expression. By using our comparative epigenomics framework, we identified a putative human-specific cis-regulatory module that drives expression of human, but not mouse, GDF15, a gene implicated in iron homeostasis. Our results provide a resource to aid researchers in interpreting genetic and epigenetic differences between species.
PMCID: PMC4270484  PMID: 25521328
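The abstract reports that only about 25% of GATA1, TAL1, and NFE2 occupancy sites are conserved between species. A conservation fraction of this kind can be sketched as the share of one species' peaks that overlap a peak in the other species. The sketch below assumes the peaks have already been projected into a single coordinate system (for example via a whole-genome alignment) and uses half-open (chrom, start, end) intervals; the function names and example peaks are hypothetical, and the authors' actual criteria for calling a site conserved may differ.

```python
import bisect


def build_index(peaks):
    """Index target peaks per chromosome: sorted starts plus running maximum of ends."""
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        max_end = []
        running = float("-inf")
        for _, e in ivs:
            running = max(running, e)
            max_end.append(running)
        index[chrom] = (starts, max_end)
    return index


def overlaps(index, chrom, start, end):
    """True if any indexed half-open interval [s, e) overlaps [start, end)."""
    if chrom not in index:
        return False
    starts, max_end = index[chrom]
    k = bisect.bisect_left(starts, end)  # intervals 0..k-1 start before `end`
    return k > 0 and max_end[k - 1] > start


def fraction_conserved(query_peaks, target_peaks):
    """Fraction of query peaks overlapping at least one target peak."""
    if not query_peaks:
        return 0.0
    index = build_index(target_peaks)
    hits = sum(overlaps(index, c, s, e) for c, s, e in query_peaks)
    return hits / len(query_peaks)


# Hypothetical peaks, assumed to be in one shared coordinate system.
species_a = [("chr1", 150, 250), ("chr1", 200, 300), ("chr1", 550, 560),
             ("chr2", 10, 50), ("chr3", 0, 10)]
species_b = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
print(fraction_conserved(species_a, species_b))
```

Keeping a running maximum of interval ends lets each query be answered with one binary search, even when the target peaks overlap one another.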