Related Articles

1.  Natural selection governs local, but not global, evolutionary gene coexpression networks in Caenorhabditis elegans 
BMC Systems Biology  2008;2:96.
Background
Large-scale evaluation of gene expression variation among Caenorhabditis elegans lines that have diverged from a common ancestor allows for the analysis of a novel class of biological networks – evolutionary gene coexpression networks. Comparative analysis of these evolutionary networks has the potential to uncover the effects of natural selection in shaping coexpression network topologies since C. elegans mutation accumulation (MA) lines evolve essentially free from the effects of natural selection, whereas natural isolate (NI) populations are subject to selective constraints.
Results
We compared evolutionary gene coexpression networks for C. elegans MA lines versus NI populations to evaluate the role that natural selection plays in shaping the evolution of network topologies. MA and NI evolutionary gene coexpression networks were found to have very similar global topological properties as measured by a number of network topological parameters. Observed MA and NI networks show node degree distributions and average values for node degree, clustering coefficient, path length, eccentricity and betweenness that are statistically indistinguishable from one another yet highly distinct from randomly simulated networks. On the other hand, at the local level the MA and NI coexpression networks are highly divergent; pairs of genes coexpressed in the MA versus NI lines are almost entirely different, as are the connectivity and clustering properties of individual genes.
Conclusion
It appears that selective forces shape how local patterns of coexpression change over time but do not control the global topology of C. elegans evolutionary gene coexpression networks. These results have implications for the evolutionary significance of global network topologies, which are known to be conserved across disparate complex systems.
doi:10.1186/1752-0509-2-96
PMCID: PMC2596099  PMID: 19014554
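Two of the topological parameters named in the abstract above, node degree and clustering coefficient, can be illustrated with a minimal sketch. The toy edge list and the function names are hypothetical, and the sketch assumes an unweighted, undirected network; how the authors derived edges from expression data is not stated in the abstract.

```python
from itertools import combinations

def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def clustering_coefficient(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbors; reported as 0 here
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

# Hypothetical toy network: a triangle (a, b, c) with one pendant node d.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
graph = build_graph(toy_edges)
degrees = {n: len(nbrs) for n, nbrs in graph.items()}
clustering = {n: clustering_coefficient(graph, n) for n in graph}
```

Averaging `degrees` and `clustering` over all nodes gives global summaries of the kind compared in the study, while the per-node values correspond to the local properties the authors report as divergent.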
2.  Rapid and asymmetric divergence of duplicate genes in the human gene coexpression network 
BMC Bioinformatics  2006;7:46.
Background
While gene duplication is known to be one of the most common mechanisms of genome evolution, the fates of genes after duplication are still being debated. In particular, it is presently unknown whether most duplicate genes preserve (or subdivide) the functions of the parental gene or acquire new functions. One aspect of gene function, namely the expression profile in the gene coexpression network, has been largely unexplored for duplicate genes.
Results
Here we build a human gene coexpression network using human tissue-specific microarray data and investigate the divergence of duplicate genes in it. The topology of this network is scale-free. Interestingly, our analysis indicates that duplicate genes rapidly lose shared coexpressed partners: approximately 50 million years after duplication, the two duplicate genes in a pair have only a slightly higher number of shared partners than two random singletons. We also show that duplicate gene pairs quickly acquire new coexpressed partners: the average number of partners for a duplicate gene pair is significantly greater than that for a singleton (the latter number can be used as a proxy for the number of partners of a parental singleton gene before duplication). The divergence in gene expression between two duplicates in a pair occurs asymmetrically: one gene usually has more partners than the other. The network is resilient to both random and degree-based in silico removal of either singletons or duplicate genes. In contrast, the network is especially vulnerable to the removal of highly connected genes when duplicate genes and singletons are considered together.
Conclusion
Duplicate genes rapidly diverge in their expression profiles in the network and play a role similar to that of singletons in maintaining network robustness.
Contact: kdm16@psu.edu
Supplementary information: Please see additional files.
doi:10.1186/1471-2105-7-46
PMCID: PMC1403810  PMID: 16441884
3.  Identification of conserved drought stress responsive gene-network across tissues and developmental stages in rice 
Bioinformation  2013;9(2):72-78.
Identification of genes that are coexpressed across various tissues and environmental stresses is biologically interesting, since they may play coordinated roles in similar biological processes. Genes with correlated expression patterns can be best identified by using coexpression network analysis of transcriptome data. In the present study, we analyzed the temporal-spatial coordination of gene expression in root, leaf and panicle of rice under drought stress and constructed a network using WGCNA and Cytoscape. A total of 2199 differentially expressed genes (DEGs) were identified in at least three tissues, of which 88 genes have a coordinated expression profile among all six tissues under drought stress. These 88 highly coordinated genes were further subjected to module identification in the coexpression network. Based on chief topological properties we identified 18 hub genes such as ABC transporter, ATP-binding protein, dehydrin, protein phosphatase 2C, LTPL153 - Protease inhibitor, phosphatidylethanolamine-binding protein, lactose permease-related, NADP-dependent malic enzyme, etc. Motif enrichment analysis showed the presence of ABRE cis-elements in the promoters of > 62% of the coordinately expressed genes. Our results suggest that drought stress mediated upregulated gene expression was coordinated through an ABA-dependent signaling pathway across tissues, at least for the subset of genes identified in this study, while downregulation appears to be regulated by tissue-specific pathways in rice.
doi:10.6026/97320630009072
PMCID: PMC3563401  PMID: 23390349
Coexpression; Drought stress; Hub gene; Rice; Transcriptome; WGCNA
4.  Expression dynamics of a cellular metabolic network 
Molecular Systems Biology  2005;1:2005.0016.
Toward the goal of understanding system properties of biological networks, we investigate the global and local regulation of gene expression in the Saccharomyces cerevisiae metabolic network. Our results demonstrate predominance of local gene regulation in metabolism. Metabolic genes display significant coexpression on distances smaller than the average network distance, a behavior supported by the distribution of transcription factor binding sites in the metabolic network and genome context associations. Positive gene coexpression decreases monotonically with distance in the network, while negative coexpression is strongest at intermediate network distances. We show that basic topological motifs of the metabolic network exhibit statistically significant differences in coexpression behavior.
doi:10.1038/msb4100023
PMCID: PMC1681454  PMID: 16729051
expression; genome context; metabolism; motifs; network
5.  COXPRESdb: a database to compare gene coexpression in seven model animals 
Nucleic Acids Research  2010;39(Database issue):D1016-D1022.
Publicly available databases of coexpressed gene sets are a valuable resource for a wide variety of experimental studies, including gene targeting for functional identification, and for investigations of regulatory mechanisms or protein–protein interaction networks. Although coexpressed gene databases are becoming more and more popular in the field of plant biology, those with animal data are rather limited, possibly due to the lower reliability of the coexpression data. The original COXPRESdb (coexpressed gene database) (http://coxpresdb.jp) represented the coexpression relationship for human and mouse. Here, we report updates of this database that especially focus on the enhancement of the reliability of gene coexpression data in animals. For this purpose, we implemented a new comparable coexpression measure, Mutual Rank, included five other animal species, rat, chicken, zebrafish, fly and nematoda, to assess the conservation of coexpression, and added different layers of omics data into the integrated network of genes. Comparison of coexpression is a key concept to enhance the reliability of gene coexpression, and the integration of different information can reduce the noise inherent in the information. With the functions for gene network representation, COXPRESdb can help researchers to clarify the functional and regulatory networks of genes in a broad array of animal species.
doi:10.1093/nar/gkq1147
PMCID: PMC3013720  PMID: 21081562
6.  Hierarchical Organization of Human Cortical Networks in Health and Schizophrenia 
The complex organization of connectivity in the human brain is incompletely understood. Recently, topological measures based on graph theory have provided a new approach to quantify large-scale cortical networks. These methods have been applied to anatomical connectivity data on non-human species and cortical networks have been shown to have small-world topology, associated with high local and global efficiency of information transfer. Anatomical networks derived from cortical thickness measurements have shown the same organizational properties of the healthy human brain, consistent with similar results reported in functional networks derived from resting state functional MRI and MEG data. Here we show, using anatomical networks derived from analysis of inter-regional covariation of gray matter volume in magnetic resonance imaging (MRI) data on 259 healthy volunteers, that classical divisions of cortex (multimodal, unimodal and transmodal) have some distinct topological attributes. While all cortical divisions shared non-random properties of small-worldness and efficient wiring (short mean Euclidean distance between connected regions), the multimodal network had a hierarchical organization, dominated by frontal hubs with low clustering, whereas the transmodal network was assortative. Moreover, in a sample of 203 people with schizophrenia, multimodal network organization was abnormal, as indicated by reduced hierarchy, the loss of frontal and the emergence of non-frontal hubs, and increased connection distance. We propose that the topological differences between divisions of normal cortex may represent the outcome of different growth processes for multimodal and transmodal networks; and that neurodevelopmental abnormalities in schizophrenia specifically impact multimodal cortical organization.
doi:10.1523/JNEUROSCI.1929-08.2008
PMCID: PMC2878961  PMID: 18784304
anatomy; network; hierarchy; systems; MRI; schizophrenia; neurodevelopment
7.  Gene Coexpression Network Topology of Cardiac Development, Hypertrophy, and Failure 
Background
Network analysis techniques allow a more accurate reflection of underlying systems biology to be realized than traditional unidimensional molecular biology approaches. Here, using gene coexpression network analysis, we define the gene expression network topology of cardiac hypertrophy and failure and the extent of recapitulation of fetal gene expression programs in failing and hypertrophied adult myocardium.
Methods and Results
We assembled all myocardial transcript data in the Gene Expression Omnibus (n = 1617). Since hierarchical analysis revealed species had primacy over disease clustering, we focused this analysis on the most complete (murine) dataset (n = 478). Using gene coexpression network analysis, we derived functional modules, regulatory mediators and higher order topological relationships between genes and identified 50 gene co-expression modules in developing myocardium that were not present in normal adult tissue. We found that known gene expression markers of myocardial adaptation were members of upregulated modules but not hub genes. We identified ZIC2 as a novel transcription factor associated with coexpression modules common to developing and failing myocardium. Of 50 fetal gene co-expression modules, three (6%) were reproduced in hypertrophied myocardium and seven (14%) were reproduced in failing myocardium. One fetal module was common to both failing and hypertrophied myocardium.
Conclusions
Network modeling allows systems analysis of cardiovascular development and disease. While we did not find evidence for a global coordinated program of fetal gene expression in adult myocardial adaptation, our analysis revealed specific gene expression modules active during both development and disease and specific candidates for their regulation.
doi:10.1161/CIRCGENETICS.110.941757
PMCID: PMC3324316  PMID: 21127201
fetal; gene expression; heart failure; hypertrophy; myocardium
8.  COXPRESdb: a database of coexpressed gene networks in mammals 
Nucleic Acids Research  2007;36(Database issue):D77-D82.
A database of coexpressed gene sets can provide valuable information for a wide variety of experimental designs, such as targeting of genes for functional identification, gene regulation and/or protein–protein interactions. Coexpressed gene databases derived from publicly available GeneChip data are widely used in Arabidopsis research, but platforms that examine coexpression for higher mammals are rather limited. Therefore, we have constructed a new database, COXPRESdb (coexpressed gene database) (http://coxpresdb.hgc.jp), for coexpressed gene lists and networks in human and mouse. Coexpression data could be calculated for 19 777 and 21 036 genes in human and mouse, respectively, by using the GeneChip data in NCBI GEO. COXPRESdb enables analysis of the four types of coexpression networks: (i) highly coexpressed genes for every gene, (ii) genes with the same GO annotation, (iii) genes expressed in the same tissue and (iv) user-defined gene sets. When the networks became too big for the static picture on the web in GO networks or in tissue networks, we used Google Maps API to visualize them interactively. COXPRESdb also provides a view to compare the human and mouse coexpression patterns to estimate the conservation between the two species.
doi:10.1093/nar/gkm840
PMCID: PMC2238883  PMID: 17932064
9.  A comprehensive functional analysis of tissue specificity of human gene expression 
BMC Biology  2008;6:49.
Background
In recent years, the maturation of microarray technology has allowed the genome-wide analysis of gene expression patterns to identify tissue-specific and ubiquitously expressed ('housekeeping') genes. We have performed a functional and topological analysis of housekeeping and tissue-specific networks to identify universally necessary biological processes, and those unique to or characteristic of particular tissues.
Results
We measured whole genome expression in 31 human tissues, identifying 2374 housekeeping genes expressed in all tissues, and genes uniquely expressed in each tissue. Comprehensive functional analysis showed that the housekeeping set is substantially larger than previously thought, and is enriched with vital processes such as oxidative phosphorylation, ubiquitin-dependent proteolysis, translation and energy metabolism. Network topology of the housekeeping network was characterized by higher connectivity and shorter paths between the proteins than the global network. Ontology enrichment scoring and network topology of tissue-specific genes were consistent with each tissue's function and expression patterns clustered together in accordance with tissue origin. Tissue-specific genes were twice as likely as housekeeping genes to be drug targets, allowing the identification of tissue 'signature networks' that will facilitate the discovery of new therapeutic targets and biomarkers of tissue-targeted diseases.
Conclusion
A comprehensive functional analysis of housekeeping and tissue-specific genes showed that the biological function of housekeeping and tissue-specific genes was consistent with tissue origin. Network analysis revealed that tissue-specific networks have distinct network properties related to each tissue's function. Tissue 'signature networks' promise to be a rich source of targets and biomarkers for disease treatment and diagnosis.
doi:10.1186/1741-7007-6-49
PMCID: PMC2645369  PMID: 19014478
10.  Feature Identification of Compensatory Gene Pairs without Sequence Homology in Yeast 
Genetic robustness refers to a compensatory mechanism for buffering deleterious mutations or environmental variations. Gene duplication has been shown to provide such functional backups. However, the overall contribution of duplication-based buffering to genetic robustness is rather small. In this study, we investigated whether transcriptional compensation also exists among genes that share similar functions without sequence homology. A set of nonhomologous synthetic-lethal gene pairs was assessed by using a coexpression network, protein-protein interactions, and other types of genetic interactions in yeast. Our results are notably different from those of previous studies on buffering paralogs. Low expression similarity and conditional coexpression alone do not play roles in identifying the functionally compensatory genes. Additional properties such as synthetic-lethal interaction, the ratio of shared common interacting partners, and the degree of coregulation were, at least in part, necessary to extract functional compensatory genes. Our network-based approach can be used to select several well-documented cases of compensatory gene pairs and a set of new pairs. The results suggest that transcriptional reprogramming plays a limited role in functional compensation among nonhomologous genes. Our study aids in understanding the mechanism and features of functional compensation in more detail.
doi:10.1155/2012/653174
PMCID: PMC3431050  PMID: 22952430
11.  Functional Annotation and Identification of Candidate Disease Genes by Computational Analysis of Normal Tissue Gene Expression Data 
PLoS ONE  2008;3(6):e2439.
Background
High-throughput gene expression data can predict gene function through the “guilt by association” principle: coexpressed genes are likely to be functionally associated.
Methodology/Principal Findings
We analyzed publicly available expression data on normal human tissues. The analysis is based on the integration of data obtained with two experimental platforms (microarrays and SAGE) and of various measures of dissimilarity between expression profiles. The building blocks of the procedure are the Ranked Coexpression Groups (RCG), small sets of tightly coexpressed genes which are analyzed in terms of functional annotation. Functionally characterized RCGs are selected by means of the majority rule and used to predict new functional annotations. Functionally characterized RCGs are enriched in groups of genes associated with similar phenotypes. We exploit this fact to find new candidate disease genes for many OMIM phenotypes of unknown molecular origin.
Conclusions/Significance
We predict new functional annotations for many human genes, showing that the integration of different data sets and coexpression measures significantly improves the scope of the results. Combining gene expression data, functional annotation and known phenotype-gene associations, we provide candidate genes for several genetic diseases of unknown molecular basis.
doi:10.1371/journal.pone.0002439
PMCID: PMC2409962  PMID: 18560577
12.  Dynamic Changes in Protein Functional Linkage Networks Revealed by Integration with Gene Expression Data 
PLoS Computational Biology  2008;4(11):e1000237.
Response of cells to changing environmental conditions is governed by the dynamics of intricate biomolecular interactions. It may be reasonable to assume, proteins being the dominant macromolecules that carry out routine cellular functions, that understanding the dynamics of protein∶protein interactions might yield useful insights into the cellular responses. The large-scale protein interaction data sets are, however, unable to capture the changes in the profile of protein∶protein interactions. In order to understand how these interactions change dynamically, we have constructed conditional protein linkages for Escherichia coli by integrating functional linkages and gene expression information. As a case study, we have chosen to analyze UV exposure in wild-type and SOS deficient E. coli at 20 minutes post irradiation. The conditional networks exhibit similar topological properties. Although the global topological properties of the networks are similar, many subtle local changes are observed, which are suggestive of the cellular response to the perturbations. Some such changes correspond to differences in the path lengths among the nodes of carbohydrate metabolism correlating with its loss in efficiency in the UV treated cells. Similarly, expression of hubs under unique conditions reflects the importance of these genes. Various centrality measures applied to the networks indicate increased importance for replication, repair, and other stress proteins for the cells under UV treatment, as anticipated. We thus propose a novel approach for studying an organism at the systems level by integrating genome-wide functional linkages and the gene expression data.
Author Summary
Many cellular processes and the response of cells to environmental cues are determined by the intricate protein∶protein interactions. These cellular protein interactions can be represented in the form of a graph, where the nodes represent the proteins and the edges signify the interactions between them. However, the available protein functional linkage maps do not incorporate the dynamics of gene expression and thus do not portray the dynamics of true protein∶protein interactions in vivo. We have used gene expression data as well as the available protein functional interaction information for Escherichia coli to build the protein interaction networks for expressed genes in a given condition. These networks, named conditional networks, capture the differences in the protein interaction networks and hence the cell physiology. Thus, by exploring the dynamics of protein interaction profiles, we hope to understand the response of cells to environmental changes.
doi:10.1371/journal.pcbi.1000237
PMCID: PMC2580820  PMID: 19043542
13.  Crosstalk between transcription factors and microRNAs in human protein interaction network 
BMC Systems Biology  2012;6:18.
Background
Gene regulatory networks control the global gene expression and the dynamics of protein output in living cells. In multicellular organisms, transcription factors and microRNAs are the major families of gene regulators. Recent studies have suggested that these two kinds of regulators share similar regulatory logics and participate in cooperative activities in the gene regulatory network; however, their combinatorial regulatory effects and preferences on the protein interaction network remain unclear.
Methods
In this study, we constructed a global human gene regulatory network comprising both transcriptional and post-transcriptional regulatory relationships, and integrated the protein interactome into this network. We then screened the integrated network for four types of regulatory motifs: single-regulation, co-regulation, crosstalk, and independent, and investigated their topological properties in the protein interaction network.
Results
Among the four types of network motifs, the crosstalk motif was found to have the most enriched protein-protein interactions among its downstream regulatory targets. The topological properties of these motifs also revealed that they target crucial proteins in the protein interaction network and may play important roles in biological functions.
Conclusions
Altogether, these results reveal the combinatorial regulatory patterns of transcription factors and microRNAs on the protein interactome, and provide further evidence to suggest the connection between gene regulatory network and protein interaction network.
doi:10.1186/1752-0509-6-18
PMCID: PMC3337275  PMID: 22413876
14.  Topological Fractionation of Resting-State Networks 
PLoS ONE  2011;6(10):e26596.
Exploring the topological properties of human brain networks has become an exciting topic in neuroscience research. Large-scale structural and functional brain networks both exhibit a small-world topology, which is evidence for global and local parallel information processing. Meanwhile, resting state networks (RSNs) underlying specific biological functions have provided insights into how intrinsic functional architecture influences cognitive and perceptual information processing. However, topological properties of single RSNs remain poorly understood. Here, we have two hypotheses: i) each RSN also has optimized small-world architecture; ii) topological properties of RSNs related to perceptual and higher cognitive processes are different. To test these hypotheses, we investigated the topological properties of the default-mode, dorsal attention, central-executive, somato-motor, visual and auditory networks derived from resting-state functional magnetic resonance imaging (fMRI). We found small-world topology in each RSN. Furthermore, small-world properties of cognitive networks were higher than those of perceptual networks. Our findings are the first to demonstrate a topological fractionation between perceptual and higher cognitive networks. Our approach may be useful for clinical research, especially for diseases that show selective abnormal connectivity in specific brain networks.
doi:10.1371/journal.pone.0026596
PMCID: PMC3197522  PMID: 22028917
15.  Link-based quantitative methods to identify differentially coexpressed genes and gene Pairs 
BMC Bioinformatics  2011;12:315.
Background
Differential coexpression analysis (DCEA) is increasingly used for investigating the global transcriptional mechanisms underlying phenotypic changes. Current DCEA methods mostly adopt a gene connectivity-based strategy to estimate differential coexpression, which is characterized by comparing the numbers of gene neighbors in different coexpression networks. Although it simplifies the calculation, this strategy mixes up the identities of different coexpression neighbors of a gene, and fails to differentiate significant differential coexpression changes from trivial ones. In particular, correlation reversal is easily missed, although it probably indicates remarkable biological significance.
Results
We developed two link-based quantitative methods, DCp and DCe, to identify differentially coexpressed genes and gene pairs (links). Because they uniquely exploit the quantitative coexpression change of each gene pair in the coexpression networks, both methods proved superior to currently popular methods in simulation studies. Re-mining a publicly available type 2 diabetes (T2D) expression dataset from the perspective of differential coexpression analysis led to discoveries beyond those from differential expression analysis.
Conclusions
This work identifies the critical weakness of currently popular DCEA methods and proposes two link-based DCEA algorithms that should contribute to the development of DCEA and help extend it to a broader spectrum.
doi:10.1186/1471-2105-12-315
PMCID: PMC3199761  PMID: 21806838
16.  PREDICTION OF INTERACTIONS BETWEEN HIV-1 AND HUMAN PROTEINS BY INFORMATION INTEGRATION 
Human immunodeficiency virus-1 (HIV-1) in acquired immune deficiency syndrome (AIDS) relies on human host cell proteins in virtually every aspect of its life cycle. Knowledge of the set of interacting human and viral proteins would greatly contribute to our understanding of the mechanisms of infection and subsequently to the design of new therapeutic approaches. This work is the first attempt to predict the global set of interactions between HIV-1 and human host cellular proteins. We propose a supervised learning framework, where multiple information data sources are utilized, including co-occurrence of functional motifs and their interaction domains and protein classes, gene ontology annotations, posttranslational modifications, tissue distributions and gene expression profiles, topological properties of the human protein in the interaction network and the similarity of HIV-1 proteins to human proteins’ known binding partners. We trained and tested a Random Forest (RF) classifier with this extensive feature set. The model’s predictions achieved an average Mean Average Precision (MAP) score of 23%. Among the predicted interactions was, for example, the pair of HIV-1 protein Tat and the human vitamin D receptor. This interaction had recently been independently validated experimentally. The rank-ordered lists of predicted interacting pairs are a rich source for generating biological hypotheses. Amongst the novel predictions, transcription regulator activity, immune system process and macromolecular complex were the most significant molecular function, process and cellular compartment, respectively. Supplementary material is available at URL www.cs.cmu.edu/~oznur/hiv/hivPPI.html
PMCID: PMC3263379  PMID: 19209727
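The MAP score quoted in the abstract above can be made concrete with a small sketch. It assumes the usual information-retrieval definition, in which average precision is computed for each ranked list and then averaged across lists; how the authors grouped their predictions into lists is not stated in the abstract, and the toy rankings and function names below are hypothetical.

```python
def average_precision(ranked_labels):
    """Average precision of one ranked list.

    ranked_labels holds 1 for a true interaction and 0 otherwise,
    ordered from the highest-scoring prediction to the lowest.
    """
    hits = 0
    precision_sum = 0.0
    for position, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / position  # precision at this true positive
    return precision_sum / hits if hits else 0.0

def mean_average_precision(ranked_lists):
    """Mean of the per-list average precisions."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)

# Hypothetical rankings for two queries (not data from the paper).
toy_rankings = [
    [1, 0, 1, 0],  # true interactions at ranks 1 and 3
    [0, 1, 0, 0],  # one true interaction at rank 2
]
map_score = mean_average_precision(toy_rankings)
```

Because precision is sampled only at the positions of true positives, a list that places its true interactions near the top scores close to 1, and a list that buries them scores close to 0.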
17.  Rank of Correlation Coefficient as a Comparable Measure for Biological Significance of Gene Coexpression 
Information regarding gene coexpression is useful to predict gene function. Several databases have been constructed for gene coexpression in model organisms based on a large amount of publicly available gene expression data measured by GeneChip platforms. In these databases, Pearson's correlation coefficients (PCCs) of gene expression patterns are widely used as a measure of gene coexpression. Although the coexpression measure or GeneChip summarization method affects the performance of the gene coexpression database, previous evaluations of these calculation procedures used only a small number of samples and a single species. To evaluate the effectiveness of coexpression measures, assessments with large-scale microarray data are required. We first examined characteristics of PCC and found that the optimal PCC threshold to retrieve functionally related genes was affected by the method of gene expression database construction and the target gene function. In addition, we found that this problem could be overcome when we used correlation ranks instead of correlation values. This observation was evaluated using large-scale gene expression data for four species: Arabidopsis, human, mouse and rat.
doi:10.1093/dnares/dsp016
PMCID: PMC2762411  PMID: 19767600
gene coexpression; Pearson's correlation coefficient; GeneChip summarization; Arabidopsis
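The shift from correlation values to correlation ranks described in the abstract above can be sketched as follows. The sketch assumes the Mutual Rank form associated with this work and with COXPRESdb, the geometric mean of the two directional PCC ranks, MR(A, B) = sqrt(rank(A→B) × rank(B→A)). The toy expression profiles, the function names, and the treatment of ties are illustrative assumptions.

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def mutual_rank(expr):
    """Return {(gene_a, gene_b): MR} for every unordered gene pair.

    expr maps a gene name to its list of expression values (toy input).
    """
    genes = sorted(expr)
    # rank[a][b]: position of b among a's partners sorted by descending PCC.
    rank = {}
    for a in genes:
        partners = [g for g in genes if g != a]
        partners.sort(key=lambda g: pearson(expr[a], expr[g]), reverse=True)
        rank[a] = {g: i + 1 for i, g in enumerate(partners)}
    return {
        (a, b): math.sqrt(rank[a][b] * rank[b][a])
        for i, a in enumerate(genes)
        for b in genes[i + 1:]
    }

# Hypothetical expression profiles across four samples.
toy = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [1.1, 2.1, 2.9, 4.2],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 3.0, 2.0, 4.0],
}
mr = mutual_rank(toy)
```

A smaller MR means stronger mutual coexpression. Since ranks do not depend on the absolute scale of the correlations, a single MR cutoff may transfer across data sets more readily than a single PCC cutoff, which is the motivation the abstract gives.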
18.  Evolutionary significance of gene expression divergence 
Gene  2004;345(1):119-126.
Recent large-scale studies of evolutionary changes in gene expression among mammalian species have led to the proposal that gene expression divergence may be neutral with respect to organismic fitness. Here, we employ a comparative analysis of mammalian gene sequence divergence and gene expression divergence to test the hypothesis that the evolution of gene expression is predominantly neutral. Two models of neutral gene expression evolution are considered: (1) purely neutral evolution (i.e., no selective constraint) of gene expression levels and patterns and (2) neutral evolution accompanied by selective constraint. With respect to purely neutral evolution, levels of change in gene expression between human–mouse orthologs are correlated with levels of gene sequence divergence that are determined largely by purifying selection. In contrast, evolutionary changes of tissue-specific gene expression profiles do not show such a correlation with sequence divergence. However, divergence of both gene expression levels and profiles is significantly lower for orthologous human–mouse gene pairs than for pairs of randomly chosen human and mouse genes. These data clearly point to the action of selective constraint on gene expression divergence and are inconsistent with the purely neutral model; however, there is likely to be a neutral component in the evolution of gene expression, particularly in tissues where the expression of a given gene is low and functionally irrelevant. The model of neutral evolution with selective constraint predicts a regular, clock-like accumulation of gene expression divergence. However, relative rate tests of the divergence among human–mouse–rat orthologous gene sets reveal clock-like evolution for gene sequence divergence, and to a lesser extent for gene expression level divergence, but not for the divergence of tissue-specific gene expression profiles.
Taken together, these results indicate that gene expression divergence is subject to the effects of purifying selective constraint and suggest that it might also be substantially influenced by positive Darwinian selection.
doi:10.1016/j.gene.2004.11.034
PMCID: PMC1859841  PMID: 15716085
Molecular evolution; Neutral theory; Human; Mouse; Genomics
19.  Discovering Biological Guilds through Topological Abstraction 
High-throughput generation of new types of relational biomedical datasets is creating a demand for methods to provide insights into their complexity. Such networks are often too large to interpret visually and too complicated to be explained solely based on local topological properties.
One way to try to make sense of such complex networks would be to transform them into discernible abstracts, or summaries, of the original networks. Then, important components could become more readily visible. This work presents such an approach for understanding networks via abstraction of global network connectivity using compression. This made possible the discovery of a new type of topological class, referred to herein as a guild, that captures global connectivity similarity. Lastly, the correspondence of these guilds to biological function is validated via an E. coli gene regulation network. This resulted in biological findings that could not be derived from local topology of the original network.
PMCID: PMC1839326  PMID: 17238291
20.  An atlas of tissue-specific conserved coexpression for functional annotation and disease gene prediction 
European Journal of Human Genetics  2011;19(11):1173-1180.
Gene coexpression relationships that are phylogenetically conserved between human and mouse have been shown to provide important clues about gene function that can be efficiently used to identify promising candidate genes for human hereditary disorders. In the past, such approaches have considered mostly generic gene expression profiles that cover multiple tissues and organs. The individual genes of multicellular organisms, however, can participate in different transcriptional programs, operating at scales as different as single-cell types, tissues, organs, body regions or the entire organism. Therefore, systematic analysis of tissue-specific coexpression could be, in principle, a very powerful strategy to dissect those functional relationships among genes that emerge only in particular tissues or organs. In this report, we show that, in fact, conserved coexpression as determined from tissue-specific and condition-specific data sets can predict many functional relationships that are not detected by analyzing heterogeneous microarray data sets. More importantly, we find that, when combined with disease networks, the simultaneous use of both generic (multi-tissue) and tissue-specific conserved coexpression allows a more efficient prediction of human disease genes than the use of generic conserved coexpression alone. Using this strategy, we were able to identify high-probability candidates for 238 orphan disease loci. We provide proof of concept that this combined use of generic and tissue-specific conserved coexpression can be very useful to prioritize the mutational candidates obtained from deep-sequencing projects, even in the case of genetic disorders as heterogeneous as XLMR.
doi:10.1038/ejhg.2011.96
PMCID: PMC3198151  PMID: 21654723
disease-gene prediction; functional annotation; transcriptome; phenome
21.  A Genomewide Functional Network for the Laboratory Mouse 
PLoS Computational Biology  2008;4(9):e1000165.
Establishing a functional network is invaluable to our understanding of gene function, pathways, and systems-level properties of an organism and can be a powerful resource in directing targeted experiments. In this study, we present a functional network for the laboratory mouse based on a Bayesian integration of diverse genetic and functional genomic data. The resulting network includes probabilistic functional linkages among 20,581 protein-coding genes. We show that this network can accurately predict novel functional assignments and network components and present experimental evidence for predictions related to Nanog homeobox (Nanog), a critical gene in mouse embryonic stem cell pluripotency. An analysis of the global topology of the mouse functional network reveals multiple biologically relevant systems-level features of the mouse proteome. Specifically, we identify the clustering coefficient as a critical characteristic of central modulators that affect diverse pathways as well as genes associated with different phenotype traits and diseases. In addition, a cross-species comparison of functional interactomes on a genomic scale revealed distinct functional characteristics of conserved neighborhoods as compared to subnetworks specific to higher organisms. Thus, our global functional network for the laboratory mouse provides the community with a key resource for discovering protein functions and novel pathway components as well as a tool for exploring systems-level topological and evolutionary features of cellular interactomes. To facilitate exploration of this network by the biomedical research community, we illustrate its application in function and disease gene discovery through an interactive, Web-based, publicly available interface at http://mouseNET.princeton.edu.
Author Summary
Functionally related proteins interact in diverse ways to carry out biological processes, and each protein often participates in multiple pathways. Proteins are therefore organized into a complex network through which different functions of the cell are carried out. An accurate description of such a network is invaluable to our understanding of both the system-level features of a cell and those of an individual biological process. In this study, we used a probabilistic model to combine information from diverse genome-scale studies as well as individual investigations to generate a global functional network for mouse. Our analysis of the global topology of this network reveals biologically relevant systems-level characteristics of the mouse proteome, including conservation of functional neighborhoods and network features characteristic of known disease genes and key transcriptional regulators. We have made this network publicly available for search and dynamic exploration by researchers in the community. Our Web interface enables users to easily generate hypotheses regarding potential functional roles of uncharacterized proteins, investigate possible links between their proteins of interest and disease, and identify new players in specific biological processes.
doi:10.1371/journal.pcbi.1000165
PMCID: PMC2527685  PMID: 18818725
22.  TopNet: a tool for comparing biological sub-networks, correlating protein properties with topological statistics 
Nucleic Acids Research  2004;32(1):328-337.
Biological networks are a topic of great current interest, particularly with the publication of a number of large genome-wide interaction datasets. They are globally characterized by a variety of graph-theoretic statistics, such as the degree distribution, clustering coefficient, characteristic path length and diameter. Moreover, real protein networks are quite complex and can often be divided into many sub-networks through systematic selection of different nodes and edges. For instance, proteins can be sub-divided by expression level, length, amino-acid composition, solubility, secondary structure and function. A challenging research question is to compare the topologies of sub-networks, looking for global differences associated with different types of proteins. TopNet is an automated web tool designed to address this question, calculating and comparing topological characteristics for different sub-networks derived from any given protein network. It provides reasonable solutions to the calculation of network statistics for sub-networks embedded within a larger network and gives simplified views of a sub-network of interest, allowing one to navigate through it. After constructing TopNet, we applied it to the interaction networks and protein classes currently available for yeast. We were able to find a number of potential biological correlations. In particular, we found that soluble proteins had more interactions than membrane proteins. Moreover, amongst soluble proteins, those that were highly expressed, had many polar amino acids, and had many alpha helices, tended to have the most interaction partners. Interestingly, TopNet also turned up some systematic biases in the current yeast interaction network: on average, proteins with a known functional classification had many more interaction partners than those without. This phenomenon may reflect the incompleteness of the experimentally determined yeast interaction network.
doi:10.1093/nar/gkh164
PMCID: PMC373274  PMID: 14724320
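The graph-theoretic statistics named in the TopNet abstract (degree distribution, clustering coefficient, characteristic path length and diameter) can be illustrated with a minimal sketch. This is not TopNet's implementation. The function names, the toy edge list and the choice to average path lengths over reachable node pairs only are assumptions made for illustration.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_distribution(adj):
    """Map each degree k to the number of nodes that have exactly k neighbours."""
    counts = {}
    for nbrs in adj.values():
        counts[len(nbrs)] = counts.get(len(nbrs), 0) + 1
    return counts


def clustering_coefficient(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = sorted(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_statistics(adj):
    """Return (characteristic path length, diameter) over reachable node pairs."""
    total, pairs, diameter = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return (total / pairs if pairs else 0.0), diameter


# Hypothetical toy network: a triangle A-B-C with a pendant node D attached to C.
toy = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
print(degree_distribution(toy))
print(clustering_coefficient(toy, "C"))
print(path_statistics(toy))
```

Comparing sub-networks in the sense described above would amount to running the same functions on the adjacency maps of different node subsets. How edges that cross a sub-network boundary should be handled is a design choice this sketch leaves open.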
23.  Prediction of Human Disease Genes by Human-Mouse Conserved Coexpression Analysis 
PLoS Computational Biology  2008;4(3):e1000043.
Background
Even in the post-genomic era, the identification of candidate genes within loci associated with human genetic diseases is a very demanding task, because the critical region may typically contain hundreds of positional candidates. Since genes implicated in similar phenotypes tend to share very similar expression profiles, high throughput gene expression data may represent a very important resource to identify the best candidates for sequencing. However, so far, gene coexpression has not been used very successfully to prioritize positional candidates.
Methodology/Principal Findings
We show that it is possible to reliably identify disease-relevant relationships among genes from massive microarray datasets by concentrating only on genes sharing similar expression profiles in both human and mouse. Moreover, we show systematically that the integration of human-mouse conserved coexpression with a phenotype similarity map allows the efficient identification of disease genes in large genomic regions. Finally, using this approach on 850 OMIM loci characterized by an unknown molecular basis, we propose high-probability candidates for 81 genetic diseases.
Conclusion
Our results demonstrate that conserved coexpression, even at the human-mouse phylogenetic distance, represents a very strong criterion to predict disease-relevant relationships among human genes.
Author Summary
One of the most limiting aspects of biological research in the post-genomic era is the capability to integrate massive datasets on gene structure and function for producing useful biological knowledge. In this report we have applied an integrative approach to address the problem of identifying likely candidate genes within loci associated with human genetic diseases. Despite the recent progress in sequencing technologies, approaching this problem from an experimental perspective still represents a very demanding task, because the critical region may typically contain hundreds of positional candidates. We found that by concentrating only on genes sharing similar expression profiles in both human and mouse, massive microarray datasets can be used to reliably identify disease-relevant relationships among genes. Moreover, we found that integrating the coexpression criterion with systematic phenome analysis allows efficient identification of disease genes in large genomic regions. Using this approach on 850 OMIM loci characterized by unknown molecular basis, we propose high-probability candidates for 81 genetic diseases.
doi:10.1371/journal.pcbi.1000043
PMCID: PMC2268251  PMID: 18369433
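The idea of keeping only gene pairs that share similar expression profiles in both human and mouse can be sketched as follows. The abstract does not say how coexpression was scored, so Pearson correlation, the 0.8 threshold, the function names and the toy profiles are all assumptions. Genes are also assumed to be already mapped to shared ortholog identifiers.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def coexpressed_pairs(profiles, threshold):
    """Return the set of gene pairs whose profile correlation reaches the threshold."""
    pairs = set()
    for g1, g2 in combinations(sorted(profiles), 2):
        if pearson(profiles[g1], profiles[g2]) >= threshold:
            pairs.add((g1, g2))
    return pairs


def conserved_coexpression(human, mouse, threshold=0.8):
    """Keep only pairs that are coexpressed in BOTH species (assumed threshold)."""
    shared = set(human) & set(mouse)
    human_pairs = coexpressed_pairs({g: human[g] for g in shared}, threshold)
    mouse_pairs = coexpressed_pairs({g: mouse[g] for g in shared}, threshold)
    return human_pairs & mouse_pairs


# Hypothetical expression profiles keyed by ortholog identifier.
human = {"G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8.5], "G3": [4, 3, 2, 1]}
mouse = {"G1": [1, 3, 2, 5], "G2": [2, 5, 4, 9], "G3": [1, 3, 2, 5]}
print(conserved_coexpression(human, mouse))
```

In this toy data, G3 tracks G1 and G2 in mouse but is anticorrelated with them in human, so only the G1-G2 pair survives the intersection. This is the filtering effect the abstract attributes to the conservation criterion.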
24.  Coexpression of Linked Genes in Mammalian Genomes Is Generally Disadvantageous 
Molecular Biology and Evolution  2008;25(8):1555-1565.
Similarity in gene expression pattern between closely linked genes is known in several eukaryotes. Two models have been proposed to explain the presence of such coexpression patterns. The adaptive model assumes that coexpression is advantageous and is established by relocation of initially unlinked but coexpressed genes, whereas the neutral model asserts that coexpression is a type of leaky expression due to similar expressional environments of linked genes, but is neither advantageous nor detrimental. However, these models are incompatible with several empirical observations. Here, we propose that coexpression of linked genes is a form of transcriptional interference that is disadvantageous to the organism. We show that even distantly linked genes that are tens of megabases away exhibit significant coexpression in the human genome. However, the linkage is more likely to be broken during evolution between genes of high coexpression than those of low coexpression and the breakage of linkage reduces gene coexpression. These results support our hypothesis that coexpression of linked genes in mammalian genomes is generally disadvantageous, implying that many mammalian genes may never reach their optimal expression pattern due to the interference of their genomic environment and that such transcriptional interference may be a force promoting recurrent relocation of genes in the genome.
doi:10.1093/molbev/msn101
PMCID: PMC2734128  PMID: 18440951
gene order; linkage; gene expression; coexpression; evolution; mammals
25.  GraphCrunch: A tool for large network analyses 
BMC Bioinformatics  2008;9:70.
Background
The recent explosion in biological and other real-world network data has created the need for improved tools for large network analyses. In addition to well established global network properties, several new mathematical techniques for analyzing local structural properties of large networks have been developed. Small over-represented subgraphs, called network motifs, have been introduced to identify simple building blocks of complex networks. Small induced subgraphs, called graphlets, have been used to develop "network signatures" that summarize network topologies. Based on these network signatures, two new highly sensitive measures of network local structural similarities were designed: the relative graphlet frequency distance (RGF-distance) and the graphlet degree distribution agreement (GDD-agreement).
Finding adequate null-models for biological networks is important in many research domains. Network properties are used to assess the fit of network models to the data. Various network models have been proposed. To date, there does not exist a software tool that measures the above mentioned local network properties. Moreover, none of the existing tools compare real-world networks against a series of network models with respect to these local as well as a multitude of global network properties.
Results
Thus, we introduce GraphCrunch, a software tool that finds well-fitting network models by comparing large real-world networks against random graph models according to various network structural similarity measures. It has unique capabilities of finding computationally expensive RGF-distance and GDD-agreement measures. In addition, it computes several standard global network measures and thus supports the largest variety of network measures thus far. Also, it is the first software tool that compares real-world networks against a series of network models and that has built-in parallel computing capabilities allowing for a user specified list of machines on which to perform compute intensive searches for local network properties. Furthermore, GraphCrunch is easily extendible to include additional network measures and models.
Conclusion
GraphCrunch is a software tool that implements the latest research on biological network models and properties: it compares real-world networks against a series of random graph models with respect to a multitude of local and global network properties. We present GraphCrunch as a comprehensive, parallelizable, and easily extendible software tool for analyzing and modeling large biological networks. The software is open-source and freely available at . It runs under Linux, MacOS, and Windows Cygwin. In addition, it has an easy to use on-line web user interface that is available from the above web page.
doi:10.1186/1471-2105-9-70
PMCID: PMC2275247  PMID: 18230190
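A graphlet-based comparison of the kind described in the GraphCrunch abstract can be sketched in a much reduced form. The code below counts only the two connected 3-node graphlets (induced path and triangle) and compares two networks by the absolute difference of their negative log relative frequencies. This is not GraphCrunch's code: the published RGF-distance uses a larger graphlet set, and the function names, toy networks and the decision to skip graphlet types absent from either network are assumptions.

```python
from math import log


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_3node_graphlets(adj):
    """Count induced 3-node paths and triangles in an undirected graph."""
    wedges = 0        # neighbour pairs around each centre node
    closed = 0        # neighbour pairs that are themselves linked
    for nbrs in adj.values():
        nb = sorted(nbrs)
        k = len(nb)
        wedges += k * (k - 1) // 2
        closed += sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nb[j] in adj[nb[i]]
        )
    triangles = closed // 3  # each triangle is seen once from each of its corners
    return {"path": wedges - 3 * triangles, "triangle": triangles}


def rgf_like_distance(counts_g, counts_h):
    """Sum of |F_i(G) - F_i(H)| with F_i = -log(relative frequency of graphlet i)."""
    total_g = sum(counts_g.values())
    total_h = sum(counts_h.values())
    distance = 0.0
    for kind, n_g in counts_g.items():
        n_h = counts_h.get(kind, 0)
        if n_g == 0 or n_h == 0:
            continue  # assumption: skip types missing from either network
        distance += abs(-log(n_g / total_g) + log(n_h / total_h))
    return distance


# Hypothetical toy networks: a triangle with a tail of length one (G) or two (H).
g = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
h = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")])
print(count_3node_graphlets(g), count_3node_graphlets(h))
print(rgf_like_distance(count_3node_graphlets(g), count_3node_graphlets(h)))
```

Comparing a real network against a random graph model, as the abstract describes, would mean generating model networks of matching size and computing this distance against each of them.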
