1.  Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets 
PLoS Computational Biology  2010;6(4):e1000742.
Many genome-wide datasets are routinely generated to study different aspects of biological systems, but integrating them to obtain a coherent view of the underlying biology remains a challenge. We propose simultaneous clustering of multiple networks as a framework to integrate large-scale datasets on the interactions among and activities of cellular components. Specifically, we develop an algorithm, JointCluster, that finds sets of genes that cluster well in multiple networks of interest, such as coexpression networks summarizing correlations among the expression profiles of genes and physical networks describing protein-protein and protein-DNA interactions among genes or gene products. Our algorithm provides an efficient solution to a well-defined problem of jointly clustering networks, using techniques that permit certain theoretical guarantees on the quality of the detected clustering relative to the optimal clustering. These guarantees, coupled with an effective scaling heuristic and the flexibility to handle multiple heterogeneous networks, make our method JointCluster an advance over earlier approaches. Simulation results showed JointCluster to be more robust than alternative methods in recovering clusters implanted in networks with high false positive rates. In a systematic evaluation of JointCluster and some earlier approaches for combined analysis of the yeast physical network and two gene expression datasets under glucose and ethanol growth conditions, JointCluster discovers clusters that are more consistently enriched for various reference classes capturing different aspects of yeast biology or yield better coverage of the analysed genes. These robust clusters, which are supported across multiple genomic datasets and diverse reference classes, agree with known biology of yeast under these growth conditions, elucidate the genetic control of coordinated transcription, and enable functional predictions for a number of uncharacterized genes.
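To make the idea of "genes that cluster well in multiple networks" concrete, here is a minimal sketch of a naive baseline. It is not JointCluster and carries none of its guarantees: it simply keeps the edges present in both a thresholded coexpression network and a physical network, then reports the connected components. The gene names, the expression values, the 0.8 correlation threshold, and all function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def coexpression_edges(profiles, threshold=0.8):
    """Link two genes when their profiles correlate at or above the threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    Edges are stored as alphabetically sorted (gene, gene) tuples.
    """
    edges = set()
    for g, h in combinations(sorted(profiles), 2):
        if pearson(profiles[g], profiles[h]) >= threshold:
            edges.add((g, h))
    return edges


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as sets."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical toy data: four genes measured under four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 1.0, 3.0],
}
# Hypothetical physical interactions, stored as sorted tuples.
physical = {("geneA", "geneB"), ("geneC", "geneD")}

coexp = coexpression_edges(profiles)
shared = coexp & physical  # edges supported by both networks
clusters = [c for c in connected_components(profiles, shared) if len(c) > 1]
print(clusters)
```

Requiring an edge to appear in every network is the crudest way to combine evidence, and it is sensitive to missing edges in any single network. It is shown here only as a point of comparison for the joint-clustering framework the abstract describes.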
Author Summary
The generation of high-dimensional datasets in the biological sciences has become routine (protein interaction, gene expression, and DNA/RNA sequence data, to name a few), stretching our ability to derive novel biological insights from them, with even less effort focused on integrating these disparate datasets available in the public domain. Hence a most pressing problem in the life sciences today is the development of algorithms to combine large-scale data on different biological dimensions to maximize our understanding of living systems. We present an algorithm for simultaneously clustering multiple biological networks to identify coherent sets of genes (clusters) underlying cellular processes. The algorithm allows theoretical guarantees on the quality of the detected clusters relative to the optimal clusters that are computationally infeasible to find, and could be applied to coexpression, protein interaction, protein-DNA networks, and other network types. When combining multiple physical and gene expression based networks in yeast, the clusters we identify are consistently enriched for reference classes capturing diverse aspects of biology, yield good coverage of the analysed genes, and highlight novel members in well-studied cellular processes.
PMCID: PMC2855327  PMID: 20419151
2.  Meta-analysis of Inter-species Liver Co-expression Networks Elucidates Traits Associated with Common Human Diseases 
PLoS Computational Biology  2009;5(12):e1000616.
Co-expression networks are routinely used to study human diseases like obesity and diabetes. Systematic comparison of these networks between species has the potential to elucidate common mechanisms that are conserved between human and rodent species, as well as those that are species-specific, characterizing evolutionary plasticity. We developed a semi-parametric meta-analysis approach for combining gene-gene co-expression relationships across expression profile datasets from multiple species. The simulation results showed that the semi-parametric method is robust against noise. When applied to human, mouse, and rat liver co-expression networks, our method outperformed existing methods in identifying gene pairs with coherent biological functions. We identified a network conserved across species that highlighted cell-cell signaling, cell adhesion and sterol biosynthesis as main biological processes represented in genome-wide association study candidate gene sets for blood lipid levels. We further developed a heterogeneity statistic to test for network differences among multiple datasets, and demonstrated that genes with species-specific interactions tend to be under positive selection throughout evolution. Finally, we identified a human-specific sub-network regulated by RXRG, which has been validated to play a different role in hyperlipidemia and Type 2 diabetes between human and mouse. Taken together, our approach represents a novel step forward in integrating gene co-expression networks from multiple large-scale datasets to leverage not only common information but also differences that are dataset-specific.
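As a rough illustration of what combining one gene pair's co-expression across datasets can look like, the sketch below uses the classical fixed-effect Fisher z-transform meta-analysis together with Cochran's Q as a heterogeneity measure. This is a standard textbook stand-in, not the semi-parametric method or the heterogeneity statistic developed in the paper. The function name, the correlation values, and the sample sizes are hypothetical.

```python
from math import atanh, tanh


def combine_correlations(correlations, sample_sizes):
    """Combine per-dataset correlations for a single gene pair.

    Classical fixed-effect approach (an assumption for illustration):
    each correlation r is mapped to z = atanh(r) and weighted by n - 3,
    the inverse of the approximate variance of z.

    Returns (combined_r, q), where q is Cochran's Q. Under the hypothesis
    that all datasets share one true correlation, Q is approximately
    chi-square distributed with k - 1 degrees of freedom.
    """
    if len(correlations) != len(sample_sizes):
        raise ValueError("need one sample size per correlation")
    if any(n <= 3 for n in sample_sizes):
        raise ValueError("each dataset needs more than 3 samples")
    weights = [n - 3 for n in sample_sizes]
    zs = [atanh(r) for r in correlations]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    q = sum(w * (z - z_bar) ** 2 for w, z in zip(weights, zs))
    return tanh(z_bar), q


# Hypothetical gene pair whose correlation agrees across three species.
conserved_r, conserved_q = combine_correlations([0.6, 0.6, 0.6], [50, 40, 30])

# Hypothetical gene pair whose correlation differs in one species.
divergent_r, divergent_q = combine_correlations([0.8, -0.2, 0.7], [50, 40, 30])

print(conserved_r, conserved_q)
print(divergent_r, divergent_q)
```

A small Q suggests the relationship is shared across datasets, while a large Q flags a pair whose co-expression differs between them, which is the kind of contrast the abstract describes between conserved and species-specific interactions.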
Author Summary
Two important aspects of drug development are drug target identification and biomarker discovery for early disease detection, disease progression, drug efficacy and drug toxicity. Recently, many single nucleotide polymorphisms (SNPs) associated with human diseases have been discovered through large genome-wide association studies (GWAS). However, it is still largely unclear how these candidate SNPs may cause human diseases. The ultimate aim of this paper is to put these GWAS candidate SNPs and their associated genes into a network context to understand their mechanism of action in human diseases. In addition to large-scale human datasets that are often heterogeneous in terms of genetic and environmental factors, many high-quality datasets in rodents exist and are frequently used to model human diseases. To leverage such information, we developed a method for combining and contrasting gene networks between humans and rodents, specifically to elucidate how GWAS candidate SNPs may contribute to human diseases. By identifying mechanisms that are conserved or divergent between humans and rodents, we can also predict which disease-causing genes can be studied using rodent models and which ones may not.
PMCID: PMC2787626  PMID: 20019805
