1.  Gene Bionetwork Analysis of Ovarian Primordial Follicle Development 
PLoS ONE  2010;5(7):e11637.
Ovarian primordial follicles are critical for female reproduction and comprise a finite pool of gametes arrested in development. A systems biology approach was used to identify regulatory gene networks essential for primordial follicle development. Transcriptional responses to eight different growth factors known to influence primordial follicles were used to construct a bionetwork of regulatory genes involved in rat primordial follicle development. Over 1,500 genes were found to be regulated by the various growth factors and a network analysis identified critical gene modules involved in a number of signaling pathways and cellular processes. A set of 55 genes was identified as potential critical regulators of these gene modules, and a sub-network associated with development was determined. Within the network two previously identified regulatory genes were confirmed (i.e., Pdgfa and Fgfr2) and a new factor was identified, connective tissue growth factor (CTGF). CTGF was tested in ovarian organ cultures and found to stimulate primordial follicle development. Therefore, the relevant gene network associated with primordial follicle development was validated and the critical genes and pathways involved in this process were identified. This is one of the first applications of network analysis to a normal developmental process. These observations provide insights into potential therapeutic targets for preventing ovarian disease and promoting female reproduction.
PMCID: PMC2905436  PMID: 20661288
2.  CR1 and the “Vanishing Amyloid” Hypothesis of Alzheimer’s Disease 
Biological psychiatry  2013;73(5):393-395.
PMCID: PMC3600375  PMID: 23399469
3.  A Systems Genetic Analysis of High Density Lipoprotein Metabolism and Network Preservation across Mouse Models 
Biochimica et Biophysica Acta  2011;1821(3):435-447.
We report a systems genetics analysis of high-density lipoprotein (HDL) levels in an F2 intercross between inbred strains CAST/EiJ and C57BL/6J. We previously showed that there are dramatic differences in HDL metabolism in a cross between these strains, and we now report co-expression network analysis of HDL that integrates global expression data from liver and adipose with relevant metabolic traits. Using data from a total of 293 F2 intercross mice, we constructed weighted gene co-expression networks and identified modules (subnetworks) associated with HDL and clinical traits. These were examined for genes implicated in HDL levels based on large human genome-wide association studies (GWAS) and examined with respect to conservation between tissues and sexes in a total of 9 data sets. We identify genes that are consistently ranked high by association with HDL across the 9 data sets. We focus in particular on two genes, Wfdc2 and Hdac3, that are located in close proximity to HDL QTL peaks where causal testing indicates that they may affect HDL. Our results provide a rich resource for studies of complex metabolic interactions involving HDL.
PMCID: PMC3265689  PMID: 21807117
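As an illustration of the weighted co-expression networks mentioned in the abstract above, the following is a minimal sketch of the usual soft-thresholding idea: the connection strength between two genes is the absolute Pearson correlation of their expression profiles raised to a power. The function names, the toy expression values, and the default power of 6 are assumptions for illustration, not details taken from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def weighted_adjacency(expr, beta=6):
    """Map each ordered gene pair (g, h) to |cor(g, h)| ** beta.

    expr: dict of gene name -> list of expression values (one per sample).
    beta: soft-threshold power (assumed value; normally tuned per data set).
    """
    genes = list(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in genes
        for h in genes
        if g != h
    }

# Hypothetical toy profiles for three genes across four samples.
toy = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, -1, 1, -1]}
adj = weighted_adjacency(toy)
```

Raising the correlation to a power keeps strong correlations close to 1 while pushing weak ones toward 0, so no hard cut-off is needed before modules are identified.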
4.  The Arrestin Domain Containing 3 (ARRDC3) Protein Regulates Body Mass and Energy Expenditure 
Cell metabolism  2011;14(5):671-683.
A human genome-wide linkage scan for obesity identified a linkage peak on chromosome 5q13–15. Positional cloning revealed an association of a rare haplotype to high body-mass index (BMI) in males but not females. The risk locus contains a single gene, “arrestin domain containing 3” (ARRDC3), an uncharacterized α-arrestin. Inactivating Arrdc3 in mice led to a striking resistance to obesity, with greater impact on male mice. Mice with decreased ARRDC3 levels were protected from obesity due to increased energy expenditure through increased activity levels and increased thermogenesis of both brown and white adipose tissues. ARRDC3 interacted directly with β-adrenergic receptors, and loss of ARRDC3 increased the response to β-adrenergic stimulation in isolated adipose tissue. These results demonstrate that ARRDC3 is a gender-sensitive regulator of obesity and energy expenditure and reveal a surprising diversity for arrestin family protein functions.
PMCID: PMC3216113  PMID: 21982743
5.  The changing privacy landscape in the era of big data 
PMCID: PMC3472686  PMID: 22968446
6.  Integrating external biological knowledge in the construction of regulatory networks from time-series expression data 
BMC Systems Biology  2012;6:101.
Inference about regulatory networks from high-throughput genomics data is of great interest in systems biology. We present a Bayesian approach to infer gene regulatory networks from time series expression data by integrating various types of biological knowledge.
We formulate network construction as a series of variable selection problems and use linear regression to model the data. Our method summarizes additional data sources with an informative prior probability distribution over candidate regression models. We extend the Bayesian model averaging (BMA) variable selection method to select regulators in the regression framework. We summarize the external biological knowledge by an informative prior probability distribution over the candidate regression models.
We demonstrate our method on simulated data and a set of time-series microarray experiments measuring the effect of a drug perturbation on gene expression levels, and show that it outperforms leading regression-based methods in the literature.
PMCID: PMC3465231  PMID: 22898396
Systems biology; Network inference; Data integration; Statistics; Time-series expression data; Model uncertainty
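The variable-selection idea in the abstract above can be sketched roughly as follows. This is not the authors' implementation: it enumerates small candidate regulator sets, approximates each model's evidence with BIC (a stand-in for the paper's BMA machinery), and combines that with an independent prior inclusion probability per regulator, which plays the role of the external biological knowledge. All names and the `max_size` limit are hypothetical.

```python
import itertools
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus columns."""
    n = len(y)
    cols = [[1.0] * n] + columns
    A = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(a * v for a, v in zip(ci, y)) for ci in cols]
    coef = solve(A, rhs)
    fitted = [sum(coef[j] * cols[j][i] for j in range(len(cols))) for i in range(n)]
    return sum((y[i] - fitted[i]) ** 2 for i in range(n))

def bma_inclusion(X, y, prior, max_size=2):
    """Posterior inclusion probability of each candidate regulator.

    X: dict regulator -> values; y: target gene values;
    prior: dict regulator -> prior probability of being a regulator.
    """
    n, names = len(y), list(X)
    log_w = {}
    for k in range(max_size + 1):
        for model in itertools.combinations(names, k):
            r = max(rss([X[g] for g in model], y), 1e-12)
            bic = n * math.log(r / n) + (k + 1) * math.log(n)
            log_prior = sum(
                math.log(prior[g]) if g in model else math.log(1 - prior[g])
                for g in names
            )
            log_w[model] = -0.5 * bic + log_prior
    top = max(log_w.values())
    w = {m: math.exp(v - top) for m, v in log_w.items()}
    total = sum(w.values())
    return {g: sum(v for m, v in w.items() if g in m) / total for g in names}
```

Lowering a regulator's prior probability lowers its posterior inclusion probability for the same data, which is how external knowledge would tilt the selection in a scheme of this kind.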
7.  This I Believe: Gaining New Insights Through Integrating “Old” Data 
Frontiers in Genetics  2012;3:137.
PMCID: PMC3442492  PMID: 23024647
8.  Validation of ITD mutations in FLT3 as a therapeutic target in human acute myeloid leukaemia 
Nature  2012;485(7397):260-263.
Effective targeted cancer therapeutic development depends upon distinguishing disease-associated ‘driver’ mutations, which have causative roles in malignancy pathogenesis, from ‘passenger’ mutations, which are dispensable for cancer initiation and maintenance. Translational studies of clinically active targeted therapeutics can definitively discriminate driver from passenger lesions and provide valuable insights into human cancer biology. Activating internal tandem duplication (ITD) mutations in FLT3 (FLT3-ITD) are detected in approximately 20% of acute myeloid leukaemia (AML) patients and are associated with a poor prognosis [1]. Abundant scientific [2] and clinical evidence [1,3], including the lack of convincing clinical activity of early FLT3 inhibitors [4,5], suggests that FLT3-ITD probably represents a passenger lesion. Here we report point mutations at three residues within the kinase domain of FLT3-ITD that confer substantial in vitro resistance to AC220 (quizartinib), an active investigational inhibitor of FLT3, KIT, PDGFRA, PDGFRB and RET [6,7]; evolution of AC220-resistant substitutions at two of these amino acid positions was observed in eight of eight FLT3-ITD-positive AML patients with acquired resistance to AC220. Our findings demonstrate that FLT3-ITD can represent a driver lesion and valid therapeutic target in human AML. AC220-resistant FLT3 kinase domain mutants represent high-value targets for future FLT3 inhibitor development efforts.
PMCID: PMC3390926  PMID: 22504184
9.  COSINE: COndition-SpecIfic sub-NEtwork identification using a global optimization method 
Bioinformatics  2011;27(9):1290-1298.
Motivation: The identification of condition specific sub-networks from gene expression profiles has important biological applications, ranging from the selection of disease-related biomarkers to the discovery of pathway alterations across different phenotypes. Although many methods exist for extracting these sub-networks, very few existing approaches simultaneously consider both the differential expression of individual genes and the differential correlation of gene pairs, losing potentially valuable information in the data.
Results: In this article, we propose a new method, COSINE (COndition SpecIfic sub-NEtwork), which employs a scoring function that jointly measures the condition-specific changes of both ‘nodes’ (individual genes) and ‘edges’ (gene–gene co-expression). It uses a genetic algorithm to search for the single optimal sub-network that maximizes the scoring function. We applied COSINE to simulated datasets with various differential expression patterns and to three real datasets: a prostate cancer dataset, an across-tissue comparison of morbidly obese patients, and an across-population comparison of the HapMap samples. Compared with previous methods, COSINE is more powerful in identifying truly significant sub-networks of appropriate size and meaningful biological relevance.
Availability: The R code is available as the COSINE package on CRAN.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3138081  PMID: 21414987
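To make the idea of a joint node-and-edge score in the abstract above concrete, here is a small sketch. It is not the COSINE package's scoring function: the weighted-average form, the weight `lam`, and all names are assumptions, and the genetic algorithm is replaced by a brute-force search over fixed-size gene sets, which is only feasible for toy inputs.

```python
from itertools import combinations

def subnetwork_score(nodes, node_score, edge_score, lam=0.5):
    """Weighted average of mean node score and mean edge score.

    node_score: dict gene -> differential-expression score.
    edge_score: dict (gene_a, gene_b) with gene_a < gene_b -> differential
                co-expression score; missing pairs count as 0.
    lam: hypothetical trade-off weight between nodes (1.0) and edges (0.0).
    """
    nodes = sorted(nodes)
    node_part = sum(node_score[g] for g in nodes) / len(nodes)
    pairs = list(combinations(nodes, 2))
    edge_part = (
        sum(edge_score.get(p, 0.0) for p in pairs) / len(pairs) if pairs else 0.0
    )
    return lam * node_part + (1 - lam) * edge_part

def best_subnetwork(node_score, edge_score, size, lam=0.5):
    """Exhaustive stand-in for the genetic algorithm: try every gene set of `size`."""
    return max(
        combinations(sorted(node_score), size),
        key=lambda s: subnetwork_score(s, node_score, edge_score, lam),
    )

# Hypothetical scores: A and D change in expression, B and C change in co-expression.
node_score = {"A": 2.0, "B": 0.1, "C": 0.1, "D": 1.5}
edge_score = {("B", "C"): 5.0, ("A", "D"): 0.2}
```

With `lam=1.0` only node changes matter and the search picks A and D; with `lam=0.0` only edge changes matter and it picks B and C. A pair such as B and C would be missed by a method that looks at differential expression alone.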
10.  Stitching together Multiple Data Dimensions Reveals Interacting Metabolomic and Transcriptomic Networks That Modulate Cell Regulation 
PLoS Biology  2012;10(4):e1001301.
DNA variation can be used as a systematic source of perturbation in segregating populations as a way to infer regulatory networks via the integration of large-scale, high-dimensional molecular profiling data.
Cells employ multiple levels of regulation, including transcriptional and translational regulation, that drive core biological processes and enable cells to respond to genetic and environmental changes. Small-molecule metabolites are one category of critical cellular intermediates that can influence as well as be a target of cellular regulations. Because metabolites represent the direct output of protein-mediated cellular processes, endogenous metabolite concentrations can closely reflect cellular physiological states, especially when integrated with other molecular-profiling data. Here we develop and apply a network reconstruction approach that simultaneously integrates six different types of data: endogenous metabolite concentration, RNA expression, DNA variation, DNA–protein binding, protein–metabolite interaction, and protein–protein interaction data, to construct probabilistic causal networks that elucidate the complexity of cell regulation in a segregating yeast population. Because many of the metabolites are found to be under strong genetic control, we were able to employ a causal regulator detection algorithm to identify causal regulators of the resulting network that elucidated the mechanisms by which variations in their sequence affect gene expression and metabolite concentrations. We examined all four expression quantitative trait loci (eQTL) hot spots with colocalized metabolite QTLs, two of which recapitulated known biological processes, while the other two elucidated novel putative biological mechanisms for the eQTL hot spots.
Author Summary
It is now possible to score variations in DNA across whole genomes, RNA levels and alternative isoforms, metabolite levels, protein levels and protein state information, protein–protein interactions, and protein–DNA interactions, in a comprehensive fashion in populations of individuals. Interactions among these molecular entities define the complex web of biological processes that give rise to all higher order phenotypes, including disease. The development of analytical approaches that simultaneously integrate different dimensions of data is essential if we are to extract the meaning from large-scale data to elucidate the complexity of living systems. Here, we use a novel Bayesian network reconstruction algorithm that simultaneously integrates DNA variation, RNA levels, metabolite levels, protein–protein interaction data, protein–DNA binding data, and protein–small-molecule interaction data to construct molecular networks in yeast. We demonstrate that these networks can be used to infer causal relationships among genes, enabling the identification of novel genes that modulate cellular regulation. We show that our network predictions either recapitulate known biology or can be prospectively validated, demonstrating a high degree of accuracy in the predicted network.
PMCID: PMC3317911  PMID: 22509135
11.  Coanalysis of GWAS with eQTLs reveals disease-tissue associations 
Expression quantitative trait loci (eQTL), or genetic variants associated with changes in gene expression, have the potential to assist in interpreting results of genome-wide association studies (GWAS). eQTLs also have varying degrees of tissue specificity. By correlating the statistical significance of eQTLs mapped in various tissue types to their odds ratios reported in a large GWAS by the Wellcome Trust Case Control Consortium (WTCCC), we discovered that there is a significant association between diseases studied genetically and their relevant tissues. This suggests that eQTL data sets can be used to determine tissues that play a role in the pathogenesis of a disease, thereby highlighting these tissue types for further post-GWAS functional studies.
PMCID: PMC3392070  PMID: 22779046
13.  Dissecting Cis Regulation of Gene Expression in Human Metabolic Tissues 
PLoS ONE  2011;6(8):e23480.
Complex diseases such as obesity and type II diabetes can result from a failure in multiple organ systems including the central nervous system and tissues involved in partitioning and disposal of nutrients. Studying the genetics of gene expression in tissues that are involved in the development of these diseases can provide insights into how these tissues interact within the context of disease. Expression quantitative trait locus (eQTL) studies identify mRNA expression changes linked to proximal genetic signals (cis eQTLs) that have been shown to affect disease. Given the high impact of recent eQTL studies, it is important to understand what role sample size and environment play in the identification of cis eQTLs. Here we show in a genotyped obese human population that the number of cis eQTLs obeys precise scaling laws as a function of sample size in three profiled tissues, i.e. omental adipose, subcutaneous adipose and liver. We also show that genes (or transcripts) with cis eQTL associations detected in a small population are detected at a rate of approximately 90% in the largest population available for our study, indicating that genes with strong cis-acting regulatory elements can be identified with relatively high confidence in smaller populations. However, increasing the sample size allows for better detection of weaker and more distantly located cis-regulatory elements. Yet, we determined that the number of tissue-specific cis eQTLs saturates in a modestly sized cohort while the number of cis eQTLs common to all tissues fails to reach a maximum value. Understanding the power laws that govern the number and specificity of eQTLs detected in different tissues will allow better use of the genetics of gene expression to inform the molecular mechanisms underlying complex disease traits.
PMCID: PMC3166146  PMID: 21912597
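The scaling behaviour described in the abstract above can be examined with a simple log-log fit. The sketch below assumes a power law of the form count = a * n^b relating the number of detected cis eQTLs to the sample size n; the function name and the sample sizes and counts in the usage lines are invented for illustration and are not data from the study.

```python
import math

def fit_power_law(sample_sizes, counts):
    """Least-squares fit of log(count) = log(a) + b * log(n); returns (a, b)."""
    xs = [math.log(n) for n in sample_sizes]
    ys = [math.log(c) for c in counts]
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    a = math.exp(my - b * mx)
    return a, b

# Hypothetical subsampling experiment: eQTL counts at increasing cohort sizes.
sizes = [50, 100, 200, 400]
counts = [3 * n ** 1.5 for n in sizes]
a, b = fit_power_law(sizes, counts)
```

Fitting the exponent separately for tissue-specific and shared cis eQTLs would be one way to compare how quickly each class accumulates as the cohort grows.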
14.  Computational solutions to large-scale data management and analysis 
Nature reviews. Genetics  2010;11(9):647-657.
Today we can generate hundreds of gigabases of DNA and RNA sequencing data in a week for less than US$5,000. The astonishing rate of data generation by these low-cost, high-throughput technologies in genomics is being matched by that of other technologies, such as real-time imaging and mass spectrometry-based flow cytometry. Success in the life sciences will depend on our ability to properly interpret the large-scale, high-dimensional data sets that are generated by these technologies, which in turn requires us to adopt advances in informatics. Here we discuss how we can master the different types of computational environments that exist — such as cloud and heterogeneous computing — to successfully tackle our big data problems.
PMCID: PMC3124937  PMID: 20717155
15.  Characterizing the role of miRNAs within gene regulatory networks using integrative genomics techniques 
By integrating genotype information, microRNA transcript abundances and mRNA expression levels, Eric Schadt and colleagues provide insights into the genetic basis of microRNA gene expression and the role of microRNAs within the liver gene-regulatory network.
This article demonstrates how integrative genomics techniques can be used to investigate novel classes of RNA molecules. Moreover, it represents one of the first examinations of the genetic basis of variation in miRNA gene expression. Our results suggest that miRNA transcript abundances are under more complex regulation than previously observed for mRNA abundances. We also demonstrate that miRNAs typically exist as highly connected hub nodes and function as key sensors within the liver transcriptional network. Additionally, our results provide support for two key hypotheses, namely that miRNAs can act cooperatively or redundantly to regulate a given pathway, and that miRNAs play a subtle role by dampening expression of their target gene through the use of feedback loops.
Since their discovery less than two decades ago, microRNAs (miRNAs) have repeatedly been shown to play a regulatory role in important biological processes. These small single-stranded molecules have been found to regulate multiple pathways—such as developmental timing in worms; fat metabolism in flies; and stress response in plants—and have been established as key regulatory molecules with potential widespread influence on both fundamental biology and various diseases. In the past decade, a new approach referred to by a number of names (‘integrative genomics', ‘systems genetics' or ‘genetical genomics') has shown increasing levels of success in elucidating the complex relationships found in gene regulatory networks. This approach leverages multiple layers of information (such as genotype, gene expression and phenotype) to infer causal associations that are then used for a number of different purposes, including identifying drivers of diseases and characterizing molecular networks. More importantly, many of the causal relationships that have been identified using this approach have been experimentally tested and verified. By integrating miRNA transcript abundances with messenger RNA (mRNA) expression data and genetic data, we have demonstrated how integrative genomics approaches can be used to characterize the global role played by miRNAs within complex gene regulatory networks. Overall, we investigated approximately 30% of the registered mouse miRNAs with a focus on liver networks. Our analysis reveals that miRNAs exist as highly connected hub nodes and function as key sensors within the gene regulatory network. Further comparisons between the regulatory loci contributing to the variation observed in miRNA and mRNA expression levels indicate that while miRNAs are controlled by more loci than have previously been observed for mRNAs, the contribution from each locus is on average smaller for miRNAs. 
We also provide evidence supporting two key hypotheses in the field: (i) miRNAs can act cooperatively or redundantly to regulate a given pathway; and (ii) miRNAs may regulate expression of their target gene through the use of feedback loops.
Integrative genomics and genetics approaches have proven to be a useful tool in elucidating the complex relationships often found in gene regulatory networks. More importantly, a number of studies have provided the necessary experimental evidence confirming the validity of the causal relationships inferred using such an approach. By integrating messenger RNA (mRNA) expression data with microRNA (miRNA) (i.e. small non-coding RNA with well-established regulatory roles in a myriad of biological processes) expression data, we show how integrative genomics approaches can be used to characterize the role played by approximately a third of registered mouse miRNAs within the context of a liver gene regulatory network. Our analysis reveals that the transcript abundances of miRNAs are subject to regulatory control by many more loci than previously observed for mRNA expression. Moreover, our results indicate that miRNAs exist as highly connected hub-nodes and function as key sensors within the transcriptional network. We also provide evidence supporting the hypothesis that miRNAs can act cooperatively or redundantly to regulate a given pathway and that miRNAs play a subtle role by dampening expression of their target gene through the use of feedback loops.
PMCID: PMC3130556  PMID: 21613979
causal associations; eQTL mapping; expression QTL; microRNA
16.  Systematic Detection of Polygenic cis-Regulatory Evolution 
PLoS Genetics  2011;7(3):e1002023.
The idea that most morphological adaptations can be attributed to changes in the cis-regulation of gene expression levels has been gaining increasing acceptance, despite the fact that only a handful of such cases have so far been demonstrated. Moreover, because each of these cases involves only one gene, we lack any understanding of how natural selection may act on cis-regulation across entire pathways or networks. Here we apply a genome-wide test for selection on cis-regulation to two subspecies of the mouse Mus musculus. We find evidence for lineage-specific selection at over 100 genes involved in diverse processes such as growth, locomotion, and memory. These gene sets implicate candidate genes that are supported by both quantitative trait loci and a validated causality-testing framework, and they predict a number of phenotypic differences, which we confirm in all four cases tested. Our results suggest that gene expression adaptation is widespread and that these adaptations can be highly polygenic, involving cis-regulatory changes at numerous functionally related genes. These coordinated adaptations may contribute to divergence in a wide range of morphological, physiological, and behavioral phenotypes.
Author Summary
Evolution can involve changes that are advantageous—known as adaptations—as well as changes that are neutral or slightly deleterious, which are established through a process of random drift. Discerning what specific differences between any two lineages are adaptive is a major goal of evolutionary biology. For gene expression differences, this has traditionally proven to be a challenging question, and previous studies of gene expression adaptation in metazoans have been restricted to the single-gene level. Here we present a genome-wide analysis of gene expression evolution in two subspecies of the mouse Mus musculus. We find several groups of genes that have likely been subject to selection for up-regulation in a specific lineage. These groups include genes related to mitochondria, growth, locomotion, and memory. Analysis of the phenotypes of these mice indicates that these adaptations may have had a significant impact on a wide range of phenotypes.
PMCID: PMC3069120  PMID: 21483757
17.  From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus 
Nature  2010;466(7307):714-719.
Recent genome-wide association studies (GWASs) have identified a locus on chromosome 1p13 as strongly associated with both serum low-density lipoprotein cholesterol (LDL-C) and myocardial infarction (MI) in humans. Here we show through a series of studies in human cohorts and human-derived hepatocytes that a common noncoding polymorphism at the 1p13 locus, rs12740374, creates a C/EBP transcription factor binding site and alters the hepatic expression of the SORT1 gene. With siRNA knockdown and viral overexpression in mouse liver, we demonstrate that Sort1 alters plasma LDL-C and very low-density lipoprotein (VLDL) particle levels by modulating hepatic VLDL secretion. Thus, we provide functional evidence for a novel regulatory pathway for lipoprotein metabolism and suggest that modulation of this pathway may alter risk for MI in humans. We also demonstrate that common noncoding DNA variants identified by GWASs can directly contribute to clinical phenotypes.
PMCID: PMC3062476  PMID: 20686566
18.  Common body mass index-associated variants confer risk of extreme obesity 
Human Molecular Genetics  2009;18(18):3502-3507.
To investigate the genetic architecture of severe obesity, we performed a genome-wide association study of 775 cases and 3197 unascertained controls at ∼550 000 markers across the autosomal genome. We found convincing association to the previously described locus including the FTO gene. We also found evidence of association at a further six of 12 other loci previously reported to influence body mass index (BMI) in the general population and one of three associations to severe childhood and adult obesity and that cases have a higher proportion of risk-conferring alleles than controls. We found no evidence of homozygosity at any locus due to identity-by-descent associating with phenotype which would be indicative of rare, penetrant alleles, nor was there excess genome-wide homozygosity in cases relative to controls. Our results suggest that variants influencing BMI also contribute to severe obesity, a condition at the extreme of the phenotypic spectrum rather than a distinct condition.
PMCID: PMC2729668  PMID: 19553259
19.  An Integrative Multi-Network and Multi-Classifier Approach to Predict Genetic Interactions 
PLoS Computational Biology  2010;6(9):e1000928.
Genetic interactions occur when a combination of mutations results in a surprising phenotype. These interactions capture functional redundancy, and thus are important for predicting function, dissecting protein complexes into functional pathways, and exploring the mechanistic underpinnings of common human diseases. Synthetic sickness and lethality are the most studied types of genetic interactions in yeast. However, even in yeast, only a small proportion of gene pairs have been tested for genetic interactions due to the large number of possible combinations of gene pairs. To expand the set of known synthetic lethal (SL) interactions, we have devised an integrative, multi-network approach for predicting these interactions that significantly improves upon the existing approaches. First, we defined a large number of features for characterizing the relationships between pairs of genes from various data sources. In particular, these features are independent of the known SL interactions, in contrast to some previous approaches. Using these features, we developed a non-parametric multi-classifier system for predicting SL interactions that enabled the simultaneous use of multiple classification procedures. Several comprehensive experiments demonstrated that the SL-independent features in conjunction with the advanced classification scheme led to an improved performance when compared to the current state of the art method. Using this approach, we derived the first yeast transcription factor genetic interaction network, part of which was well supported by literature. 
We also used this approach to predict SL interactions between all non-essential gene pairs in yeast. This integrative approach is expected to be more effective and robust in uncovering new genetic interactions from the tens of millions of unknown gene pairs in yeast and from the hundreds of millions of gene pairs in higher organisms like mouse and human, in which very few genetic interactions have been identified to date.
Author Summary
Genetic interactions occur when a combination of mutations results in a surprising phenotype. These interactions capture functional redundancy, and thus are important for predicting function, dissecting protein complexes into functional pathways, and exploring the mechanistic underpinnings of common human diseases. Due to the large number of possible combinations of genes, only a small portion of gene pairs in yeast and only a few pairs in higher organisms like mouse and human have been tested. Therefore, predicting genetic interactions has received significant attention in the past several years. The existing methods primarily rely on the known genetic interactions, and thus are far less effective in classifying most gene pairs not well connected with known genetic interactions. Here we developed a non-parametric multi-classifier system for predicting genetic interactions based on a large number of novel features independent of the known genetic interactions. This approach led to an improved performance when compared to the current state-of-the-art method. Using this approach, we derived the first yeast transcription factor genetic interaction network, part of which was well supported by literature. This integrative approach is expected to be more effective and robust in uncovering new genetic interactions in yeast and other species.
PMCID: PMC2936518  PMID: 20838583
20.  The C3a Anaphylatoxin Receptor Is a Key Mediator of Insulin Resistance and Functions by Modulating Adipose Tissue Macrophage Infiltration and Activation 
Diabetes  2009;58(9):2006-2017.
Significant new data suggest that metabolic disorders such as diabetes, obesity, and atherosclerosis all possess an important inflammatory component. Infiltrating macrophages contribute to both tissue-specific and systemic inflammation, which promotes insulin resistance. The complement cascade is involved in the inflammatory cascade initiated by the innate and adaptive immune response. A genomic analysis of a mouse F2 cross identified several causal genes linked to type 2 diabetes, including genes in the complement pathway.
We therefore sought to investigate the effect of a C3a receptor (C3aR) deletion on insulin resistance, obesity, and macrophage function utilizing both the normal-diet (ND) and a diet-induced obesity mouse model.
We demonstrate that high C3aR expression is found in white adipose tissue and increases upon high-fat diet (HFD) feeding. Both adipocytes and macrophages within the white adipose tissue express significant amounts of C3aR. C3aR−/− mice on HFD are transiently resistant to diet-induced obesity during an 8-week period. Metabolic profiling suggests that they are also protected from HFD-induced insulin resistance and liver steatosis. C3aR−/− mice had improved insulin sensitivity on both ND and HFD as seen by an insulin tolerance test and an oral glucose tolerance test. Adipose tissue analysis revealed a striking decrease in macrophage infiltration with a concomitant reduction in both tissue and plasma proinflammatory cytokine production. Furthermore, C3aR−/− macrophages polarized to the M1 phenotype showed a considerable decrease in proinflammatory mediators.
Overall, our results suggest that the C3aR in macrophages, and potentially adipocytes, plays an important role in adipose tissue homeostasis and insulin resistance.
PMCID: PMC2731537  PMID: 19581423
21.  Population Genomic Analysis of ALMS1 in Humans Reveals a Surprisingly Complex Evolutionary History 
Molecular Biology and Evolution  2009;26(6):1357-1367.
Mutations in the human gene ALMS1 result in Alström Syndrome, which presents with early childhood obesity and insulin resistance leading to Type 2 diabetes. Previous genomewide scans for selection in the HapMap data based on linkage disequilibrium and population structure suggest that ALMS1 was subject to recent positive selection. Through a detailed population genomic analysis of existing genomewide data sets and new resequencing data obtained in geographically diverse populations, we find that the signature of selection at ALMS1 is considerably more complex than what would be expected for an idealized model of a selective sweep acting on a newly arisen advantageous mutation. Specifically, we observed three highly divergent and globally dispersed haplogroups, two of which carry a set of seven derived nonsynonymous single nucleotide polymorphisms that are nearly fixed in Asian populations. Our data suggest that the interaction of human demographic history and positive selection on standing variation in Eurasian populations approximately 15 thousand years ago parsimoniously explains the spectrum of extant ALMS1 variation. These results provide new insights into the evolutionary history of ALMS1 in humans and suggest that selective events identified in genomewide scans may be more complex than currently appreciated.
PMCID: PMC2734137  PMID: 19279085
ALMS1; positive selection; standing variation
22.  Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets 
PLoS Computational Biology  2010;6(4):e1000742.
Many genome-wide datasets are routinely generated to study different aspects of biological systems, but integrating them to obtain a coherent view of the underlying biology remains a challenge. We propose simultaneous clustering of multiple networks as a framework to integrate large-scale datasets on the interactions among and activities of cellular components. Specifically, we develop an algorithm JointCluster that finds sets of genes that cluster well in multiple networks of interest, such as coexpression networks summarizing correlations among the expression profiles of genes and physical networks describing protein-protein and protein-DNA interactions among genes or gene-products. Our algorithm provides an efficient solution to a well-defined problem of jointly clustering networks, using techniques that permit certain theoretical guarantees on the quality of the detected clustering relative to the optimal clustering. These guarantees coupled with an effective scaling heuristic and the flexibility to handle multiple heterogeneous networks make our method JointCluster an advance over earlier approaches. Simulation results showed JointCluster to be more robust than alternate methods in recovering clusters implanted in networks with high false positive rates. In systematic evaluation of JointCluster and some earlier approaches for combined analysis of the yeast physical network and two gene expression datasets under glucose and ethanol growth conditions, JointCluster discovers clusters that are more consistently enriched for various reference classes capturing different aspects of yeast biology or yield better coverage of the analysed genes. These robust clusters, which are supported across multiple genomic datasets and diverse reference classes, agree with known biology of yeast under these growth conditions, elucidate the genetic control of coordinated transcription, and enable functional predictions for a number of uncharacterized genes.
Author Summary
The generation of high-dimensional datasets in the biological sciences has become routine (protein interaction, gene expression, and DNA/RNA sequence data, to name a few), stretching our ability to derive novel biological insights from them, with even less effort focused on integrating these disparate datasets available in the public domain. Hence a most pressing problem in the life sciences today is the development of algorithms to combine large-scale data on different biological dimensions to maximize our understanding of living systems. We present an algorithm for simultaneously clustering multiple biological networks to identify coherent sets of genes (clusters) underlying cellular processes. The algorithm allows theoretical guarantees on the quality of the detected clusters relative to the optimal clusters that are computationally infeasible to find, and could be applied to coexpression, protein interaction, protein-DNA networks, and other network types. When combining multiple physical and gene expression based networks in yeast, the clusters we identify are consistently enriched for reference classes capturing diverse aspects of biology, yield good coverage of the analysed genes, and highlight novel members in well-studied cellular processes.
PMCID: PMC2855327  PMID: 20419151
23.  Variations in DNA elucidate molecular networks that cause disease 
Nature  2008;452(7186):429-435.
Identifying variations in DNA that increase susceptibility to disease is one of the primary aims of genetic studies using a forward genetics approach. However, identification of disease-susceptibility genes by means of such studies provides limited functional information on how genes lead to disease. In fact, in most cases there is an absence of functional information altogether, preventing a definitive identification of the susceptibility gene or genes. Here we develop an alternative to the classic forward genetics approach for dissecting complex disease traits where, instead of identifying susceptibility genes directly affected by variations in DNA, we identify gene networks that are perturbed by susceptibility loci and that in turn lead to disease. Application of this method to liver and adipose gene expression data generated from a segregating mouse population results in the identification of a macrophage-enriched network supported as having a causal relationship with disease traits associated with metabolic syndrome. Three genes in this network, lipoprotein lipase (Lpl), lactamase β (Lactb) and protein phosphatase 1-like (Ppm1l), are validated as previously unknown obesity genes, strengthening the association between this network and metabolic disease traits. Our analysis provides direct experimental support that complex traits such as obesity are emergent properties of molecular networks that are modulated by complex genetic loci and environmental factors.
PMCID: PMC2841398  PMID: 18344982
24.  An integrative genomics approach to infer causal associations between gene expression and disease 
Nature genetics  2005;37(7):710-717.
A key goal of biomedical research is to elucidate the complex network of gene interactions underlying complex traits such as common human diseases. Here we detail a multistep procedure for identifying potential key drivers of complex traits that integrates DNA-variation and gene-expression data with other complex trait data in segregating mouse populations. Ordering gene expression traits relative to one another and relative to other complex traits is achieved by systematically testing whether variations in DNA that lead to variations in relative transcript abundances statistically support an independent, causative or reactive function relative to the complex traits under consideration. We show that this approach can predict transcriptional responses to single gene–perturbation experiments using gene-expression data in the context of a segregating mouse population. We also demonstrate the utility of this approach by identifying and experimentally validating the involvement of three new genes in susceptibility to obesity.
PMCID: PMC2841396  PMID: 15965475
25.  Disruption of the Aortic Elastic Lamina and Medial Calcification Share Genetic Determinants in Mice 
Disruption of the elastic lamina, as an early indicator of aneurysm formation, and vascular calcification frequently occur together in atherosclerotic lesions of humans.
Methods and Results
We now report evidence of a shared genetic basis for disruption of the elastic lamina (medial disruption) and medial calcification in an F2 mouse intercross between C57BL/6J and C3H/HeJ on a hyperlipidemic apolipoprotein E (ApoE−/−) null background. We identified 3 quantitative trait loci (QTLs) on chromosomes 6, 13, and 18, which are common to both traits, and 2 additional QTLs for medial calcification on chromosomes 3 and 7. Medial disruption, including severe disruptions leading to aneurysm formation, and medial calcification were highly correlated and occurred concomitantly in the cross. The chromosome 18 locus showed a striking male sex-specificity for both traits. To identify candidate genes, we integrated data from microarray analysis, genetic segregation, and clinical traits. The chromosome 7 locus contains the Abcc6 gene, known to mediate myocardial calcification. Using transgenic complementation, we show that Abcc6 also contributes to aortic medial calcification.
Our data indicate that calcification, though possibly contributory, does not always lead to medial disruption and that in addition to aneurysm formation, medial disruption may be the precursor to calcification.
PMCID: PMC2836127  PMID: 20031637
aneurysm; vascular calcification; Abcc6; Alox5; genetics; gene expression
