Thousands of biological and biomedical investigators study the functional role of single genes and their protein products in normal physiology and in disease. The findings from these studies are reported in research articles that stimulate new research. It is now established that a complex regulatory network controls human cellular fate, and this community of researchers is continually unraveling this network's topology. Attempts to integrate results from such accumulated knowledge resulted in literature-based protein-protein interaction networks (PPINs) and pathway databases. These databases are widely used by the community to analyze new data collected from emerging genome-wide studies, with the assumption that the data within these literature-based databases are the ground truth and contain no biases. While suspicion of research focus biases is growing, concrete proof of them is still missing. Such proof is difficult to obtain because the real PPINs are mostly unknown.
Here we analyzed the longitudinal discovery process of literature-based mammalian and yeast PPINs to observe that these networks are discovered non-uniformly. The pattern of discovery is related to a theoretical concept proposed by Kauffman called “expanding the adjacent possible”. We introduce a network discovery model which explicitly includes the space of possibilities in the form of a true underlying PPIN.
Our model strongly suggests that research focus biases exist in the observed discovery dynamics of these networks. In summary, more care should be taken when using PPIN databases for analysis of newly acquired data, and when considering prior knowledge when designing new experiments.
Electronic supplementary material
The online version of this article (doi:10.1186/s12918-015-0173-z) contains supplementary material, which is available to authorized users.
Maintenance of mitochondrial structure and function is critical for preventing podocyte apoptosis and eventual glomerulosclerosis in the kidney; however, the transcription factors that regulate mitochondrial function in podocyte injury remain to be identified. Here, we identified Krüppel-like factor 6 (KLF6), a zinc finger domain transcription factor, as an essential regulator of mitochondrial function in podocyte apoptosis. We observed that podocyte-specific deletion of Klf6 increased the susceptibility of a resistant mouse strain to adriamycin-induced (ADR-induced) focal segmental glomerulosclerosis (FSGS). KLF6 expression was induced early in response to ADR in mice and cultured human podocytes, and prevented mitochondrial dysfunction and activation of intrinsic apoptotic pathways in these podocytes. Promoter analysis and chromatin immunoprecipitation studies revealed that putative KLF6 transcriptional binding sites are present in the promoter of the mitochondrial cytochrome c oxidase assembly gene (SCO2), which is critical for preventing cytochrome c release and activation of the intrinsic apoptotic pathway. Additionally, KLF6 expression was reduced in podocytes from HIV-1 transgenic mice as well as in renal biopsies from patients with HIV-associated nephropathy (HIVAN) and FSGS. Together, these findings indicate that KLF6-dependent regulation of the cytochrome c oxidase assembly gene is critical for maintaining mitochondrial function and preventing podocyte apoptosis.
The process of cellular senescence generates a repressive chromatin environment; however, the role of histone variants and histone proteolytic cleavage in senescence remains unclear. Using models of oncogene-induced and replicative senescence, here we report novel histone H3 tail cleavage events mediated by the protease Cathepsin L. We find that cleaved forms of H3 are nucleosomal and the histone variant H3.3 is the preferred cleaved form of H3. Ectopic expression of H3.3 and its cleavage product (H3.3cs1), which lacks the first twenty-one amino acids of the H3 tail, is sufficient to induce senescence. Further, H3.3cs1 chromatin incorporation is mediated by the HUCA histone chaperone complex. Genome-wide transcriptional profiling revealed that H3.3cs1 facilitates transcriptional silencing of cell cycle regulators including RB/E2F target genes, likely via the permanent removal of H3K4me3. Collectively, our study identifies histone H3.3 and its proteolytically processed forms as key regulators of cellular senescence.
The genetic architecture of autism spectrum disorder involves the interplay of common and rare variation and their impact on hundreds of genes. Using exome sequencing, analysis of rare coding variation in 3,871 autism cases and 9,937 ancestry-matched or parental controls implicates 22 autosomal genes at a false discovery rate (FDR) < 0.05, and a set of 107 autosomal genes strongly enriched for those likely to affect risk (FDR < 0.30). These 107 genes, which show unusual evolutionary constraint against mutations, incur de novo loss-of-function mutations in over 5% of autistic subjects. Many of the genes implicated encode proteins for synaptic, transcriptional, and chromatin remodeling pathways. These include voltage-gated ion channels regulating propagation of action potentials, pacemaking, and excitability-transcription coupling, as well as histone-modifying enzymes and chromatin remodelers, prominently histone post-translational modifications involving lysine methylation/demethylation.
Research on converting one cell type to another will be aided by systematic mapping of the gene-regulatory networks in mammalian cells.
Podocytes are kidney cells with specialized morphology that is required for glomerular filtration. Diseases, such as diabetes, or drug exposures that disrupt the podocyte foot process morphology result in kidney pathophysiology. Proteomic analysis of glomeruli isolated from rats with puromycin-induced kidney disease and control rats indicated that protein kinase A (PKA), which is activated by adenosine 3′,5′-monophosphate (cAMP), is a key regulator of podocyte morphology and function. In podocytes, cAMP signaling activates cAMP response element–binding protein (CREB) to enhance expression of the gene encoding a differentiation marker, synaptopodin, a protein that associates with actin and promotes its bundling. We constructed and experimentally verified a β-adrenergic receptor–driven network with multiple feedback and feedforward motifs that controls CREB activity. To determine how the motifs interacted to regulate gene expression, we mapped multicompartment dynamical models, including information about protein subcellular localization, onto the network topology using Petri net formalisms. These computational analyses indicated that the juxtaposition of multiple feedback and feedforward motifs enabled the prolonged CREB activation necessary for synaptopodin expression and actin bundling. Drug-induced modulation of these motifs in diseased rats led to recovery of normal morphology and physiological function in vivo. Thus, analysis of regulatory motifs using network dynamics can provide insights into pathophysiology that enable predictions for drug intervention strategies to treat kidney disease.
It is thought that monocytes rapidly differentiate to macrophages or dendritic cells (DCs) upon leaving blood. Here we have shown that Ly-6C+ monocytes constitutively trafficked into skin, lung, and lymph nodes (LNs). Entry was unaffected in gnotobiotic mice. Monocytes in resting lung and LN had similar gene expression profiles to blood monocytes, but elevated transcripts of a limited number of genes including cyclo-oxygenase-2 (COX-2) and major histocompatibility complex class II (MHC II), induced by monocyte interaction with endothelium. Parabiosis, bromodeoxyuridine (BrdU) pulse-chase analysis, and intranasal instillation of tracers indicated that instead of contributing to resident macrophages in the lung, recruited endogenous monocytes acquired antigen for carriage to draining LNs, a function redundant with DCs though differentiation to DCs did not occur. Thus, monocytes can enter steady state non-lymphoid organs and recirculate to LNs without differentiation to macrophages or DCs, revising a long-held view that monocytes become tissue-resident macrophages by default.
A 30-node signed and directed network responsible for self-renewal and pluripotency of mouse embryonic stem cells (mESCs) was extracted from several prior ChIP-Seq and knockdown-followed-by-expression studies. The underlying regulatory logic among network components was then learned using the initial network topology and single cell gene expression measurements from mESCs cultured in serum/LIF or serum-free 2i/LIF conditions. Comparing the learned network regulatory logic derived from cells cultured in serum/LIF vs. 2i/LIF revealed differential roles for Nanog, Oct4/Pou5f1, Sox2, Esrrb and Tcf3. Overall, gene expression in the serum/LIF condition was more variable than in the 2i/LIF condition but was mostly consistent across the two conditions. Expression levels for most genes in single cells were bimodal across the entire population and this motivated a Boolean modeling approach. In silico predictions derived from removal of nodes from the Boolean dynamical model were validated with experimental single and combinatorial RNA interference (RNAi) knockdowns of selected network components. Quantitative post-RNAi expression level measurements of remaining network components showed good agreement with the in silico predictions. Computational removal of nodes from the Boolean network model was also used to predict lineage specification outcomes. In summary, data integration, modeling, and targeted experiments were used to improve our understanding of the regulatory topology that controls mESC fate decisions as well as to develop robust directed lineage specification protocols.
For this study we first constructed, from data available in the public domain, a directed and signed network consisting of 15 pluripotency regulators and 15 lineage commitment markers that extensively interact to regulate mouse embryonic stem cell fate decisions. Given the connectivity structure of this network, the underlying regulatory logic was learned using single cell gene expression measurements of mESCs cultured in two different conditions. With connectivity and logic learned, the network was then simulated using a dynamic Boolean logic framework. Such simulations enabled prediction of knockdown effects on the overall activity of the network. These predictions were validated by single and combinatorial RNA interference experiments followed by expression measurements. Finally, lineage specification outcomes upon single and combinatorial gene knockdowns were predicted for all possible knockdown combinations.
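The idea of simulating a Boolean network and predicting knockdown effects can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three node names (`P1`, `P2`, `L1`), their update rules, the synchronous update scheme, and the function names are hypothetical and are not the 30-node network or the logic learned in the study.

```python
from itertools import combinations

# Hypothetical toy network: two "pluripotency" nodes (P1, P2) and one
# "lineage" node (L1). These rules are illustrative assumptions only.
RULES = {
    "P1": lambda s: s["P1"] or s["P2"],
    "P2": lambda s: s["P1"] and not s["L1"],
    "L1": lambda s: not s["P1"],
}


def step(state, knocked_down=frozenset()):
    """Update all nodes synchronously; knocked-down nodes are held OFF."""
    return {
        gene: False if gene in knocked_down else bool(rule(state))
        for gene, rule in RULES.items()
    }


def simulate(initial, knocked_down=frozenset(), max_steps=50):
    """Iterate until a state repeats (or max_steps is hit) and return that state."""
    state = {g: (False if g in knocked_down else bool(v)) for g, v in initial.items()}
    seen = []
    for _ in range(max_steps):
        if state in seen:
            break
        seen.append(state)
        state = step(state, knocked_down)
    return state


def all_knockdowns(initial, max_size=2):
    """Simulate every single and combinatorial knockdown up to max_size genes."""
    genes = sorted(RULES)
    return {
        combo: simulate(initial, frozenset(combo))
        for size in range(1, max_size + 1)
        for combo in combinations(genes, size)
    }


if __name__ == "__main__":
    start = {"P1": True, "P2": True, "L1": False}
    print("unperturbed:", simulate(start))
    for combo, outcome in all_knockdowns(start).items():
        print(combo, outcome)
```

In this toy example, holding `P1` OFF switches the lineage node ON, which mirrors in miniature how computational removal of a node can be read as a predicted lineage outcome.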
Availability: N2C is freely available at http://www.maayanlab.net/N2C and is open source.
Supplementary data are available at Bioinformatics online.
Definitive hematopoiesis emerges during embryogenesis via an endothelial-to-hematopoietic transition. We attempted to induce this process in mouse fibroblasts by screening a panel of factors for hemogenic activity. We identified a combination of four transcription factors, Gata2, Gfi1b, cFos, and Etv6, that efficiently induces endothelial-like precursor cells with the subsequent appearance of hematopoietic cells. The precursor cells express a human CD34 reporter, Sca1 and Prominin1 within a global endothelial transcription program. Emergent hematopoietic cells possess nascent/specifying hematopoietic stem cell gene expression profiles and cell surface phenotypes. After transgene silencing and reaggregation culture the specified cells generate hematopoietic colonies in vitro. Thus, we have shown that a simple combination of transcription factors is sufficient to induce a complex, dynamic and multi-step developmental program in vitro. These findings provide insights into the specification of definitive hemogenesis and a platform for future development of patient-specific stem/progenitor cells as well as more differentiated blood products.
Abuse of heroin and prescription opiate medications has grown to disturbing levels. Opioids mediate their effects through mu opioid receptors (MOR), but minimal information exists regarding MOR-related striatal signaling relevant to the human condition. The striatum is a structure central to reward and habitual behavior and neurobiological changes in this region are thought to underlie the pathophysiology of addiction disorders.
We examined molecular mechanisms related to MOR in postmortem human brain striatal specimens from a homogenous European Caucasian population of heroin abusers and control subjects and in an animal model of heroin self-administration. Expression of ets-like kinase 1 (ELK1) was examined in relation to polymorphism of the MOR gene OPRM1 and drug history.
A characteristic feature of heroin abusers was decreased expression of MOR and extracellular regulated kinase (ERK) signaling networks, concomitant with dysregulation of the downstream transcription factor ELK1. Striatal ELK1 in heroin abusers associated with the polymorphism rs2075572 in OPRM1 in a genotype dose-dependent manner and correlated with documented history of heroin use, an effect reproduced in an animal model that emphasizes a direct relationship between repeated heroin exposure and ELK1 dysregulation. A central role of ELK1 was evidenced by an unbiased whole transcriptome microarray that revealed ~20% of downregulated genes in human heroin abusers are ELK1 targets. Using chromatin immunoprecipitation, we confirmed decreased ELK1 promoter occupancy of the target gene Use1.
ELK1 is a potential key transcriptional regulatory factor in striatal disturbances associated with heroin abuse and relevant to genetic mutation of OPRM1.
opioid; self-administration; MAPK; transcriptome; addiction; rat
MYH9 encodes non-muscle myosin heavy chain IIA (NMMHCIIA), the predominant force-generating ATPase in non-muscle cells. Several lines of evidence implicate a role for MYH9 in podocytopathies. However, NMMHCIIA’s function in podocytes remains unknown. To better understand this function, we performed immuno-precipitation followed by mass-spectrometry proteomics to identify proteins interacting with the NMMHCIIA-enriched actin-myosin complexes. Computational analyses revealed that these proteins belong to functional networks including regulators of cytoskeletal organization, metabolism and networks regulated by the HIV-1 gene nef. We further characterized the subcellular localization of NMMHCIIA within podocytes in vivo, and found it to be present within the podocyte major foot processes. Finally, we tested the effect of loss of MYH9 expression in podocytes in vitro, and found that it was necessary for cytoskeletal organization. Our results provide the first survey of NMMHCIIA-enriched actin-myosin-interacting proteins within the podocyte, demonstrating the important role of NMMHCIIA in organizing the elaborate cytoskeleton structure of podocytes. Our characterization of NMMHCIIA’s functions goes beyond the podocyte, providing important insights into its general molecular role.
For the Library of Integrated Network-based Cellular Signatures (LINCS) project, many gene expression signatures have been produced using the L1000 technology. The L1000 technology is a cost-effective method to profile gene expression at large scale. LINCS Canvas Browser (LCB) is an interactive HTML5 web-based software application that facilitates querying, browsing and interrogating much of the currently available LINCS L1000 data. LCB implements two compacted layered canvases, one to visualize clustered L1000 expression data, and the other to display enrichment analysis results using 30 different gene set libraries. Clicking on an experimental condition highlights gene-sets enriched for the differentially expressed genes from the selected experiment. A search interface allows users to input gene lists and query them against over 100 000 conditions to find the top matching experiments. The tool integrates many resources for an unprecedented potential for new discoveries in systems biology and systems pharmacology. The LCB application is available at http://www.maayanlab.net/LINCS/LCB. Customized versions will be made part of the http://lincscloud.org and http://lincs.hms.harvard.edu websites.
Despite wide-spread consensus on the need to transform toxicology and risk assessment in order to keep pace with technological and computational changes that have revolutionized the life sciences, there remains much work to be done to achieve the vision of toxicology based on a mechanistic foundation. A workshop was organized to explore one key aspect of this transformation – the development of Pathways of Toxicity (PoT) as a key tool for hazard identification based on systems biology. Several issues were discussed in depth in the workshop: The first was the challenge of formally defining the concept of a PoT as distinct from, but complementary to, other toxicological pathway concepts such as mode of action (MoA). The workshop came up with a preliminary definition of PoT as “A molecular definition of cellular processes shown to mediate adverse outcomes of toxicants”. It is further recognized that normal physiological pathways exist that maintain homeostasis and these, sufficiently perturbed, can become PoT. Second, the workshop sought to define the adequate public and commercial resources for PoT information, including data, visualization, analyses, tools, and use-cases, as well as the kinds of efforts that will be necessary to enable the creation of such a resource. Third, the workshop explored ways in which systems biology approaches could inform pathway annotation, and which resources are needed and available that can provide relevant PoT information to the diverse user communities.
systems toxicology; pathways of toxicity; adverse outcome pathways; in vitro toxicology; human toxome
BACKGROUND & AIMS
In healthy individuals, interactions between intestinal epithelial cells and lamina propria lymphocytes give rise to a population of CD8+ T cells with suppressor functions (Ts cells). Disruption of Ts cell activities can lead to mucosal inflammation. We investigated what factors were required for expansion of the Ts cell population or loss of their activity in patients with Crohn’s disease (CD).
We developed a method to generate Ts cell lines from freshly isolated lamina propria lymphocytes from patients with ulcerative colitis (UC), patients with CD, or healthy individuals (controls). Cells were stimulated with a monoclonal antibody against CD3, interleukin (IL)-7, and IL-15. After 14 days in culture, CD8+ T cells were purified and cultured with IL-7 and IL-15. The resulting Ts cells were analyzed for suppressor activity, expression of surface markers, and cytokine secretion profiles. RNA was isolated from the 3 groups of Ts cells and used in microarray analyses.
Ts cells from patients with UC and controls suppressed proliferation of CD4+ T cells; the suppression required cell contact. In contrast, Ts cells from patients with CD had a reduced capacity to suppress CD4+ T-cell proliferation. The difference in suppressive ability was not associated with surface or intracytoplasmic markers or secretion of cytokines. Microarray analysis identified changes in expression of genes regulated by transforming growth factor (TGF)-β that were associated with the suppressive abilities of Ts cells. We found that TGF-β or supernatants from Ts cells of patients with CD reduced the suppressor activity of control Ts cells.
Ts cells isolated from patients with CD have a reduced ability to suppress proliferation of CD4+ T cells compared with control Ts cells. TGF-β signaling reduces the suppressor activity of Ts cells.
Regulatory T Cells; Treg Cells; Immune Regulation; Inflammatory Bowel Disease
Identifying differentially expressed genes (DEG) is a fundamental step in studies that perform genome wide expression profiling. Typically, DEG are identified by univariate approaches such as Significance Analysis of Microarrays (SAM) or Linear Models for Microarray Data (LIMMA) for processing cDNA microarrays, and differential gene expression analysis based on the negative binomial distribution (DESeq) or Empirical analysis of Digital Gene Expression data in R (edgeR) for RNA-seq profiling.
Here we present a new geometrical multivariate approach to identify DEG called the Characteristic Direction. We demonstrate that the Characteristic Direction method is significantly more sensitive than existing methods for identifying DEG in the context of transcription factor (TF) and drug perturbation responses over a large number of microarray experiments. We also benchmarked the Characteristic Direction method using synthetic data, as well as RNA-Seq data. A large collection of microarray expression data from TF perturbations (73 experiments) and drug perturbations (130 experiments) extracted from the Gene Expression Omnibus (GEO), as well as an RNA-Seq study that profiled genome-wide gene expression and STAT3 DNA binding in two subtypes of diffuse large B-cell lymphoma, were used for benchmarking the method using real data. ChIP-Seq data identifying DNA binding sites of the perturbed TFs, as well as known drug targets of the perturbing drugs, were used as a prior-knowledge silver standard for validation. In all cases the Characteristic Direction DEG calling method outperformed other methods. We find that when drugs are applied to cells in various contexts, the proteins that interact with the drug-targets are differentially expressed and more of the corresponding genes are discovered by the Characteristic Direction method. In addition, we show that the Characteristic Direction conceptualization can be used to perform improved gene set enrichment analyses when compared with the gene-set enrichment analysis (GSEA) and the hypergeometric test.
The application of the Characteristic Direction method may shed new light on relevant biological mechanisms that would have remained undiscovered by the current state-of-the-art DEG methods. The method is freely accessible via various open source code implementations using four popular programming languages: R, Python, MATLAB and Mathematica, all available at: http://www.maayanlab.net/CD.
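To make the contrast between univariate and geometrical multivariate DEG calling concrete, the following is a minimal sketch of one way a single multivariate direction separating two groups of samples could be computed. It is an assumption-laden illustration, not the published implementation: it takes the direction to be the normal of a regularized linear-discriminant hyperplane, and the function name `direction_sketch`, the shrinkage parameter `gamma`, and the shrinkage target are all hypothetical choices. The implementations at the URL above should be used for real analyses.

```python
import numpy as np


def direction_sketch(control, treated, gamma=0.5):
    """Return a unit vector over genes that separates treated from control.

    control, treated: arrays of shape (samples, genes).
    gamma: assumed shrinkage weight in [0, 1]; larger values rely less on
           the estimated gene-gene covariance.
    """
    control = np.asarray(control, dtype=float)
    treated = np.asarray(treated, dtype=float)

    # Difference of group means: the univariate signal for each gene.
    diff = treated.mean(axis=0) - control.mean(axis=0)

    # Pooled within-group covariance across genes.
    centered = np.vstack([
        control - control.mean(axis=0),
        treated - treated.mean(axis=0),
    ])
    dof = max(centered.shape[0] - 2, 1)
    cov = centered.T @ centered / dof

    # Shrink toward a scaled identity so the matrix stays invertible when
    # there are far fewer samples than genes.
    n_genes = cov.shape[0]
    target = np.eye(n_genes) * (np.trace(cov) / n_genes)
    shrunk = (1.0 - gamma) * cov + gamma * target

    # Multivariate direction: covariance-adjusted mean difference.
    b = np.linalg.solve(shrunk, diff)
    return b / np.linalg.norm(b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ctrl = rng.normal(size=(4, 5))
    trt = rng.normal(size=(4, 5))
    trt[:, 0] += 5.0  # simulate one strongly perturbed gene
    b = direction_sketch(ctrl, trt)
    # Genes ranked by the absolute size of their component in the direction.
    print(np.argsort(-np.abs(b)))
```

Ranking genes by the absolute value of their component in the returned vector gives a prioritized DEG list in which each gene's weight reflects its covariance with other genes as well as its own mean shift.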
De novo loss-of-function (dnLoF) mutations are found twofold more often in autism spectrum disorder (ASD) probands than their unaffected siblings. Multiple independent dnLoF mutations in the same gene implicate the gene in risk and hence provide a systematic, albeit arduous, path forward for ASD genetics. It is likely that using additional non-genetic data will enhance the ability to identify ASD genes.
To accelerate the search for ASD genes, we developed a novel algorithm, DAWN, to model two kinds of data: rare variations from exome sequencing and gene co-expression in the mid-fetal prefrontal and motor-somatosensory neocortex, a critical nexus for risk. The algorithm casts the ensemble data as a hidden Markov random field in which the graph structure is determined by gene co-expression and it combines these interrelationships with node-specific observations, namely gene identity, expression, genetic data and the estimated effect on risk.
Using currently available genetic data and a specific developmental time period for gene co-expression, DAWN identified 127 genes that plausibly affect risk, and a set of likely ASD subnetworks. Validation experiments making use of published targeted resequencing results demonstrate its efficacy in reliably predicting ASD genes. DAWN also successfully predicts known ASD genes, not included in the genetic data used to create the model.
Validation studies demonstrate that DAWN is effective in predicting ASD genes and subnetworks by leveraging genetic and gene expression data. The findings reported here implicate neurite extension and neuronal arborization as risks for ASD. Using DAWN on emerging ASD sequence data and gene expression data from other brain regions and tissues would likely identify novel ASD genes. DAWN can also be used for other complex disorders to identify genes and subnetworks in those disorders.
Autism; Risk prediction; Gene discovery; Weighted gene co-expression network analysis; Network; Hidden Markov random field; Neurite extension; Neuronal arborization
How dermal papilla (DP) niche cells regulate hair follicle progenitors to control hair growth remains unclear. Using Tbx18Cre to target embryonic DP precursors, we ablate the transcription factor Sox2 early and efficiently, resulting in diminished hair shaft outgrowth. We find that DP niche expression of Sox2 controls the migration rate of differentiating hair shaft progenitors. Transcriptional profiling of Sox2 null DPs reveals increased Bmp6 and decreased Bmp inhibitor Sostdc1, a direct Sox2 transcriptional target. Subsequently, we identify upregulated Bmp signaling in knockout hair shaft progenitors and demonstrate that Bmps inhibit cell migration, an effect that can be attenuated by Sostdc1. A shorter and Sox2-negative hair type lacks Sostdc1 in the DP and shows reduced migration and increased Bmp activity of hair shaft progenitors. Collectively, our data identify Sox2 as a key regulator of hair growth that controls progenitor migration by fine-tuning Bmp-mediated mesenchymal-epithelial crosstalk.
Sox2; Dermal papilla; Stem cell niche; Mesenchymal-epithelial interactions; Hair growth; Hair follicle; Gene ablation; Bmp signaling
We assessed tissue macrophage gene expression in different mouse organs. Diversity in gene expression among different populations of macrophages was remarkable. Only a few hundred mRNA transcripts stood out as selectively expressed by macrophages over DCs and many of these were not present in all macrophages. Nonetheless, well-characterized surface markers, including MerTK and FcγR1 (CD64), along with a cluster of novel transcripts were distinctly and universally associated with mature tissue macrophages. TCEF3, C/EBPα, BACH1, and CREG-1 were among the top transcriptional regulators predicted to regulate these core macrophage-associated genes. Other transcription factor mRNAs were strongly associated with single macrophage populations. We further illustrate how these transcripts and the proteins they encode facilitate distinguishing macrophage versus DC identity of less characterized populations of mononuclear phagocytes.
With the widespread use of combination antiretroviral agents, the incidence of HIV-associated nephropathy has decreased. Currently, HIV-infected patients live much longer and often suffer from comorbidities such as diabetes mellitus. Recent epidemiological studies suggest that concurrent HIV infection and diabetes mellitus may have a synergistic effect on the incidence of chronic kidney disease. To address this, we determined whether HIV-1 transgene expression accelerates diabetic kidney injury using a diabetic HIV-1 transgenic (Tg26) murine model. Diabetes was initially induced with low-dose streptozotocin in both Tg26 and the wild-type mice on the C57BL/6 background, which is resistant to classic HIV-associated nephropathy. Although diabetic nephropathy is minimally observed on the C57BL/6 background, diabetic Tg26 mice exhibited a significant increase in glomerular injury compared to non-diabetic Tg26 mice or diabetic wild-type mice. Validation of microarray gene expression analysis from isolated glomeruli showed a significant up-regulation of pro-inflammatory pathways in the diabetic Tg26 mice. Thus, our study found that expression of HIV-1 genes aggravates diabetic kidney disease.
Many signals must be integrated to maintain self-renewal and pluripotency in embryonic stem cells (ESCs) and to enable induced pluripotent stem cell (iPSC) reprogramming. However, the exact molecular regulatory mechanisms remain elusive. To unravel the essential internal and external signals required for sustaining the ESC state, we conducted a short hairpin (sh) RNA screen of 104 ESC-associated phosphoregulators. Depletion of one such molecule, aurora kinase A (Aurka), resulted in compromised self-renewal and consequent differentiation. By integrating global gene expression and computational analyses, we discovered that loss of Aurka leads to up-regulated p53 activity that triggers ESC differentiation. Specifically, Aurka regulates pluripotency through phosphorylation-mediated inhibition of p53-directed ectodermal and mesodermal gene expression. Phosphorylation of p53 not only impairs p53-induced ESC differentiation but also p53-mediated suppression of iPSC reprogramming. Our studies demonstrate an essential role for Aurka-p53 signaling in the regulation of self-renewal, differentiation, and somatic cell reprogramming.
High content studies that profile mouse and human embryonic stem cells (m/hESCs) using various genome-wide technologies such as transcriptomics and proteomics are constantly being published. However, efforts to integrate such data to obtain a global view of the molecular circuitry in m/hESCs are lagging behind. Here, we present an m/hESC-centered database called Embryonic Stem Cell Atlas from Pluripotency Evidence integrating data from many recent diverse high-throughput studies including chromatin immunoprecipitation followed by deep sequencing, genome-wide inhibitory RNA screens, gene expression microarrays or RNA-seq after knockdown (KD) or overexpression of critical factors, immunoprecipitation followed by mass spectrometry proteomics and phosphoproteomics. The database provides web-based interactive search and visualization tools that can be used to build subnetworks and to identify known and novel regulatory interactions across various regulatory layers. The web-interface also includes tools to predict the effects of combinatorial KDs by additive effects controlled by sliders, or through simulation software implemented in MATLAB. Overall, the Embryonic Stem Cell Atlas from Pluripotency Evidence database is a comprehensive resource for the stem cell systems biology community.
Database URL: http://www.maayanlab.net/ESCAPE
Autism spectrum disorders (ASD) are a group of related neurodevelopmental disorders with significant combined prevalence (~1%) and high heritability. Dozens of individually rare genes and loci associated with high risk for ASD have been identified, which overlap extensively with genes for intellectual disability (ID). However, studies indicate that there may be hundreds of genes that remain to be identified. The advent of inexpensive massively parallel nucleotide sequencing can reveal the genetic underpinnings of heritable complex diseases, including ASD and ID. However, whole exome sequencing (WES) and whole genome sequencing (WGS) provide an embarrassment of riches, where many candidate variants emerge. It has been argued that genetic variation for ASD and ID will cluster in genes involved in distinct pathways and protein complexes. For this reason, computational methods that prioritize candidate genes based on additional functional information such as protein-protein interactions or association with specific canonical or empirical pathways, or other attributes, can be useful. In this study we applied several supervised learning approaches to prioritize ASD or ID disease gene candidates based on curated lists of known ASD and ID disease genes. We implemented two network-based classifiers and one attribute-based classifier to show that we can rank and classify known, and predict new, genes for these neurodevelopmental disorders. We also show that ID and ASD share common pathways that perturb an overlapping synaptic regulatory subnetwork. We also show that features relating to neuronal phenotypes in mouse knockouts can help in classifying neurodevelopmental genes. Our methods can be applied broadly to other diseases, helping to prioritize newly identified genetic variants that emerge from disease gene discovery based on WES and WGS.
High-throughput sequencing; massively parallel sequencing; gene discovery; networks; pathways; neurodevelopmental disorders; classifiers; support vector machine
A number of key regulators of mouse embryonic stem (ES) cell identity, including the transcription factor Nanog, show strong expression fluctuations at the single cell level. The molecular basis for these fluctuations is unknown. Here we used a genetic complementation strategy to investigate expression changes during transient periods of Nanog downregulation. Employing an integrated approach that includes high-throughput single cell transcriptional profiling and mathematical modelling, we found that early molecular changes subsequent to Nanog loss are stochastic and reversible. However, analysis also revealed that Nanog loss severely compromises the self-sustaining feedback structure of the ES cell regulatory network. Consequently, these nascent changes soon become consolidated to committed fate decisions in the prolonged absence of Nanog. Consistent with this, we found that exogenous regulation of Nanog-dependent feedback control mechanisms produced a more homogeneous ES cell population. Taken together, our results indicate that Nanog-dependent feedback loops have a role in controlling both ES cell fate decisions and population variability.
System-wide profiling of genes and proteins in mammalian cells produces lists of differentially expressed genes/proteins that need to be further analyzed for their collective functions in order to extract new knowledge. Once unbiased lists of genes or proteins are generated from such experiments, these lists are used as input for computing enrichment with existing lists created from prior knowledge organized into gene-set libraries. While many enrichment analysis tools and gene-set library databases have been developed, there is still room for improvement.
Enrichr is an easy-to-use, intuitive, web-based enrichment analysis tool that provides various types of visualization summaries of the collective functions of gene lists. Enrichr is open source and freely available online at: http://amp.pharm.mssm.edu/Enrichr.
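The general idea of computing enrichment of an input gene list against a gene-set library can be sketched with a generic over-representation test. The text above does not specify how Enrichr scores terms, so this example swaps in a plain hypergeometric tail probability as an assumption; the function names, the toy library, the gene symbols, and the background size of 20000 genes are all hypothetical.

```python
from math import comb


def hypergeom_pvalue(overlap, list_size, set_size, universe):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from a universe in which set_size genes belong to the gene set."""
    upper = min(list_size, set_size)
    total = comb(universe, list_size)
    tail = sum(
        comb(set_size, k) * comb(universe - set_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def enrich(gene_list, library, universe=20000):
    """Rank the terms of a gene-set library by over-representation p-value.

    library: dict mapping a term name to an iterable of member genes.
    universe: assumed number of background genes.
    """
    genes = set(gene_list)
    results = []
    for term, members in library.items():
        members = set(members)
        overlap = genes & members
        p = hypergeom_pvalue(len(overlap), len(genes), len(members), universe)
        results.append((term, sorted(overlap), p))
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    toy_library = {  # hypothetical gene-set library
        "termA": ["G1", "G2", "G3"],
        "termB": ["G9"],
    }
    for term, overlap, p in enrich(["G1", "G2", "G4"], toy_library, universe=100):
        print(term, overlap, p)
```

A real analysis would also correct the resulting p-values for testing many terms at once, a step omitted here for brevity.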