The unfolded protein response (UPR) maintains endoplasmic reticulum (ER) proteostasis through the activation of transcription factors such as XBP1s and ATF6. The functional consequences of these transcription factors for ER proteostasis remain poorly defined. Here, we describe methodology that enables orthogonal, small-molecule-mediated activation of the UPR-associated transcription factors XBP1s and/or ATF6 in the same cell independent of stress. We employ transcriptomics and quantitative proteomics to evaluate ER proteostasis network remodeling owing to the XBP1s and/or ATF6 transcriptional programs. Furthermore, we demonstrate that the three ER proteostasis environments accessible by activating XBP1s and/or ATF6 differentially influence the folding, trafficking, and degradation of destabilized ER client proteins without globally affecting the endogenous proteome. Our data reveal how the ER proteostasis network is remodeled by the XBP1s and/or ATF6 transcriptional programs at the molecular level and demonstrate the potential for selective restoration of aberrant ER proteostasis of pathologic, destabilized proteins through arm-selective UPR activation.
Identify novel genes and pathways specific to superficial (SZ), middle (MZ) and deep zones (DZ) of normal articular cartilage.
Articular cartilage was obtained from the knees of 4 normal human donors. The cartilage zones were dissected on a microtome. RNA was analyzed on human genome arrays. Data obtained from human tissue were compared with zone-specific DNA array data from bovine cartilage. Genes differentially expressed between zones were evaluated by direct annotation for structural or functional features, and by enrichment analysis for integrated pathways or functions.
The greatest differences were observed between the SZ and DZ in both human and bovine cartilage. The MZ was transitional between the SZ and DZ and thus shared some pathways and structural/functional features with the adjacent zones. The cellular functions and biological processes most prominently enriched in the SZ relative to the DZ include ECM-receptor interaction, cell adhesion molecules, regulation of the actin cytoskeleton, ribosome-related functions, and signaling pathways such as interferon gamma, IL4, CDC42Rac and Jak-Stat. Two pathways, PPARG and EGFR/SMRTE, were enriched in the DZ relative to the SZ.
These differences in cartilage zonal gene expression identify new markers and pathways that govern the unique differentiation status of chondrocyte subpopulations.
cartilage zones; gene expression
Mouse gene expression data are complex and voluminous. To maximize the utility of these data, they must be made readily accessible through databases, and those resources need to place the expression data in the larger biological context. Here we describe two community resources that approach these problems in different but complementary ways: BioGPS and the Mouse Gene Expression Database (GXD). BioGPS connects its large and homogeneous microarray gene expression reference data sets via plugins with a heterogeneous collection of external gene-centric resources, thus casting a wide but loose net. GXD acquires different types of expression data from many sources and integrates these data tightly with other types of data in the Mouse Genome Informatics (MGI) resource, with a strong emphasis on consistency checks and manual curation. We describe and contrast the “loose” and “tight” data integration strategies employed by BioGPS and GXD, respectively, and discuss the challenges and benefits of data integration. BioGPS is freely available at http://biogps.org. GXD is freely available through the Mouse Genome Informatics (MGI) web site (www.informatics.jax.org), or directly at www.informatics.jax.org/expression.shtml.
data integration; gene expression; database
Adipose tissue renewal and obesity-driven expansion of fat cell number are dependent on proliferation and differentiation of adipose progenitors that reside in the vasculature that develops in coordination with adipose depots. The transcriptional events that regulate commitment of progenitors to the adipose lineage are poorly understood. Because expression of the nuclear receptor PPARγ defines the adipose lineage, isolation of elements that control PPARγ expression in adipose precursors may lead to discovery of transcriptional regulators of early adipocyte determination. Here, we describe the identification and validation in transgenic mice of 5 highly conserved non-coding sequences from the PPARγ locus that can drive expression of a reporter gene in a manner that recapitulates the tissue-specific pattern of PPARγ expression. Surprisingly, these 5 elements appear to control PPARγ expression in adipocyte precursors that are associated with the vasculature of adipose depots, but not in mature adipocytes. Characterization of these five PPARγ regulatory sequences may enable isolation of the transcription factors that bind these cis elements and provide insight into the molecular regulation of adipose tissue expansion in normal and pathological states.
Structured gene annotations are a foundation upon which many bioinformatics and statistical analyses are built. However, the structured annotations available in public databases are a sparse representation of biological knowledge as a whole. The rate of biomedical data generation is such that centralized biocuration efforts struggle to keep up. New models for gene annotation need to be explored that expand the pace at which we are able to structure biomedical knowledge. Recently, online games have emerged as an effective way to recruit, engage and organize large numbers of volunteers to help address difficult biological challenges. For example, games have been successfully developed for protein folding (Foldit), multiple sequence alignment (Phylo) and RNA structure design (EteRNA). Here we present Dizeez, a simple online game built with the purpose of structuring knowledge of gene-disease associations. Preliminary results from game play online and at scientific conferences suggest that Dizeez is producing valid gene-disease annotations not yet present in any public database. These early results provide a basic proof of principle that online games can be successfully applied to the challenge of gene annotation. Dizeez is available at http://genegames.org.
The Gene Ontology and its associated annotations are critical tools for interpreting lists of genes. Here, we introduce a method for evaluating the Gene Ontology annotations and structure based on the impact they have on gene set enrichment analysis, along with an example implementation. This task-based approach yields quantitative assessments grounded in experimental data and anchored tightly to the primary use of the annotations.
Applied to specific areas of biological interest, our framework allowed us to understand the progress of annotation and structural ontology changes from 2004 to 2012. Our framework was also able to determine that the quality of annotations and structure in the area under test has been improving in its ability to recall underlying biological traits. Furthermore, we were able to distinguish between the impact of changes to the annotation sets and the impact of changes to the ontology structure.
Our framework and implementation lay the groundwork for a powerful tool in evaluating the usefulness of the Gene Ontology. We demonstrate both the flexibility and the power of this approach in evaluating the current and past state of the Gene Ontology as well as its applicability in developing new methods for creating gene annotations.
The Gene Wiki is an open-access and openly editable collection of Wikipedia articles about human genes. Initiated in 2008, it has grown to include articles about more than 10 000 genes that, collectively, contain more than 1.4 million words of gene-centric text with extensive citations back to the primary scientific literature. This growing body of useful, gene-centric content is the result of the work of thousands of individuals throughout the scientific community. Here, we describe recent improvements to the automated system that keeps the structured data presented on Gene Wiki articles in sync with the data from trusted primary databases. We also describe the expanding contents, editors and users of the Gene Wiki. Finally, we introduce a new automated system, called WikiTrust, which can effectively compute the quality of Wikipedia articles, including Gene Wiki articles, at the word level. All articles in the Gene Wiki can be freely accessed and edited at Wikipedia, and additional links and information can be found at the project's Wikipedia portal page: http://en.wikipedia.org/wiki/Portal:Gene_Wiki.
The protein folding game Foldit shows that games are an effective way to recruit, engage and organize ordinary citizens to help solve difficult scientific problems.
Fast-evolving technologies have enabled researchers to easily generate data at genome scale, and using these technologies to compare biological states typically results in a list of candidate genes. Researchers are then faced with the daunting task of prioritizing these candidate genes for follow-up studies. There are hundreds, possibly even thousands, of web-based gene annotation resources available, but it quickly becomes impractical to manually access and review all of these sites for each gene in a candidate gene list. BioGPS (http://biogps.org) was created as a centralized gene portal for aggregating distributed gene annotation resources, emphasizing community extensibility and user customizability. BioGPS serves as a convenient tool for users to access known gene-centric resources, as well as a mechanism to discover new resources that were previously unknown to the user. This article describes updates to BioGPS made after its initial release in 2008. We summarize recent additions of features and data, as well as the robust user activity that underlies this community intelligence application. Finally, we describe MyGene.info (http://mygene.info) and related web services that provide programmatic access to BioGPS.
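As a rough illustration of the programmatic access mentioned above, the sketch below builds and runs a gene query against the MyGene.info REST API. It assumes the public `v3` endpoint and its `q`, `species` and `fields` query parameters; the helper names (`build_query_url`, `query_genes`) are hypothetical, and only the URL builder runs without network access.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the public MyGene.info v3 REST API.
MYGENE_BASE = "https://mygene.info/v3"


def build_query_url(term: str, species: str = "human",
                    fields: str = "symbol,name,entrezgene") -> str:
    """Build a MyGene.info gene query URL for a free-text or fielded term."""
    params = urllib.parse.urlencode({"q": term, "species": species, "fields": fields})
    return f"{MYGENE_BASE}/query?{params}"


def query_genes(term: str, **kwargs) -> list:
    """Run the query over HTTP and return the list of hits (needs network access)."""
    with urllib.request.urlopen(build_query_url(term, **kwargs), timeout=30) as resp:
        payload = json.load(resp)
    return payload.get("hits", [])


if __name__ == "__main__":
    # Prints the request URL only; call query_genes() to actually contact the service.
    print(build_query_url("symbol:CDK2"))
```

Splitting URL construction from the HTTP call keeps the query logic testable offline. A real client would also handle paging and HTTP errors.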
This work describes the first genome-wide analysis of the transcriptional
landscape of the pig. A new porcine Affymetrix expression array was designed in
order to provide comprehensive coverage of the known pig transcriptome. The new
array was used to generate a genome-wide expression atlas of pig tissues derived
from 62 tissue/cell types. These data were subjected to network correlation
analysis and clustering.
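The idea of inferring a gene's function "from the company it keeps" can be sketched as a correlation network. Transcripts become nodes, and an edge joins two transcripts whose expression profiles across tissues correlate above a threshold. The sketch below is illustrative only. The 0.9 Pearson threshold, the connected-component grouping (a simple stand-in for a dedicated clustering algorithm), and the toy profiles are all assumptions, not the atlas's actual data or parameters.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profiles carry no co-expression signal
    return cov / (sx * sy)


def correlation_network(profiles, threshold=0.9):
    """Connect transcripts whose cross-tissue profiles correlate at or above threshold."""
    edges = {gene: set() for gene in profiles}
    for g1, g2 in combinations(profiles, 2):
        if pearson(profiles[g1], profiles[g2]) >= threshold:
            edges[g1].add(g2)
            edges[g2].add(g1)
    return edges


def clusters(edges):
    """Group transcripts into connected components of the network."""
    seen, groups = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(edges[node] - seen)
        groups.append(group)
    return groups


# Hypothetical toy profiles across four tissues (not data from the atlas).
profiles = {
    "geneA": [10.0, 2.0, 1.0, 1.0],
    "geneB": [9.0, 2.5, 1.2, 0.8],
    "unknownX": [11.0, 1.8, 0.9, 1.1],
    "geneC": [1.0, 1.0, 8.0, 9.0],
}
net = correlation_network(profiles)
print(sorted(net["unknownX"]))  # ['geneA', 'geneB']
```

Here the uncharacterized `unknownX` lands in the same group as `geneA` and `geneB`. That grouping would suggest a shared tissue-specific role worth testing.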
The analysis presented here provides a detailed functional clustering of the pig
transcriptome where transcripts are grouped according to their expression pattern,
so one can infer the function of an uncharacterized gene from the company it keeps
and the locations in which it is expressed. We describe the overall
transcriptional signatures present in the tissue atlas, where possible assigning
those signatures to specific cell populations or pathways. In particular, we
discuss the expression signatures associated with the gastrointestinal tract, an
organ that was sampled at 15 sites along its length and whose biology in the pig
is similar to human. We identify sets of genes that define specialized cellular
compartments and region-specific digestive functions. Finally, we performed a
network analysis of the transcription factors expressed in the gastrointestinal
tract and demonstrated how they subdivide into functional groups that may control
cellular gastrointestinal development.
Because the pig is an important livestock animal whose physiology is more similar
to that of humans than is the mouse's, this atlas provides a major new resource for
understanding gene expression in relation to the known physiology of mammalian
tissues and cells. The data and analyses are available on the websites http://biogps.org and
pig; porcine; Sus scrofa; microarray; transcriptome; transcription network; pathway; gastrointestinal tract
A variety of topic-focused wikis are used in the biomedical sciences to enable the mass-collaborative synthesis and distribution of diverse bodies of knowledge. To address complex problems such as defining the relationships between genes and disease, it is important to bring the knowledge from many different domains together. Here we show how advances in wiki technology and natural language processing can be used to automatically assemble ‘meta-wikis’ that present integrated views over the data collaboratively created in multiple source wikis.
We produced a semantic meta-wiki called the Gene Wiki+ that automatically mirrors and integrates data from the Gene Wiki and SNPedia. The Gene Wiki+, available at http://genewikiplus.org/, captures 8,047 distinct gene-disease relationships. SNPedia accounts for 4,149 of the gene-disease pairs, the Gene Wiki provides 4,377, and only 479 appear independently in both sources. All of this content is available to query and browse and is provided as linked open data.
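The reported counts follow inclusion-exclusion. The 479 pairs found in both sources are counted once in the total of distinct relationships, so |SNPedia ∪ Gene Wiki| = |SNPedia| + |Gene Wiki| − |shared|. The minimal check below uses only the numbers stated above. The helper name `union_size` is illustrative.

```python
def union_size(n_a: int, n_b: int, n_both: int) -> int:
    """Distinct items in two overlapping sets: |A| + |B| - |A ∩ B|."""
    return n_a + n_b - n_both


# Counts reported for the Gene Wiki+ gene-disease pairs.
snpedia, gene_wiki, shared = 4149, 4377, 479
print(union_size(snpedia, gene_wiki, shared))  # 8047
```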
Wikis contain increasing amounts of diverse biological information useful for elucidating the connections between genes and disease. The Gene Wiki+ shows how wiki technology can be used in concert with natural language processing to provide integrated views over diverse underlying data sources.
Wikipedia is increasingly used as a platform for collaborative data curation, but its current technical implementation has significant limitations that hinder its use in biocuration applications. Specifically, while editors can easily link between two articles in Wikipedia to indicate a relationship, there is no way to indicate the nature of that relationship in a way that is computationally accessible to the system or to external developers. For example, in addition to noting a relationship between a gene and a disease, it would be useful to differentiate the cases where genetic mutation or altered expression causes the disease. Here, we introduce a straightforward method that allows Wikipedia editors to embed computable semantic relations directly in the context of current Wikipedia articles. In addition, we demonstrate two novel applications enabled by the presence of these new relationships. The first is a dynamically generated information box that can be rendered on all semantically enhanced Wikipedia articles. The second is a prototype gene annotation system that draws its content from the gene-centric articles on Wikipedia and exposes the new semantic relationships to enable previously impossible, user-defined queries.
Ontology-based gene annotations are important tools for organizing and analyzing genome-scale biological data. Collecting these annotations is a valuable but costly endeavor. The Gene Wiki makes use of Wikipedia as a low-cost, mass-collaborative platform for assembling text-based gene annotations. The Gene Wiki comprises more than 10,000 review articles, each describing one human gene. The goal of this study is to define and assess a computational strategy for translating the text of Gene Wiki articles into ontology-based gene annotations. We specifically explore the generation of structured annotations using the Gene Ontology and the Human Disease Ontology.
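A minimal way to picture translating article text into ontology-based annotations is dictionary-based concept recognition. Each ontology term label or synonym maps to its identifier, and every whole-phrase match in a gene's article yields a candidate (gene, term) pair. The sketch below is not the study's actual pipeline. The lexicon is a tiny hypothetical excerpt, the example sentence and gene name `GENEX` are made up, and real systems also handle stemming, abbreviations and negation.

```python
import re

# Tiny hypothetical lexicon: term labels/synonyms -> ontology identifiers.
LEXICON = {
    "apoptosis": "GO:0006915",
    "apoptotic process": "GO:0006915",
    "cell proliferation": "GO:0008283",
    "breast cancer": "DOID:1612",
}


def annotate(gene, text, lexicon=LEXICON):
    """Return sorted candidate (gene, ontology_id) annotations found in the text."""
    found = set()
    lowered = text.lower()
    for label, term_id in lexicon.items():
        # Word boundaries stop a label from matching inside a longer token.
        if re.search(r"\b" + re.escape(label) + r"\b", lowered):
            found.add((gene, term_id))
    return sorted(found)


# Hypothetical article sentence for a hypothetical gene.
article = "GENEX induces apoptosis and suppresses cell proliferation; mutations occur in breast cancer."
print(annotate("GENEX", article))
```

Simple matching of this kind is one reason precision can differ between ontologies. Disease names are fairly distinctive, while many GO labels are short, generic phrases that also occur in unrelated contexts.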
Our system produced 2,983 candidate gene annotations using the Disease Ontology and 11,022 candidate annotations using the Gene Ontology from the text of the Gene Wiki. Based on manual evaluations and comparisons to reference annotation sets, we estimate a precision of 90-93% for the Disease Ontology annotations and 48-64% for the Gene Ontology annotations. We further demonstrate that this data set can systematically improve the results from gene set enrichment analyses.
The Gene Wiki is a rapidly growing corpus of text focused on human gene function. Here, we demonstrate that the Gene Wiki can be a powerful resource for generating ontology-based gene annotations. These annotations can be used immediately to improve workflows for building curated gene annotation databases and knowledge-based statistical analyses.
We analyzed the gene expression patterns of 138 Non-Small Cell Lung Cancer (NSCLC) samples and developed a new algorithm called Coverage Analysis with Fisher’s Exact Test (CAFET) to identify molecular pathways that are differentially activated in the squamous cell carcinoma (SCC) and adenocarcinoma (AC) subtypes. Hierarchical clustering of the lung cancer samples grouped them according to histological subtype and revealed a strong enrichment for Wnt signaling pathway components in the cluster consisting predominantly of SCC samples. The specific gene expression pattern observed correlated with enhanced activation of the Wnt Planar Cell Polarity (PCP) pathway and inhibition of the canonical Wnt signaling branch. Follow-up real-time RT-PCR on additional primary tumor samples and lung cancer cell lines confirmed the enrichment of Wnt/PCP pathway-associated genes in the SCC subtype. Dysregulation of the canonical Wnt pathway, characterized by increased levels of β-catenin and epigenetic silencing of negative regulators, has been reported in adenocarcinoma of the lung. Our results suggest that SCC and AC utilize different branches of the Wnt pathway during oncogenesis.
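The Fisher's exact test in CAFET's name is commonly used as a one-sided overrepresentation test. It asks how likely it is to see at least k pathway genes in a selected gene set, such as a sample cluster's signature, by chance. The sketch below computes that hypergeometric upper-tail p-value with the standard library. It shows only the generic test. The "coverage analysis" part of CAFET is not described here, and the function name and example numbers are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k: int, n_selected: int, n_pathway: int, n_total: int) -> float:
    """One-sided Fisher's exact (hypergeometric upper-tail) p-value.

    k          -- pathway genes found among the selected genes
    n_selected -- size of the selected gene set
    n_pathway  -- pathway genes in the background
    n_total    -- genes in the background
    """
    denom = comb(n_total, n_selected)
    upper = min(n_selected, n_pathway)
    # Sum the probabilities of observing k or more pathway genes by chance.
    # math.comb returns 0 when the lower argument exceeds the upper one.
    return sum(
        comb(n_pathway, i) * comb(n_total - n_pathway, n_selected - i)
        for i in range(k, upper + 1)
    ) / denom


# Hypothetical example: 12 of 200 cluster genes fall in a 150-gene pathway
# drawn from a 20,000-gene background.
print(fisher_enrichment_p(12, 200, 150, 20000))
```

When many pathways are tested, the resulting p-values would normally be corrected for multiple testing.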
Animal models of human behavioral endophenotypes, such as the Tail Suspension Test (TST) and the Open Field assay (OF), have proven to be essential tools in revealing the genetics and mechanisms of psychiatric diseases. As in the human disorders they model, the measurements generated in these behavioral assays are significantly impacted by the genetic background of the animals tested. In order to better understand the strain-dependent phenotypic variability endemic to this type of work, and better inform future studies that rely on the data generated by these models, we phenotyped 33 inbred mouse strains for immobility in the TST, a mouse model of behavioral despair, and for activity in the OF, a model of general anxiety and locomotor activity.
We identified significant strain-dependent differences in TST immobility, and in thigmotaxis and distance traveled in the OF. These results were replicable over multiple testing sessions and exhibited high heritability. We exploited the heritability of these behavioral traits by using in silico haplotype-based association mapping to identify candidate genes regulating TST behavior. Two significant loci (-log p > 7.0; gFWER-adjusted p < 0.05) of approximately 300 kb each were identified on MMU9 and MMU10. The MMU10 locus is syntenic to a human major depressive disorder QTL on human chromosome 12 and contains several genes that are expressed in brain regions associated with behavioral despair.
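The abstract reports high heritability without stating how it was estimated. In inbred strain panels, one common approach treats animals within a strain as genetically identical. Broad-sense heritability is then taken as the between-strain share of variance from a one-way ANOVA: H² = σ²_between / (σ²_between + σ²_within), with σ²_between = (MS_between − MS_within) / n. The sketch below assumes a balanced design, meaning the same n animals per strain. It is an illustration of that general approach, not the authors' method, and the function name is hypothetical.

```python
from statistics import mean


def broad_sense_heritability(groups):
    """Intraclass-correlation estimate of H^2 from a balanced one-way ANOVA.

    groups: list of per-strain lists of trait values, all the same length.
    """
    k = len(groups)          # number of strains
    n = len(groups[0])       # animals per strain (balanced design assumed)
    if any(len(g) != n for g in groups):
        raise ValueError("balanced design required")
    grand = mean(v for g in groups for v in g)
    ss_between = n * sum((mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (k * (n - 1))
    # Genetic variance component, truncated at zero when MS_between < MS_within.
    var_between = max((ms_between - ms_within) / n, 0.0)
    total = var_between + ms_within
    return var_between / total if total > 0 else 0.0
```

The estimate ranges from 0, when strain means do not differ beyond within-strain noise, to 1, when all variation lies between strains.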
We report the results of phenotyping a large panel of inbred mouse strains for depression and anxiety-associated behaviors. These results show significant, heritable strain-specific differences in behavior, and should prove to be a valuable resource for the behavioral and genetics communities. Additionally, we used haplotype mapping to identify several loci that may contain genes that regulate behavioral despair.
The study of expression quantitative trait loci (eQTL) is a powerful way of detecting transcriptional regulators at a genomic scale and of elucidating how natural genetic variation affects gene expression. Power and genetic resolution depend heavily on the study population: recombinant inbred (RI) strains yield greater statistical power but low genetic resolution, whereas diverse inbred or outbred strains improve genetic resolution at the cost of lower power. To overcome the limitations of each approach, we combine data from RI strains with data from genetically more diverse strains, analyzing hippocampus eQTL data obtained from mouse RI strains (BXD) and from a panel of diverse inbred strains (Mouse Diversity Panel, MDP). We perform a systematic analysis of the consistency of eQTL obtained independently from these two populations and demonstrate that a significant fraction of eQTL can be replicated. Using existing knowledge from pathway databases, we assess different approaches for using the high-resolution MDP data to fine-map BXD eQTL. Finally, we apply this framework to an eQTL hotspot on chromosome 1 (Qrr1), which has been implicated in a range of neurological traits. Here we present the first systematic examination of the consistency between eQTL obtained independently from the BXD and MDP populations. Our analysis of fine-mapping approaches is based on ‘real life’ rather than simulated data, and it allows us to propose a strategy for using MDP data to fine-map BXD eQTL. Application of this framework to Qrr1 reveals that this eQTL hotspot is caused not by just one (or a few) ‘master regulators’ but by a set of polymorphic genes specific to the central nervous system.
Two decades of research identified more than a dozen clock genes and defined a biochemical feedback mechanism of circadian oscillator function. To identify additional clock genes and modifiers, we conducted a genome-wide siRNA screen in a human cellular clock model. Knockdown of nearly a thousand genes reduced rhythm amplitude. Potent effects on period length or increased amplitude were less frequent; we found hundreds of these and confirmed them in secondary screens. Characterization of a subset of these genes demonstrated a dosage-dependent effect on oscillator function. Protein interaction network analysis showed that dozens of gene products directly or indirectly associate with known clock components. Pathway analysis revealed that these genes are overrepresented for components of insulin and hedgehog signaling, the cell cycle, and folate metabolism. Coupled with data showing that many of these pathways are clock-regulated, we conclude that the clock is interconnected with many aspects of cellular function.
In Huntington's disease (HD), an expanded CAG repeat produces characteristic striatal neurodegeneration. Interestingly, the HD CAG repeat, whose length determines age at onset, undergoes tissue-specific somatic instability that predominates in the striatum, suggesting that tissue-specific CAG length changes could modify the disease process. Therefore, understanding the mechanisms underlying the tissue specificity of somatic instability may provide novel routes to therapies. However, progress in this area has been hampered by the lack of sensitive, high-throughput methods for quantifying instability and of global approaches for identifying the underlying factors.
Here we describe a novel approach to gain insight into the factors responsible for the tissue specificity of somatic instability. Using accurate genetic knock-in mouse models of HD, we developed a reliable, high-throughput method to quantify tissue HD CAG repeat instability and integrated it with genome-wide bioinformatic approaches. Using tissue instability quantified in 16 tissues as a phenotype and tissue microarray gene expression as a predictor, we built a mathematical model and identified a gene expression signature that accurately predicted tissue instability. Using the predictive ability of this signature, we found that somatic instability was not a consequence of pathogenesis. In support of this, genetic crosses with models of accelerated neuropathology failed to induce somatic instability. In addition, we searched for genes and pathways whose expression correlated with tissue instability. We found that expression levels of DNA repair genes did not explain the tissue specificity of somatic instability. Instead, our data implicate other pathways, particularly cell cycle, metabolism and neurotransmitter pathways, acting in combination to generate tissue-specific patterns of instability.
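One simple way to search for genes whose expression tracks tissue instability is to score each gene by a rank correlation across tissues. The sketch below ranks genes by the absolute Spearman correlation between their expression profile and a per-tissue instability measure. The abstract does not specify the model or statistic used, so Spearman correlation is a stand-in chosen here. The function names and the toy four-tissue values are hypothetical, not data from the study.

```python
from math import sqrt


def ranks(values):
    """Average 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            result[idx] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    denom = sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return cov / denom if denom > 0 else 0.0  # constant profiles score 0


def rank_genes_by_instability(expression, instability):
    """Sort genes by |Spearman rho| between expression and instability across tissues."""
    scores = {gene: spearman(profile, instability) for gene, profile in expression.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical toy data for four tissues.
instability = [0.1, 0.5, 0.9, 0.3]
expression = {
    "geneUp": [1.0, 5.0, 9.0, 3.0],
    "geneDown": [9.0, 5.0, 1.0, 7.0],
    "geneFlat": [2.0, 1.0, 2.0, 1.0],
}
for gene, rho in rank_genes_by_instability(expression, instability):
    print(gene, round(rho, 3))
```

A rank-based statistic is less sensitive to a few extreme tissues than Pearson correlation. With only 16 tissues, any gene-level ranking would still need multiple-testing control and pathway-level aggregation before drawing conclusions.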
Our study clearly demonstrates that multiple tissue factors reflect the level of somatic instability in different tissues. In addition, our quantitative, genome-wide approach is readily applicable to high-throughput assays and opens the door to widespread applications with the potential to accelerate the discovery of drugs that alter tissue instability.
BioGPS is a community-based, customisable gene annotation portal that brings gene annotation resources together onto a single platform.
Online gene annotation resources are indispensable for analysis of genomics data. However, the landscape of these online resources is highly fragmented, and scientists often visit dozens of these sites for each gene in a candidate gene list. Here, we introduce BioGPS (http://biogps.gnf.org), a centralized gene portal for aggregating distributed gene annotation resources. Moreover, BioGPS embraces the principle of community intelligence, enabling any user to easily and directly contribute to the BioGPS platform.
Annotating the function of all human genes is a critical, yet formidable, challenge. Current gene annotation efforts focus on centralized curation resources, but it is increasingly clear that this approach does not scale with the rapid growth of the biomedical literature. The Gene Wiki utilizes an alternative and complementary model based on the principle of community intelligence. Directly integrated within the online encyclopedia Wikipedia, this effort aims to build a gene-specific review article for every gene in the human genome, where each article is collaboratively written, continuously updated and community reviewed. Previously, we described the creation of Gene Wiki ‘stubs’ for approximately 9000 human genes. Here, we describe ongoing systematic improvements to these articles to increase their utility. Moreover, we retrospectively examine the community usage and improvement of the Gene Wiki, providing evidence of a critical mass of users and editors. Gene Wiki articles are freely accessible within the Wikipedia web site, and additional links and information are available at http://en.wikipedia.org/wiki/Portal:Gene_Wiki.
Glyoxalase 1 (Glo1) has been implicated in anxiety-like behavior in mice and in multiple psychiatric diseases in humans. We used mouse Affymetrix exon arrays to detect copy number variants (CNV) among inbred mouse strains and thereby identified a ∼475 kb tandem duplication on chromosome 17 that includes Glo1 (30,174,390–30,651,226 bp; mouse genome build 36). We developed a PCR-based strategy and used it to detect this duplication in 23 of 71 inbred strains tested, and in various outbred and wild-caught mice. Presence of the duplication is associated with a cis-acting expression QTL for Glo1 (LOD > 30) in BXD recombinant inbred strains. However, evidence for an eQTL for Glo1 was not obtained when we analyzed single SNPs or 3-SNP haplotypes in a panel of 27 inbred strains. We conclude that association analysis in the inbred strain panel failed to detect an eQTL because the duplication was present on multiple highly divergent haplotypes. Furthermore, we suggest that non-allelic homologous recombination has led to multiple reversions to the non-duplicated state among inbred strains. We show associations between multiple duplication-containing haplotypes, Glo1 expression and anxiety-like behavior in both inbred strain panels and outbred CD-1 mice. Our findings provide a molecular basis for differential expression of Glo1 and further implicate Glo1 in anxiety-like behavior. More broadly, these results identify problems with commonly employed tests for association in inbred strains when CNVs are present. Finally, these data provide an example of biologically significant phenotypic variability in model organisms that can be attributed to CNVs.
Hepatitis C virus (HCV) infection is a global health problem. A number of studies have implicated a direct role of cellular lipid metabolism in the HCV life cycle, and inhibitors of the mevalonate pathway have been shown to induce an antiviral state within the host cell. Transcriptome profiling was conducted on Huh-7 human hepatoma cells bearing subgenomic HCV replicons, with and without treatment with 25-hydroxycholesterol (25-HC), an inhibitor of the mevalonate pathway that alters lipid metabolism, to assess metabolic determinants of pro- and antiviral states within the host cell. These data were compared with gene expression profiles from HCV-infected chimpanzees.
Transcriptome profiling of Huh-7 cells treated with 25-HC identified 47 downregulated genes, 16 of which are clearly related to the mevalonate pathway. Fewer genes (22) were upregulated in the presence of 25-HC, and 5 genes were uniquely upregulated in the HCV replicon-bearing cells. Comparison of these gene expression profiles with data collected during the initial rise in viremia in 4 previously characterized HCV-infected chimpanzees yielded 54 overlapping genes, 4 of which showed interesting differential regulation at the mRNA level in both systems. These genes are PROX1, INSIG-1, NK4, and UBD. The expression of these genes was perturbed with siRNAs and with overexpression vectors in HCV replicon cells, and the effect on HCV replication and translation was assessed. Both PROX1 and NK4 regulated HCV replication in conjunction with an antiviral state induced by 25-hydroxycholesterol.
Treatment of Huh-7 cells bearing HCV replicons with 25-HC leads to the downregulation of many key genes involved in the mevalonate pathway, resulting in an antiviral state within the host cell. Furthermore, dysregulation of a larger subset of genes not directly related to the mevalonate pathway occurs both in 25-HC-treated HCV replicon-harbouring cells and during the initial rise in viremia in infected chimpanzees. Functional studies of 3 of these genes demonstrate that they do not act directly as antiviral gene products but contribute indirectly to the antiviral state in the host cell. These genes may also represent novel biomarkers for HCV infection, since they demonstrate an outcome-specific expression profile.