Zhao, Hong | Lin, Wenyu | Kumthip, Kattareeya | Cheng, Du | Fusco, Dahlene N | Hofmann, Oliver | Jilg, Nikolaus | Tai, Andrew W. | Goto, Kaku | Zhang, Leiliang | Hide, Winston | Jang, Jae Young | Peng, Lee F | Chung, Raymond T
Background & Aims
The precise mechanisms by which IFN exerts its antiviral effect against HCV have not yet been elucidated. We sought to identify host genes that mediate the antiviral effect of IFN-α by conducting a whole-genome siRNA library screen.
Methods
High throughput screening was performed using an HCV genotype 1b replicon, pRep-Feo. Those pools with replicate robust Z scores ≥ 2.0 entered secondary validation in full-length OR6 replicon cells. Huh7.5.1 cells infected with JFH1 were then used to validate the rescue efficacy of selected genes for HCV replication under IFN-α treatment.
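The hit-selection rule above (replicate robust Z scores ≥ 2.0) can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes the common median/MAD definition of the robust Z score and assumes that a higher signal means rescued replication; the function names and the two-replicate layout are hypothetical.

```python
from statistics import median

def robust_z_scores(values):
    """Robust Z score per value: (x - median) / (1.4826 * MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; robust Z score is undefined")
    # 1.4826 scales the MAD to match a standard deviation for normal data.
    scale = 1.4826 * mad
    return [(v - med) / scale for v in values]

def select_hits(replicate_a, replicate_b, threshold=2.0):
    """Indices of siRNA pools scoring >= threshold in both replicates."""
    z_a = robust_z_scores(replicate_a)
    z_b = robust_z_scores(replicate_b)
    return [i for i, (a, b) in enumerate(zip(z_a, z_b))
            if a >= threshold and b >= threshold]

# Hypothetical normalized replicon signals for seven siRNA pools.
rep_a = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 3.0]
rep_b = [1.0, 0.9, 1.1, 1.0, 0.8, 1.2, 2.5]
hits = select_hits(rep_a, rep_b)
```

Using the median and MAD rather than the mean and standard deviation keeps a handful of strong hits from inflating the scale and masking themselves.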
Results
We identified and confirmed 93 human genes involved in the IFN-α anti-HCV effect using a whole-genome siRNA library. Gene ontology analysis revealed that mRNA processing (23 genes, P=2.756e-22), translation initiation (9 genes, P=2.42e-6), and IFN signaling (5 genes, P=1.00e-3) were the most enriched functional groups. Nine genes were components of U4/U6.U5 tri-snRNP. We confirmed that silencing squamous cell carcinoma antigen recognized by T cells (SART1), a specific factor of tri-snRNP, abrogates IFN-α's suppressive effects against HCV in both replicon cells and JFH1 infectious cells. We further found that SART1 is not IFN-α inducible, and that its anti-HCV effect in the JFH1 infectious model is exerted through regulation of interferon-stimulated genes (ISGs) with or without IFN-α.
Conclusions
We identified 93 genes that mediate the anti-HCV effect of IFN-α through genome-wide siRNA screening; 23 and 9 genes were involved in mRNA processing and translation initiation, respectively. These findings reveal an unexpected role for mRNA processing in generation of the antiviral state, and suggest a new avenue for therapeutic development in HCV.
doi:10.1016/j.jhep.2011.07.026
PMCID: PMC3261326
PMID: 21888876
Hepatitis C Virus, HCV; Interferon-α, IFN-α; Small interfering RNA, siRNA; Squamous cell carcinoma Antigen Recognized by T cells, SART1; U4/U6.U5 tri-small nuclear ribonucleoproteins, U4/U6.U5 tri-snRNP
Highlights
► We describe monolayers consisting of molecules with polar repeat units. ► The global energy gap in such layers can vanish. ► In such films the dipole moment per molecule saturates beyond a certain length. ► These effects result from electrostatic dimensionality effects. ► This represents an “organic electronics” analogue to inorganic “polar surfaces”.
In conjugated organic molecules, excitation gaps typically decrease reciprocally with an increasing number of repeat units, n. This usually holds for individual molecules as well as for the corresponding bulk materials. Here, we show using density-functional theory calculations that a qualitatively different evolution is found for layers built from molecules consisting of polar repeat units. Whereas a 1/n-dependence is still observed in the case of isolated polar molecules, the global gap decreases essentially linearly with n in the corresponding 2D-periodic systems and vanishes beyond a certain molecular length, with the frontier states being localized at opposite ends of the layer. The latter is accompanied by a saturation of the dipole moment per molecule, an effect not observed in the isolated polar molecules. Interestingly, in both cases the limit of the gap for long (but finite) molecules differs qualitatively from that of infinite length obtained in 1D-periodic and 3D-periodic calculations, the latter serving as models for polymers and the bulk. We rationalize these dimensionality effects as a consequence of the potential gradient within the finite-length layers. They arise from the collective action of intra-molecular dipoles in the 2D periodic layers and can be traced back to surface effects.
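The two scaling regimes described above can be written schematically as follows. This is only a restatement of the qualitative trends in the abstract; the symbols E_∞, A, E_0, β and n_c are illustrative fitting parameters assumed here, not quantities reported by the authors.

```latex
% Isolated polar molecule: reciprocal (1/n) decrease toward a finite limit
E_{\mathrm{gap}}^{\mathrm{mol}}(n) \approx E_{\infty} + \frac{A}{n}

% 2D-periodic layer: essentially linear decrease, vanishing beyond a critical length n_c
E_{\mathrm{gap}}^{\mathrm{2D}}(n) \approx
\begin{cases}
E_{0} - \beta\, n, & n < n_{c} \\
0, & n \ge n_{c}
\end{cases}
\qquad n_{c} = \frac{E_{0}}{\beta}
```

In this picture β plays the role of the energy shift per repeat unit caused by the potential gradient across the layer, so the gap closes once the accumulated shift reaches E_0.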
doi:10.1016/j.orgel.2012.09.003
PMCID: PMC3587343
PMID: 23470879
Energy gap; Polar surface; Dimensionality effects; Band-structure calculations; Polar molecules; Thin-film electrostatics
Sansone, Susanna-Assunta | Rocca-Serra, Philippe | Field, Dawn | Maguire, Eamonn | Taylor, Chris | Hofmann, Oliver | Fang, Hong | Neumann, Steffen | Tong, Weida | Amaral-Zettler, Linda | Begley, Kimberly | Booth, Tim | Bougueleret, Lydie | Burns, Gully | Chapman, Brad | Clark, Tim | Coleman, Lee-Ann | Copeland, Jay | Das, Sudeshna | de Daruvar, Antoine | de Matos, Paula | Dix, Ian | Edmunds, Scott | Evelo, Chris T | Forster, Mark J | Gaudet, Pascale | Gilbert, Jack | Goble, Carole | Griffin, Julian L | Jacob, Daniel | Kleinjans, Jos | Harland, Lee | Haug, Kenneth | Hermjakob, Henning | Ho Sui, Shannan J | Laederach, Alain | Liang, Shaoguang | Marshall, Stephen | McGrath, Annette | Merrill, Emily | Reilly, Dorothy | Roux, Magali | Shamu, Caroline E | Shang, Catherine A | Steinbeck, Christoph | Trefethen, Anne | Williams-Jones, Bryn | Wolstencroft, Katherine | Xenarios, Ioannis | Hide, Winston
To make full use of research data, the bioscience community needs to adopt technologies and reward mechanisms that support interoperability and promote the growth of an open ‘data commoning’ culture. Here we describe the prerequisites for data commoning and present an established and growing ecosystem of solutions using the shared ‘Investigation-Study-Assay’ framework to support that vision.
doi:10.1038/ng.1054
PMCID: PMC3428019
PMID: 22281772
Background: Animal studies suggest that early-life lead exposure influences gene expression and production of proteins associated with Alzheimer’s disease (AD).
Objectives: We attempted to assess the relationship between early-life lead exposure and potential biomarkers for AD among young men and women. We also attempted to assess whether early-life lead exposure was associated with changes in expression of AD-related genes.
Methods: We used sandwich enzyme-linked immunosorbent assays (ELISA) to measure plasma concentrations of amyloid β proteins Aβ40 and Aβ42 among 55 adults who had participated as newborns and young children in a prospective cohort study of the effects of lead exposure on development. We used RNA microarray techniques to analyze gene expression.
Results: Mean plasma Aβ42 concentrations were lower among 13 participants with high umbilical cord blood lead concentrations (≥ 10 μg/dL) than in 42 participants with lower cord blood lead concentrations (p = 0.08). Among 10 participants with high prenatal lead exposure, we found evidence of an inverse relationship between umbilical cord lead concentration and expression of ADAM metallopeptidase domain 9 (ADAM9), reticulon 4 (RTN4), and low-density lipoprotein receptor-related protein associated protein 1 (LRPAP1) genes, whose products are believed to affect Aβ production and deposition. Gene network analysis suggested enrichment in gene sets involved in nerve growth and general cell development.
Conclusions: Data from our exploratory study suggest that prenatal lead exposure may influence Aβ-related biological pathways that have been implicated in AD onset. Gene network analysis identified further candidates to study the mechanisms of developmental lead neurotoxicity.
doi:10.1289/ehp.1104474
PMCID: PMC3346789
PMID: 22313790
Alzheimer’s disease; children; fetal basis of adult disease; human; lead
Gene expression quantitative trait loci (eQTL) are useful for identifying single nucleotide polymorphisms (SNPs) associated with diseases. At times, a genetic variant may be associated with a master regulator involved in the manifestation of a disease. The downstream target genes of the master regulator are typically co-expressed and share biological function. Therefore, it is practical to screen for eQTLs by identifying SNPs associated with the targets of a transcript-regulator (TR). We used a multivariate regression with the gene expression of known targets of TRs and SNPs to identify TReQTLs in European (CEU) and African (YRI) HapMap populations. A nominal p-value of <1×10⁻⁶ revealed 234 SNPs in CEU and 154 in YRI as TReQTLs. These represent 36 independent (tag) SNPs in CEU and 39 in YRI affecting the downstream targets of 25 and 36 TRs respectively. At a false discovery rate (FDR) = 45%, one cis-acting tag SNP (within 1 kb of a gene) in each population was identified as a TReQTL. In CEU, the SNP (rs16858621) in Pcnxl2 was found to be associated with the genes regulated by CREM whereas in YRI, the SNP (rs16909324) was linked to the targets of miRNA hsa-miR-125a. To infer the pathways that regulate expression, we ranked TReQTLs by connectivity within the structure of biological process subtrees. One TReQTL SNP (rs3790904) in CEU maps to Lphn2 and is associated (nominal p-value = 8.1×10⁻⁷) with the targets of the X-linked breast cancer suppressor Foxp3. The structure of the biological process subtree and a gene interaction network of the TReQTL revealed that tumor necrosis factor, NF-kappaB and variants in G-protein coupled receptors signaling may play a central role as communicators in Foxp3 functional regulation. The potential pleiotropic effect of the Foxp3 TReQTLs was gleaned from integrating mRNA-Seq data and SNP-set enrichment into the analysis.
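The abstract reports results at both a nominal p-value threshold and an FDR level but does not say which FDR procedure was used. As an illustration only, the sketch below assumes the Benjamini-Hochberg procedure, a common choice; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical nominal p-values for four SNP-regulator tests.
nominal = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(nominal)
significant = [i for i, q in enumerate(fdr) if q <= 0.03]
```

A test is then called significant at FDR level q when its adjusted value is at most q, which is how a permissive level such as 45% admits more candidates than a strict nominal cut-off.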
doi:10.1371/journal.pone.0034286
PMCID: PMC3313997
PMID: 22479588
Fu, Suneng | Yang, Ling | Li, Ping | Hofmann, Oliver | Dicker, Lee | Hide, Winston | Lin, Xihong | Watkins, Steven M. | Ivanov, Alexander | Hotamisligil, Gökhan S.
Nature
2011;473(7348):528-531.
doi:10.1038/nature09968
PMCID: PMC3102791
PMID: 21532591
Ho Sui, Shannan J. | Begley, Kimberly | Reilly, Dorothy | Chapman, Brad | McGovern, Ray | Rocca-Serra, Philippe | Maguire, Eamonn | Altschuler, Gabriel M. | Hansen, Terah A. A. | Sompallae, Ramakrishna | Krivtsov, Andrei | Shivdasani, Ramesh A. | Armstrong, Scott A. | Culhane, Aedín C. | Correll, Mick | Sansone, Susanna-Assunta | Hofmann, Oliver | Hide, Winston
Mounting evidence suggests that malignant tumors are initiated and maintained by a subpopulation of cancerous cells with biological properties similar to those of normal stem cells. However, descriptions of stem-like gene and pathway signatures in cancers are inconsistent across experimental systems. Driven by a need to improve our understanding of molecular processes that are common and unique across cancer stem cells (CSCs), we have developed the Stem Cell Discovery Engine (SCDE)—an online database of curated CSC experiments coupled to the Galaxy analytical framework. The SCDE allows users to consistently describe, share and compare CSC data at the gene and pathway level. Our initial focus has been on carefully curating tissue and cancer stem cell-related experiments from blood, intestine and brain to create a high quality resource containing 53 public studies and 1098 assays. The experimental information is captured and stored in the multi-omics Investigation/Study/Assay (ISA-Tab) format and can be queried in the data repository. A linked Galaxy framework provides a comprehensive, flexible environment populated with novel tools for gene list comparisons against molecular signatures in GeneSigDB and MSigDB, curated experiments in the SCDE and pathways in WikiPathways. The SCDE is available at http://discovery.hsci.harvard.edu.
doi:10.1093/nar/gkr1051
PMCID: PMC3245064
PMID: 22121217
Lal, Ashish | Thomas, Marshall P. | Altschuler, Gabriel | Navarro, Francisco | O'Day, Elizabeth | Li, Xiao Ling | Concepcion, Carla | Han, Yoon-Chi | Thiery, Jerome | Rajani, Danielle K. | Deutsch, Aaron | Hofmann, Oliver | Ventura, Andrea | Hide, Winston | Lieberman, Judy | McManus, Michael T.
A simple biochemical method to isolate mRNAs pulled down with a transfected, biotinylated microRNA was used to identify direct target genes of miR-34a, a tumor suppressor gene. The method reidentified most of the known miR-34a regulated genes expressed in K562 and HCT116 cancer cell lines. Transcripts for 982 genes were enriched in the pull-down with miR-34a in both cell lines. Despite this large number, validation experiments suggested that ∼90% of the genes identified in both cell lines can be directly regulated by miR-34a. Thus miR-34a is capable of regulating hundreds of genes. The transcripts pulled down with miR-34a were highly enriched for their roles in growth factor signaling and cell cycle progression. These genes form a dense network of interacting gene products that regulate multiple signal transduction pathways that orchestrate the proliferative response to external growth stimuli. Multiple candidate miR-34a–regulated genes participate in RAS-RAF-MAPK signaling. Ectopic miR-34a expression reduced basal ERK and AKT phosphorylation and enhanced sensitivity to serum growth factor withdrawal, while cells genetically deficient in miR-34a were less sensitive. Fourteen new direct targets of miR-34a were experimentally validated, including genes that participate in growth factor signaling (ARAF and PIK3R2) as well as genes that regulate cell cycle progression at various phases of the cell cycle (cyclins D3 and G2, MCM2 and MCM5, PLK1 and SMAD4). Thus miR-34a tempers the proliferative and pro-survival effect of growth factor stimulation by interfering with growth factor signal transduction and downstream pathways required for cell division.
Author Summary
microRNAs (miRNAs) are small RNAs that regulate gene expression by binding to mRNAs bearing a partially complementary sequence. miRNAs decrease the stability or translation of mRNA targets, leading to reduced protein expression. Understanding the biological function of a miRNA requires identifying its targets. Here we developed a sensitive and specific biochemical method to identify candidate microRNA targets that are enriched by pull-down with a tagged, transfected microRNA mimic. The method was applied to miR-34a, a miRNA that inhibits cell proliferation. We found that miR-34a can potentially regulate hundreds of genes. Computational analysis of these genes suggested a novel function for miR-34a—suppression of the pro-proliferative response to diverse growth factors. This function complements the previously known role of miR-34a in blocking cell cycle progression. Thus, by reducing the expression of an extensive network of genes, miR-34a dampens growth factor signaling as well as its downstream consequences, promotion of cell survival and proliferation.
doi:10.1371/journal.pgen.1002363
PMCID: PMC3213160
PMID: 22102825
Voight, Benjamin F | Scott, Laura J | Steinthorsdottir, Valgerdur | Morris, Andrew P | Dina, Christian | Welch, Ryan P | Zeggini, Eleftheria | Huth, Cornelia | Aulchenko, Yurii S | Thorleifsson, Gudmar | McCulloch, Laura J | Ferreira, Teresa | Grallert, Harald | Amin, Najaf | Wu, Guanming | Willer, Cristen J | Raychaudhuri, Soumya | McCarroll, Steve A | Langenberg, Claudia | Hofmann, Oliver M | Dupuis, Josée | Qi, Lu | Segrè, Ayellet V | van Hoek, Mandy | Navarro, Pau | Ardlie, Kristin | Balkau, Beverley | Benediktsson, Rafn | Bennett, Amanda J | Blagieva, Roza | Boerwinkle, Eric | Bonnycastle, Lori L | Boström, Kristina Bengtsson | Bravenboer, Bert | Bumpstead, Suzannah | Burtt, Noisël P | Charpentier, Guillaume | Chines, Peter S | Cornelis, Marilyn | Couper, David J | Crawford, Gabe | Doney, Alex S F | Elliott, Katherine S | Elliott, Amanda L | Erdos, Michael R | Fox, Caroline S | Franklin, Christopher S | Ganser, Martha | Gieger, Christian | Grarup, Niels | Green, Todd | Griffin, Simon | Groves, Christopher J | Guiducci, Candace | Hadjadj, Samy | Hassanali, Neelam | Herder, Christian | Isomaa, Bo | Jackson, Anne U | Johnson, Paul R V | Jørgensen, Torben | Kao, Wen H L | Klopp, Norman | Kong, Augustine | Kraft, Peter | Kuusisto, Johanna | Lauritzen, Torsten | Li, Man | Lieverse, Aloysius | Lindgren, Cecilia M | Lyssenko, Valeriya | Marre, Michel | Meitinger, Thomas | Midthjell, Kristian | Morken, Mario A | Narisu, Narisu | Nilsson, Peter | Owen, Katharine R | Payne, Felicity | Perry, John R B | Petersen, Ann-Kristin | Platou, Carl | Proença, Christine | Prokopenko, Inga | Rathmann, Wolfgang | Rayner, N William | Robertson, Neil R | Rocheleau, Ghislain | Roden, Michael | Sampson, Michael J | Saxena, Richa | Shields, Beverley M | Shrader, Peter | Sigurdsson, Gunnar | Sparsø, Thomas | Strassburger, Klaus | Stringham, Heather M | Sun, Qi | Swift, Amy J | Thorand, Barbara | Tichet, Jean | Tuomi, Tiinamaija | van Dam, Rob M | van Haeften, Timon W | van Herpt, Thijs | 
van Vliet-Ostaptchouk, Jana V | Walters, G Bragi | Weedon, Michael N | Wijmenga, Cisca | Witteman, Jacqueline | Bergman, Richard N | Cauchi, Stephane | Collins, Francis S | Gloyn, Anna L | Gyllensten, Ulf | Hansen, Torben | Hide, Winston A | Hitman, Graham A | Hofman, Albert | Hunter, David J | Hveem, Kristian | Laakso, Markku | Mohlke, Karen L | Morris, Andrew D | Palmer, Colin N A | Pramstaller, Peter P | Rudan, Igor | Sijbrands, Eric | Stein, Lincoln D | Tuomilehto, Jaakko | Uitterlinden, Andre | Walker, Mark | Wareham, Nicholas J | Watanabe, Richard M | Abecasis, Gonçalo R | Boehm, Bernhard O | Campbell, Harry | Daly, Mark J | Hattersley, Andrew T | Hu, Frank B | Meigs, James B | Pankow, James S | Pedersen, Oluf | Wichmann, H-Erich | Barroso, Inês | Florez, Jose C | Frayling, Timothy M | Groop, Leif | Sladek, Rob | Thorsteinsdottir, Unnur | Wilson, James F | Illig, Thomas | Froguel, Philippe | van Duijn, Cornelia M | Stefansson, Kari | Altshuler, David | Boehnke, Michael | McCarthy, Mark I
By combining genome-wide association data from 8,130 individuals with type 2 diabetes (T2D) and 38,987 controls of European descent and following up previously unidentified meta-analysis signals in a further 34,412 cases and 59,925 controls, we identified 12 new T2D association signals with combined P < 5 × 10⁻⁸. These include a second independent signal at the KCNQ1 locus; the first report, to our knowledge, of an X-chromosomal association (near DUSP9); and a further instance of overlap between loci implicated in monogenic and multifactorial forms of diabetes (at HNF1A). The identified loci affect both beta-cell function and insulin action, and, overall, T2D association signals show evidence of enrichment for genes involved in cell cycle regulation. We also show that a high proportion of T2D susceptibility loci harbor independent association signals influencing apparently unrelated complex traits.
doi:10.1038/ng.609
PMCID: PMC3080658
PMID: 20581827
Background
Because they play important roles in gene regulation, microRNAs (miRNAs) are believed to be indispensable contributors to the pathogenesis of myocardial infarction (MI), which causes significant morbidity and mortality. Working on the hypothesis that modulating only a few key members of the miRNA superfamily could benefit the ischemic heart, we propose a microarray-based network biology approach to identify them, using the recognized clinical effect of propranolol as a prompt.
Methods
A long-term rat model of MI was established in this study. Microarray technology was applied to determine the global changes in miRNA expression induced by propranolol. Multiple network analyses were then applied sequentially to evaluate the regulatory capacity, efficiency and emphasis of the miRNAs whose dysregulated expression in MI was significantly reversed by propranolol.
Results
Microarray data analysis indicated that long-term propranolol administration reversed the expression of 18 of the 31 miRNAs dysregulated in MI, implying that intentional modulation of miRNA expression might have favorable effects on the ischemic heart. Our network analysis identified miR-1, miR-29b and miR-98 as the prime players in MI among these miRNAs. Further analysis revealed that miR-1 focuses on the regulation of myocyte growth, whereas miR-29b and miR-98 emphasize fibrosis and inflammation, respectively.
Conclusion
Our study illustrates how a combination of microarray technology and functional protein network analysis can be used to identify disease-related key miRNAs.
doi:10.1371/journal.pone.0014736
PMCID: PMC3046111
PMID: 21386882
Rocca-Serra, Philippe | Brandizi, Marco | Maguire, Eamonn | Sklyar, Nataliya | Taylor, Chris | Begley, Kimberly | Field, Dawn | Harris, Stephen | Hide, Winston | Hofmann, Oliver | Neumann, Steffen | Sterk, Peter | Tong, Weida | Sansone, Susanna-Assunta
Summary: The first open source software suite for experimentalists and curators that (i) assists in the annotation and local management of experimental metadata from high-throughput studies employing one or a combination of omics and other technologies; (ii) empowers users to uptake community-defined checklists and ontologies; and (iii) facilitates submission to international public repositories.
Availability and Implementation: Software, documentation, case studies and implementations at http://www.isa-tools.org
Contact: isatools@googlegroups.com
doi:10.1093/bioinformatics/btq415
PMCID: PMC2935443
PMID: 20679334
doi:10.1371/journal.pcbi.1000779
PMCID: PMC2877728
PMID: 20523745
doi:10.1371/journal.pcbi.1000563
PMCID: PMC2813255
PMID: 20126525
doi:10.1371/journal.pcbi.1000640
PMCID: PMC2813254
PMID: 20126524
Computational modeling is used to describe the mechanisms governing energy level alignment between an organic semiconductor (OSC) and a metal covered by various self-assembled monolayers (SAMs). In particular, we address the question to what extent and under what circumstances SAM-induced work-function modifications lead to an actual change of the barriers for electron and hole injection from the metal into the OSC layer. Depending on the nature of the SAM, we observe clear transitions between Fermi level pinning and vacuum-level alignment regimes. Surprisingly, although in most cases the pinning occurs only when the metal is present, it is not related to charge transfer between the electrode and the organic layer. Instead, charge rearrangements at the interface between the SAM and the OSC are observed, accompanied by a polarization of the SAM.
doi:10.1021/nn9010494
PMCID: PMC2782352
PMID: 19891441
computational modeling; density functional theory; electronic structure/processes/mechanisms; monolayers; organic electronics; molecular electronics; self-assembly; metal/organic interfaces
Background
With the arrival of the postgenomic era, there is increasing interest in the discovery of biomarkers for the accurate diagnosis, prognosis, and early detection of cancer. Blood-borne cancer markers are favored by clinicians, because blood samples can be obtained and analyzed with relative ease. We have used a combined mining strategy based on an integrated cancer microarray platform, Oncomine, and the biomarker module of the Ingenuity Pathways Analysis (IPA) program to identify potential blood-based markers for six common human cancer types.
Methodology/Principal Findings
In the Oncomine platform, the genes overexpressed in cancer tissues relative to their corresponding normal tissues were filtered by Gene Ontology keywords, with the extracellular environment stipulated and a corrected Q value (false discovery rate) cut-off implemented. The identified genes were imported to the IPA biomarker module to separate out those genes encoding putative secreted or cell-surface proteins as blood-borne (blood/serum/plasma) cancer markers. The filtered potential indicators were ranked and prioritized according to normalized absolute Student t values. The retrieval of numerous marker genes that are already clinically useful or under active investigation confirmed the effectiveness of our mining strategy. To identify the biomarkers that are unique for each cancer type, the upregulated marker genes that are in common between each two tumor types across the six human tumors were also analyzed by the IPA biomarker comparison function.
Conclusion/Significance
The upregulated marker genes shared among the six cancer types may serve as a molecular tool to complement histopathologic examination, and the combination of the commonly upregulated and unique biomarkers may serve as differentiating markers for a specific cancer. This approach will be increasingly useful to discover diagnostic signatures as the mass of microarray data continues to grow in the ‘omics’ era.
doi:10.1371/journal.pone.0003661
PMCID: PMC2575235
PMID: 18987750
Background
About 5% of western populations are afflicted by autoimmune diseases, many of which are affected by sex hormones. Autoimmune diseases are complex and involve many genes. Identifying these disease-associated genes contributes to the development of more effective therapies. Association studies frequently implicate genomic regions that contain disease-associated genes but fall short of pinpointing these genes. The identification of disease-associated genes has always been challenging, and to date no universal and effective method has been developed.
Results
We have developed a method to prioritize disease-associated genes for diseases affected strongly by sex hormones. Our method uses various types of information available for the genes, but no information that directly links genes with the disease. It generates a score for each of the considered genes and ranks genes based on that score. We illustrate our method on early-onset myasthenia gravis (MG) using genes potentially controlled by estrogen and localized in a genomic segment (which contains the MHC and surrounding region) strongly associated with MG. Based on the considered genomic segment, 283 genes are ranked for their relevance to MG and responsiveness to estrogen. The top three ranked genes, HLA-G, TAP2 and HLA-DRB1, are implicated in autoimmune diseases, while TAP2 is associated with SNPs characteristic of MG. Within the top 35 prioritized genes, our method identifies 90% of the 10 already known MG-associated genes from the considered region without using any information that directly links genes to MG. Among the top eight genes, we identified HLA-G and TUBB as new candidates. We show that our ab initio approach outperforms the other methods for prioritizing disease-associated genes.
Conclusion
We have developed a method to prioritize disease-associated genes under the potential control of sex hormones. We demonstrate the success of this method by prioritizing the genes localized in the MHC and surrounding region and evaluating the role of these genes as potential candidates for estrogen control as well as MG. We show that our method outperforms the other methods. The method has the potential to be adapted to prioritize genes relevant to other diseases.
doi:10.1186/1471-2164-9-481
PMCID: PMC2592250
PMID: 18851734
Background
Serial Analysis of Gene Expression (SAGE) is a DNA sequencing-based method for large-scale gene expression profiling that provides an alternative to microarray analysis. Most analyses of SAGE data aimed at identifying co-expressed genes have been accomplished using various versions of clustering approaches that often result in a number of false positives.
Principal Findings
Here we explore the use of seriation, a statistical approach for ordering sets of objects based on their similarity, for large-scale expression pattern discovery in SAGE data. For this specific task we implement a seriation heuristic we term ‘progressive construction of contigs’ that constructs local chains of related elements by sequentially rearranging margins of the correlation matrix. We apply the heuristic to the analysis of simulated and experimental SAGE data and compare our results to those obtained with a clustering algorithm developed specifically for SAGE data. We show using simulations that the performance of seriation compares favorably to that of the clustering algorithm on noisy SAGE data.
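To make the idea of seriation concrete, the sketch below orders items so that similar ones become neighbors. It is a generic greedy chain-building heuristic written for illustration and should not be read as the authors' "progressive construction of contigs" algorithm; the function name and the example similarity matrix are hypothetical.

```python
def greedy_seriation(similarity):
    """Order items so that highly similar items end up adjacent.

    similarity: symmetric square matrix (list of lists), e.g. correlations.
    """
    n = len(similarity)
    if n < 2:
        return list(range(n))
    # Seed the chain with the most similar pair of items.
    seed = max(((i, j) for i in range(n) for j in range(i + 1, n)),
               key=lambda pair: similarity[pair[0]][pair[1]])
    chain = [seed[0], seed[1]]
    remaining = set(range(n)) - set(chain)
    while remaining:
        head, tail = chain[0], chain[-1]
        # Pick the unplaced item most similar to either end of the chain.
        item, end = max(((k, e) for k in remaining for e in (head, tail)),
                        key=lambda ke: similarity[ke[0]][ke[1]])
        if end == head:
            chain.insert(0, item)
        else:
            chain.append(item)
        remaining.remove(item)
    return chain

# Hypothetical correlation matrix for four expression profiles.
corr = [
    [1.0, 0.2, 0.9, 0.1],
    [0.2, 1.0, 0.8, 0.7],
    [0.9, 0.8, 1.0, 0.3],
    [0.1, 0.7, 0.3, 1.0],
]
ordering = greedy_seriation(corr)
```

Unlike clustering, the result is a single linear ordering rather than a partition, so groups of co-expressed genes appear as contiguous runs when the reordered matrix is visualized.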
Conclusions
We explore the use of a seriation approach for visualization-based pattern discovery in SAGE data. Using both simulations and experimental data, we demonstrate that seriation is able to identify groups of co-expressed genes more accurately than a clustering algorithm developed specifically for SAGE data. Our results suggest that seriation is a useful method for the analysis of gene expression data whose applicability should be further pursued.
doi:10.1371/journal.pone.0003205
PMCID: PMC2527533
PMID: 18787709
Background
High-throughput gene expression data can predict gene function through the “guilt by association” principle: coexpressed genes are likely to be functionally associated.
Methodology/Principal Findings
We analyzed publicly available expression data on normal human tissues. The analysis is based on the integration of data obtained with two experimental platforms (microarrays and SAGE) and of various measures of dissimilarity between expression profiles. The building blocks of the procedure are the Ranked Coexpression Groups (RCG), small sets of tightly coexpressed genes which are analyzed in terms of functional annotation. Functionally characterized RCGs are selected by means of the majority rule and used to predict new functional annotations. Functionally characterized RCGs are enriched in groups of genes associated to similar phenotypes. We exploit this fact to find new candidate disease genes for many OMIM phenotypes of unknown molecular origin.
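The majority-rule step can be illustrated with a short sketch. The abstract does not give the exact rule, so this version assumes that a term is transferred when it annotates more than half of the already-annotated genes in a coexpression group; the function name, gene labels and terms are hypothetical.

```python
from collections import Counter

def predict_by_majority(group, annotations):
    """Propose new annotations for genes in a coexpression group.

    group: list of gene identifiers.
    annotations: dict mapping gene -> set of known functional terms.
    Returns a dict mapping gene -> set of newly predicted terms.
    """
    annotated = [g for g in group if annotations.get(g)]
    if not annotated:
        return {}
    # Count how many annotated genes carry each term.
    counts = Counter(t for g in annotated for t in set(annotations[g]))
    majority = {t for t, c in counts.items() if c > len(annotated) / 2}
    predictions = {}
    for gene in group:
        new_terms = majority - set(annotations.get(gene, ()))
        if new_terms:
            predictions[gene] = new_terms
    return predictions

# Hypothetical group of four tightly coexpressed genes, one unannotated.
group = ["A", "B", "C", "D"]
known = {
    "A": {"glycolysis"},
    "B": {"glycolysis", "transport"},
    "C": {"glycolysis"},
}
predicted = predict_by_majority(group, known)
```

Here the shared term is transferred to the unannotated gene, which is the "guilt by association" inference in its simplest form.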
Conclusions/Significance
We predict new functional annotations for many human genes, showing that the integration of different data sets and coexpression measures significantly improves the scope of the results. Combining gene expression data, functional annotation and known phenotype-gene associations we provide candidate genes for several genetic diseases of unknown molecular basis.
doi:10.1371/journal.pone.0002439
PMCID: PMC2409962
PMID: 18560577
Zhou, Yingyao | Ramachandran, Vandana | Kumar, Kota Arun | Westenberger, Scott | Refour, Phillippe | Zhou, Bin | Li, Fengwu | Young, Jason A. | Chen, Kaisheng | Plouffe, David | Henson, Kerstin | Nussenzweig, Victor | Carlton, Jane | Vinetz, Joseph M. | Duraisingh, Manoj T. | Winzeler, Elizabeth A. | Hofmann, Oliver
A fundamental problem in systems biology and whole genome sequence analysis is how to infer functions for the many uncharacterized proteins that are identified, whether they are conserved across organisms of different phyla or are phylum-specific. This problem is especially acute in pathogens, such as malaria parasites, where genetic and biochemical investigations are likely to be more difficult. Here we perform comparative expression analysis on Plasmodium parasite life cycle data derived from P. falciparum blood, sporozoite, zygote and ookinete stages, and P. yoelii mosquito oocyst and salivary gland sporozoites, blood and liver stages and show that type II fatty acid biosynthesis genes are upregulated in liver and insect stages relative to asexual blood stages. We also show that some universally uncharacterized genes with orthologs in Plasmodium species, Saccharomyces cerevisiae and humans show coordinated transcription patterns in large collections of human and yeast expression data and that the function of the uncharacterized genes can sometimes be predicted based on the expression patterns across these diverse organisms. We also use a comprehensive and unbiased literature mining method to predict which uncharacterized parasite-specific genes are likely to have roles in processes such as gliding motility, host-cell interactions, sporozoite stage, or rhoptry function. These analyses, together with protein-protein interaction data, provide probabilistic models that predict the function of 926 uncharacterized malaria genes and also suggest that malaria parasites may provide a simple model system for the study of some human processes. These data also provide a foundation for further studies of transcriptional regulation in malaria parasites.
doi:10.1371/journal.pone.0001570
PMCID: PMC2215772
PMID: 18270564
Whole genome expression profiles are widely used to discover molecular subtypes of diseases. A remaining challenge is to identify the correspondence or commonality of subtypes found in multiple, independent data sets generated on various platforms. While model-based supervised learning is often used to make these connections, the models can be biased to the training data set and thus miss inherent, relevant substructure in the test data. Here we describe an unsupervised subclass mapping method (SubMap), which reveals common subtypes between independent data sets. The subtypes within a data set can be determined by unsupervised clustering or given by predetermined phenotypes before applying SubMap. We define a measure of correspondence for subtypes and evaluate its significance building on our previous work on gene set enrichment analysis. The strength of the SubMap method is that it does not impose the structure of one data set upon another, but rather uses a bi-directional approach to highlight the common substructures in both. We show how this method can reveal the correspondence between several cancer-related data sets. Notably, it identifies common subtypes of breast cancer associated with estrogen receptor status, and a subgroup of lymphoma patients who share similar survival patterns, thus improving the accuracy of a clinical outcome predictor.
doi:10.1371/journal.pone.0001195
PMCID: PMC2065909
PMID: 18030330
Background
Microarray technology enables a standardized, objective assessment of oncological diagnosis and prognosis. However, such studies are typically specific to certain cancer types, and the results have limited use due to inadequate validation in large patient cohorts. Discovery of genes commonly regulated in cancer may have important implications for understanding the common molecular mechanism of cancer.
Methods and Findings
We describe an integrated gene-expression analysis of 2,186 samples from 39 studies to identify and validate a cancer type-independent gene signature that can identify cancer patients for a wide variety of human malignancies. The commonness of gene expression in 20 types of common cancer was assessed in 20 training datasets. The discriminative power of a signature defined by these common cancer genes was evaluated in the other 19 independent datasets, including novel cancer types. QRT-PCR and tissue microarray were used to validate commonly regulated genes in multiple cancer types. We identified 187 genes dysregulated in nearly all cancerous tissue samples. The 187-gene signature can robustly predict cancer versus normal status for a wide variety of human malignancies with an overall accuracy of 92.6%. We further refined our signature to 28 genes confirmed by QRT-PCR. The refined signature still achieved 80% accuracy in classifying samples from mixed cancer types. This signature performs well in the prediction of novel cancer types that were not represented in training datasets. We also identified three biological pathways, including the glycolysis, cell cycle checkpoint II and plk3 pathways, in which most genes are systematically up-regulated in many types of cancer.
Conclusions
The identified signature has captured essential transcriptional features of neoplastic transformation and progression in general. These findings will help to elucidate the common molecular mechanism of cancer, and provide new insights into cancer diagnostics, prognostics and therapy.
doi:10.1371/journal.pone.0001149
PMCID: PMC2065803
PMID: 17989776
Model organisms represent an important resource for understanding the fundamental aspects of mammalian biology. Mapping of biological phenomena between model organisms is complex and if it is to be meaningful, a simplified representation can be a powerful means for comparison. The Developmental eVOC ontologies presented here are simplified orthogonal ontologies describing the temporal and spatial distribution of developmental human and mouse anatomy. We demonstrate the ontologies by identifying genes showing a bias for developmental brain expression in human and mouse.
doi:10.1186/gb-2007-8-10-r229
PMCID: PMC2246303
PMID: 17961239
Background
Fetal alcohol syndrome (FAS) is a serious global health problem and is observed at high frequencies in certain South African communities. Although in utero alcohol exposure is the primary trigger, there is evidence for genetic and other susceptibility factors in FAS development. No genome-wide association or linkage studies have been performed for FAS, making computational selection and prioritization of candidate disease genes an attractive approach.
Results
A total of 10,174 candidate genes were initially selected from the whole genome using a previously described method, which selects candidate genes according to their expression in disease-affected tissues. Thereafter, candidates were prioritized for experimental investigation by assessing criteria pertinent to FAS and applying binary filtering. Twenty-nine criteria were assessed by mining various database sources to populate criteria-specific gene lists. Candidate genes were then prioritized using a binary system that assessed the criteria gene lists against the candidate list, and candidate genes were scored accordingly. A group of 87 genes was prioritized as candidates for future experimental validation. The validity of the binary prioritization method was assessed by investigating the protein-protein interactions, functional enrichment and common promoter element binding sites of the top-ranked genes.
Conclusion
This analysis highlighted a list of strong candidate genes from the TGF-β, MAPK and Hedgehog signalling pathways, which are all integral to fetal development and potential targets for alcohol's teratogenic effect. We conclude that this novel bioinformatics approach effectively prioritizes credible candidate genes for further experimental analysis.
doi:10.1186/1471-2164-8-389
PMCID: PMC2194724
PMID: 17961254
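The binary prioritization described in the abstract above can be illustrated with a minimal sketch. It assumes each criterion contributes one point when a candidate gene appears in that criterion's gene list, and that candidates are ranked by total score. The abstract does not state the weighting or ranking rule, so both are assumptions. The function name `binary_prioritize`, the gene names and the criterion names are hypothetical and are not taken from the study.

```python
from typing import Dict, Iterable, List, Set, Tuple


def binary_prioritize(
    candidates: Iterable[str],
    criteria: Dict[str, Set[str]],
) -> List[Tuple[str, int]]:
    """Score each candidate by the number of criterion gene lists containing it.

    Assumption: every criterion carries equal weight (1 if present, 0 if absent).
    """
    scores = {
        gene: sum(1 for genes in criteria.values() if gene in genes)
        for gene in candidates
    }
    # Highest score first; ties are broken alphabetically for a stable ranking.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy data for illustration only.
candidates = ["GENE_A", "GENE_B", "GENE_C"]
criteria = {
    "criterion_1": {"GENE_A", "GENE_B"},
    "criterion_2": {"GENE_A"},
    "criterion_3": {"GENE_A", "GENE_C", "GENE_X"},
}

ranking = binary_prioritize(candidates, criteria)
for gene, score in ranking:
    print(gene, score)
```

Genes that appear only in a criterion list and not in the candidate list (here `GENE_X`) are ignored, which mirrors the idea of assessing the criteria lists against the candidate list.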
Background
Cancers of the pancreas originate from both the endocrine and exocrine elements of the organ, and represent a major cause of cancer-related death. This study provides a comprehensive assessment of gene expression for pancreatic tumors, the normal pancreas, and nonneoplastic pancreatic disease.
Methods/Results
DNA microarrays were used to assess the gene expression for surgically derived pancreatic adenocarcinomas, islet cell tumors, and mesenchymal tumors. The addition of normal pancreata, isolated islets, isolated pancreatic ducts, and pancreatic adenocarcinoma cell lines enhanced subsequent analysis by increasing the diversity in gene expression profiles obtained. Exocrine, endocrine, and mesenchymal tumors displayed unique gene expression profiles. Similarities in gene expression support the pancreatic duct as the origin of adenocarcinomas. In addition, genes highly expressed in other cancers and associated with specific signal transduction pathways were also found in pancreatic tumors.
Conclusion
The scope of the present work was enhanced by the inclusion of publicly available datasets that encompass a wide spectrum of human tissues and enabled the identification of candidate genes that may serve diagnostic and therapeutic goals.
doi:10.1371/journal.pone.0000323
PMCID: PMC1824711
PMID: 17389914