Database (Oxford). 2010; 2010: baq024.
Published online Oct 12, 2010. doi:  10.1093/database/baq024
PMCID: PMC2963318
EuroDia: a beta-cell gene expression resource
Robin Liechti,1 Gábor Csárdi,2,3 Sven Bergmann,2,3 Frédéric Schütz,4 Thierry Sengstag,5 Sylvia F. Boj,6 Joan-Marc Servitja,6 Jorge Ferrer,6 Leentje Van Lommel,7 Frans Schuit,7 Sonia Klinger,8 Bernard Thorens,8 Najib Naamane,9 Decio L. Eizirik,9 Lorella Marselli,10 Marco Bugliani,10 Piero Marchetti,10 Stephanie Lucas,11 Cecilia Holm,11 C. Victor Jongeneel,12 and Ioannis Xenarios1*
1Vital-IT, SIB Swiss Institute of Bioinformatics, Genopode Building, 2Department of Medical Genetics, University of Lausanne, 3Computational Biology, SIB Swiss Institute of Bioinformatics, Rue de Bugnon 27, 4Bioinformatics Core Facility, SIB Swiss Institute of Bioinformatics, Genopode Building, CH-1015 Lausanne, Switzerland, 5RIKEN Yokohama Institute, Omics Science Center, Yokohama City, Kanagawa, 230-0045, Japan, 6Genomic Programming of Beta-cells Laboratory, Institut d'Investigacions Biomèdiques August Pi i Sunyer, 08036 Barcelona, Spain, 7Gene Expression Unit, Department of Molecular Cell Biology, Katholieke Universiteit Leuven, 3000 Leuven, Belgium, 8Institute of Physiology, University of Lausanne, 1015 Lausanne, Switzerland, 9Laboratory of Experimental Medicine, Universite Libre de Bruxelles (ULB), 1070 Brussels, Belgium, 10Section of Endocrinology and Metabolism of Organ Transplantation, Department of Endocrinology and Metabolism, University of Pisa, 56124 Pisa, Italy, 11Department of Experimental Medical Science, Lund University, SE-221 84 Lund, Sweden and 12National Center for Supercomputing Applications and Institute for Genomic Biology, University of Illinois at Urbana-Champaign, IL 61801 Urbana, Champaign, USA
*Corresponding author: Tel: +41 21 692 40 80; Fax: +41 21 692 40 65; Email: eurodia/at/isb-sib.ch
Received August 13, 2010; Accepted September 28, 2010.
Type 2 diabetes mellitus (T2DM) is a major disease affecting nearly 280 million people worldwide. Whilst the pathophysiological mechanisms leading to disease are poorly understood, dysfunction of the insulin-producing pancreatic beta-cells is a key event in disease development. Monitoring the gene expression profiles of pancreatic beta-cells under several genetic or chemical perturbations has shed light on genes and pathways involved in T2DM. The EuroDia database has been established to build a unique collection of gene expression measurements performed on beta-cells of three organisms, namely human, mouse and rat. The Gene Expression Data Analysis Interface (GEDAI) has been developed to support this database. The quality of each dataset is assessed by a series of quality control procedures to detect putative hybridization outliers. The system integrates a web interface to several standard analysis functions from R/Bioconductor to identify differentially expressed genes and pathways. It also allows the combination of multiple experiments performed on different array platforms of the same technology. The design of this system enables each user to rapidly build a custom analysis pipeline and thus produce their own list of genes and pathways. Raw and normalized data can be downloaded for each experiment. The flexible engine of this database (GEDAI) is currently used to handle gene expression data from several laboratory-run projects dealing with different organisms and platforms.
Glucose homeostasis is maintained through the efficient modulation of insulin production and release by the pancreatic beta-cells, coupled to a correct response of insulin-sensitive cells to the hormone. Failure of the beta-cells to produce adequate amounts of insulin triggers progressive glucose intolerance and eventually overt type 2 diabetes mellitus (T2DM) (1). T2DM is a global public health problem, affecting nearly 285 million individuals; prevalence of diabetes is projected to rise to 435 million by 2030 (International Diabetes Federation, Diabetes Atlas. Available at http://www.diabetesatlas.org/content/diabetes-and-impaired-glucose-tolerance). This imposes a huge burden on health-care systems. Of concern, the pathophysiological mechanisms underlying beta-cell failure remain poorly understood, limiting the availability of novel approaches to treat or prevent T2DM.
Monitoring the transcriptome of functional and disturbed beta-cells might reveal genes and pathways involved in the maintenance of normal beta-cell functional capacity. In March 2006, a consortium of recognized European experts in the field of T2DM initiated EuroDia, an integrated project devoted to understanding the biology of the pancreatic beta-cell. Several transcriptomics experiments were planned using two different technologies, custom spotted arrays and Affymetrix chips, on three organisms: human, mouse and rat. The EuroDia database has been developed as a tool to integrate heterogeneous gene expression datasets, to enable sharing of data and to provide efficient analysis methods to mine the information content. Several public datasets from ArrayExpress (2), NCBI Gene Expression Omnibus (3) and the BetaCell Gene Bank (4,5) were first integrated into the system and, as the project evolved, new unpublished experiments were added and combined with public data for analysis. To stimulate collaboration, once published in the database, these experiments were shared freely between members of the consortium.
At the time of publication, the EuroDia database contains 38 curated experiments (441 hybridizations), 13 of which were produced by members of the EuroDia project. To ensure continuous access to this valuable data collection after the formal end of the project, the EuroDia database has now been opened to the whole T2DM research community for both consultation and contribution.
The EuroDia database has been built using the Gene Expression Data Analysis Interface (GEDAI), a flexible framework for storing, analyzing and sharing gene expression data and results. GEDAI was originally developed for EuroDia, and is currently being used as a gene expression data storage and analysis pipeline for several other research projects.
The EuroDia database is a web-accessible resource for storing and analyzing gene expression data from pancreatic beta-cells. Raw and processed data files quantified from individual hybridization scans are grouped into experiments, each of which is briefly described with a name, description, type (one or more per experiment) and ownership. Experiments are grouped into projects and are related to an organism (human, mouse or rat) whose genome is annotated with NCBI Entrez Gene data (6). Orthologous genes are identified using the NCBI HomoloGene ID annotation (6). These genome annotations enable comparisons between experiments studying the same organism with different array designs, and between experiments studying different organisms. Additional annotations for the Affymetrix mouse 430_2 array based on probe exon mapping, signal intensity and uniqueness on the mouse genome have also been included (7).
At the time of publication, the EuroDia database contains two projects, ‘EuroDia’ and ‘Public’. The EuroDia project contains experiments performed by members of this European consortium and the Public project contains experiments imported from public repositories. All EuroDia experiments have both raw and normalized data available (Table 1), whereas some of the public experiments have only normalized data.
Table 1.
List of experiments contained in the EuroDia database
To upload an expression dataset, the user provides experiment annotation in a predefined Microsoft Excel template and uploads this file together with the raw data files (CEL files for Affymetrix; GenePix or ImaGene files for spotted arrays) zipped into an archive. Once uploaded, the raw data are normalized using RMA (8) for one-color arrays or loess (9) for two-channel arrays, and several quality control plots are generated to assist the user in identifying poor-quality hybridizations.
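As an illustration of what normalization does to the raw intensities, the following minimal sketch shows quantile normalization, which is one of the steps of RMA (background correction and probe-set summarization are omitted). It is not the code used by the EuroDia database, which relies on R/Bioconductor packages; the function name `quantile_normalize` and the toy intensities are assumptions made for this example, and ties are broken by position rather than averaged.

```python
from statistics import mean


def quantile_normalize(columns):
    """Quantile-normalize intensity columns (one list per array).

    Hypothetical helper for illustration only: every array is forced to
    share the same empirical distribution, namely the mean of the sorted
    intensities across arrays. Ties are broken by probe position.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all arrays must contain the same number of probes")

    # Reference distribution: mean intensity at each rank across arrays.
    sorted_cols = [sorted(col) for col in columns]
    reference = [mean(col[i] for col in sorted_cols) for i in range(n)]

    normalized = []
    for col in columns:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Toy data: two arrays, three probes each (made-up intensities).
array_a = [5.0, 2.0, 3.0]
array_b = [4.0, 1.0, 6.0]
normalized_a, normalized_b = quantile_normalize([array_a, array_b])
```

After normalization both arrays contain the same set of values, assigned according to each probe's rank within its own array, so differences in overall intensity distribution between hybridizations are removed.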
Raw data can be downloaded either as binary (Affymetrix) or text (spotted) files. Normalized data annotated at the probe level with the manufacturer provided annotations can be downloaded as text files. The quality control graphs are arranged in a PDF report. Additionally, both raw and processed data can be downloaded as a Bioconductor (10) ExpressionSet object, which can be easily loaded into an R session to perform analyses that are not included in the EuroDia database.
The browse page of the EuroDia database (Figure 1) provides a convenient way to access an experiment. From this page the user can visualize quality control graphs, download data or perform statistical analysis for a dataset. Experiments are grouped by experiment type (for example: time series, dose response, compound treatment or genetic variation), project, laboratory affiliation, organism or array design, thus providing multiple entry points for a user to access a particular experiment. Users can also find an experiment by keyword using a search field.
Figure 1.
Web interface. (A) Experiments can be grouped by type, laboratory affiliation, project, organism or array design. For each category, experiment name and description are presented. Clicking on one experiment name reveals quality control plots (B) and the (more ...)
In addition to providing a data repository for a variety of pancreatic beta-cell experiments, the EuroDia database contains several analysis tools for mining the data. Through the web interface, a user can identify differentially expressed genes by fitting a linear model for each gene and evaluating the fold change and the moderated t-statistic P-values (11). Several web forms describe common experimental set-ups: (i) group designs for comparing two or more conditions, (ii) two-by-two factorial designs for comparing the combined effect of two conditions or treatments (e.g. the combined effect of a treatment and a mutant background) and (iii) paired-sample designs, in which hybridizations are compared in pairs. The latter may be used, for example, to identify differentially expressed genes between alpha and beta cells of six pancreas samples, each alpha cell sample being paired with the beta cell sample from the same organ. Radio buttons and checkboxes are used to assign each hybridization to a condition. Additional options correct the P-values for multiple testing using the Holm method, or the Benjamini-Hochberg or Storey-Tibshirani false discovery rate (FDR) methods, and filters exclude probes showing low expression and/or variance. Results are presented as tables that can be sorted, filtered and downloaded together with probe annotations provided by the array manufacturer (Figure 2).
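To make the multiple-testing options concrete, here is a minimal sketch of the Holm and Benjamini-Hochberg adjustments applied to a list of P-values. It is an illustration only and not the database's implementation, which uses R/Bioconductor; the function names and the example P-values are assumptions, and the Storey-Tibshirani q-value method is not covered.

```python
def holm(pvalues):
    """Holm step-down adjustment (controls the family-wise error rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest P-value is multiplied by m, the next by m - 1, etc.
        value = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, value)
        adjusted[idx] = running_max
    return adjusted


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg step-up adjustment (controls the FDR)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down to the smallest.
    for rank in range(m - 1, -1, -1):
        idx = order[rank]
        value = pvalues[idx] * m / (rank + 1)
        running_min = min(running_min, value)
        adjusted[idx] = running_min
    return adjusted


# Made-up P-values for four probes.
raw_p = [0.01, 0.04, 0.03, 0.005]
holm_p = holm(raw_p)
bh_p = benjamini_hochberg(raw_p)
```

Holm is the more conservative of the two: its adjusted P-values are never smaller than the Benjamini-Hochberg ones for the same input.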
Figure 2.
Experiment analysis. The order of the boxes reflects the flow of analysis steps. From top to bottom and left to right, the experiment name and description, the selection of the type of analysis, the form to describe analysis design (for this example, (more ...)
Once a set of differentially expressed genes has been identified, the next step is often to explore the biology around these genes. The EuroDia database provides several tools to help extract valuable information and knowledge from gene expression data. From the web interface a user can evaluate whether the differentially expressed genes are enriched for particular gene ontology (GO) (12) categories, KEGG pathways (13) or Reactome metabolic maps (14). The results are presented as a table of significant categories or pathways with the relevant P-values for enrichment. In addition to these enrichment analyses, a user can also perform Gene Set Enrichment Analysis (GSEA) (14) using ordered gene lists to identify enriched pathways or functionally related groups of genes. The gene lists for GSEA can either be selected from MSigDB (14) or imported by the user.
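A common way to score over-representation of a category in a gene list is a one-sided hypergeometric test. The sketch below illustrates that calculation under the assumption that genes are drawn without replacement from a fixed universe; the function name, argument names and counts are hypothetical, and the database itself delegates these tests to R/Bioconductor packages.

```python
from math import comb


def enrichment_pvalue(universe_size, category_size, selected_size, overlap):
    """One-sided hypergeometric P-value, P(X >= overlap).

    universe_size: number of genes measured on the array
    category_size: genes in the universe annotated to the category
    selected_size: number of differentially expressed genes
    overlap:       differentially expressed genes in the category
    """
    upper = min(category_size, selected_size)
    total = comb(universe_size, selected_size)
    # Sum the probabilities of seeing `overlap` or more category genes.
    tail = sum(
        comb(category_size, k)
        * comb(universe_size - category_size, selected_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Made-up counts: 10 genes in total, 4 in the category,
# 3 genes selected, 2 of which fall in the category.
p_enriched = enrichment_pvalue(10, 4, 3, 2)
```

In practice one such P-value is computed per category or pathway, and the resulting list is then corrected for multiple testing.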
As an alternative, for some experiments a user might be interested in identifying subgroups of genes and conditions that share similar expression profiles (expression modules). The EuroDia database interface offers the possibility to identify expression modules using the iterative signature algorithm (15–17). For each expression module, a GO category, KEGG pathway and chromosomal location enrichment is computed (18).
The EuroDia database contains datasets from different organisms and different microarray platforms. It is possible to combine multiple experiments of the same platform type (Affymetrix or spotted) by merging probes using either their unique probe identifier (to combine data from two versions of the same platform), NCBI gene index (to combine data from the same organism on different platforms) or NCBI HomoloGene ID (to combine data from different organisms). To address the problem of variability between measurements originating from different laboratories, expression ratios between conditions are not calculated for merged experiments. Instead, the rank products algorithm (22) is used to compare the co-occurrence of one gene amongst the most up- or down-regulated genes of all the compared hybridizations.
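The idea behind rank products can be sketched as follows: within each comparison, genes are ranked by their change, and the ranks of each gene are then combined as a geometric mean, so that only the ordering within each comparison matters and not the absolute values. This is a simplified illustration and not the RankProd package; in particular it omits the permutation step used to assess significance, and the gene identifiers and values are made up. The dictionary keys stand for whatever shared identifier was used to merge the experiments.

```python
from math import prod


def rank_products(profiles):
    """Rank product per gene for up-regulation.

    profiles maps a shared gene identifier to a list of values, one per
    comparison. Within each comparison the largest value gets rank 1.
    The result maps each identifier to the geometric mean of its ranks,
    so small values indicate genes that are consistently near the top.
    """
    genes = list(profiles)
    n_comparisons = len(profiles[genes[0]])
    ranks = {gene: [] for gene in genes}
    for j in range(n_comparisons):
        ordered = sorted(genes, key=lambda g: profiles[g][j], reverse=True)
        for rank, gene in enumerate(ordered, start=1):
            ranks[gene].append(rank)
    return {
        gene: prod(gene_ranks) ** (1.0 / n_comparisons)
        for gene, gene_ranks in ranks.items()
    }


# Hypothetical merged data: three genes measured in three comparisons.
merged = {
    "geneA": [9.0, 8.0, 7.0],
    "geneB": [5.0, 9.0, 1.0],
    "geneC": [1.0, 1.0, 5.0],
}
rp = rank_products(merged)
top_to_bottom = sorted(rp, key=rp.get)
```

Ranking down-regulated genes works the same way with the sort order reversed.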
Finally, to provide a more global view of the data, tools have been incorporated to display the expression profile of a particular gene across the whole database and to measure the correlation of the global expression profiles of two genes.
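The text does not specify which correlation measure is used. Assuming a Pearson correlation over the expression values of two genes across the same set of hybridizations, the calculation could be sketched like this; the function name and the toy profiles are assumptions for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return covariance / (spread_x * spread_y)


# Made-up profiles of two genes across four hybridizations.
gene_1 = [1.0, 2.0, 3.0, 4.0]
gene_2 = [2.0, 4.0, 6.0, 8.0]
gene_3 = [4.0, 3.0, 2.0, 1.0]
co_expressed = pearson(gene_1, gene_2)
anti_correlated = pearson(gene_1, gene_3)
```

A value close to 1 indicates that the two genes rise and fall together across the database, and a value close to -1 indicates opposite behaviour.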
The web interface of the EuroDia database is generated using a combination of JavaScript and PHP scripts that form the GEDAI framework. The data are stored in a MySQL database. The majority of statistical analyses are performed using packages from R/Bioconductor (10). Quality controls and normalizations of two-channel arrays are performed with the marray (19) and limma (9) packages, and the affy (20) and affyPLM (8) packages are used to process Affymetrix arrays. Affymetrix Gene and Exon arrays are normalized using the Affymetrix Power Tools, a set of command line programs from Affymetrix. Normalized experiments are stored as Bioconductor ExpressionSet objects that can be efficiently processed to identify differentially expressed genes. These functions are integrated in the limma (11), qvalue (21), RankProd (22) and eisa (18) packages. GO category and KEGG pathway enrichments are computed by functions of the GOstats (23) package. The GSEA analysis is performed by a speed-improved version of the original GSEA algorithm. Most of the R scripts used to run analyses can be downloaded from the web interface.
The EuroDia database is a unique collection of beta-cell gene expression datasets generated by a consortium of European experts. Relevant datasets have also been imported from public repositories. Because of the need to provide users in the different laboratories with a way of uploading and sharing data that is both time-efficient and user-friendly, we opted not to include all experimental annotations that would be required to make the database MIAME (24) compliant. However, by offering a user-friendly interface to several well-accepted R/Bioconductor packages, the EuroDia database enables users to rapidly evaluate the quality of a dataset, to identify differentially expressed genes and to reveal potentially altered biological pathways or molecular functions. Having the data repository integrated with the analysis tools also avoids the cumbersome steps of data extraction, reformatting and loading of data into an external tool. A unique feature of our database is its ability to combine studies performed using different array platforms, or even different organisms, in a single analysis. Other similar diabetes resources such as T1DBase (4,5) or EPConDB (25), whilst being valuable data repositories, often lack the flexibility of EuroDia, such as the integration of quality controls and the ability to perform re-analysis.
The EuroDia database was built using the Gene Expression Data Analysis Interface (GEDAI) framework, which supports gene expression data measured with Affymetrix gene chips, Agilent arrays, Illumina gene chips and custom spotted arrays. Currently GEDAI is used for several independent research projects and handles gene expression data from Arabidopsis, Ant, Mouse, Rat and Human. The EuroDia database will be maintained in the coming years and will evolve to integrate new gene expression datatypes like RNAseq. Scientists interested in depositing their array data are welcome to contact us at eurodia@vital-it.ch.
Funding
Supported by the European Union (Integrated Project EuroDia LSHM-CT-2006-518153 in the Framework Programme 6 [FP6] of the European-Community). Funding for open access charge: European Union (Integrated Project EuroDia LSHM-CT-2006-518153 in the Framework Programme 6 [FP6] of the European-Community).
Conflict of interest statement. None declared.
Acknowledgement
The authors thank Mark Ibberson for careful reading of the article.
1. Kahn HS, Cheng YJ, Thompson TJ, et al. Two risk-scoring systems for predicting incident diabetes mellitus in U.S. adults age 45 to 64 years. Ann. Intern. Med. 2009;150:741–751. [PubMed]
2. Parkinson H, Kapushesky M, Kolesnikov N, et al. ArrayExpress update: from an archive of functional genomics experiments to the atlas of gene expression. Nucleic Acids Res. 2009;37:D868–D872. [PMC free article] [PubMed]
3. Barrett T, Troup DB, Wilhite SE, et al. NCBI GEO: archive for high-throughput functional genomic data. Nucleic Acids Res. 2009;37:D885–D890. [PMC free article] [PubMed]
4. Hulbert EM, Smink LJ, Adlem EC, et al. T1DBase: integration and presentation of complex data for type 1 diabetes research. Nucleic Acids Res. 2007;35:D742–D746. [PMC free article] [PubMed]
5. Smink LJ, Helton EM, Healy BC, et al. T1DBase, a community web-based resource for type 1 diabetes research. Nucleic Acids Res. 2005;33:D544–D549. [PMC free article] [PubMed]
6. Wheeler DL, Barrett T, Benson DA, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2008;36:D13–D21. [PMC free article] [PubMed]
7. Thorrez L, Tranchevent L-C, Chang HJ, et al. Detection of novel 3' untranslated region extensions with 3' expression microarrays. BMC Genomics. 2010;11:205. [PMC free article] [PubMed]
8. Irizarry RA, Hobbs B, Collin F, et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003;4:249–264. [PubMed]
9. Smyth GK, Speed T. Normalization of cDNA microarray data. Methods. 2003;31:265–273. [PubMed]
10. Gentleman RC, Carey VJ, Bates DM, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004;5:R80. [PMC free article] [PubMed]
11. Smyth GK. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 2004;3:Article 3. [PubMed]
12. Ashburner M, Ball CA, Blake JA, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000;25:25–29. [PMC free article] [PubMed]
13. Kanehisa M, Goto S, Furumichi M, et al. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 2010;38:D355–D360. [PMC free article] [PubMed]
14. Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA. 2005;102:15545–15550. [PubMed]
15. Bergmann S, Ihmels J, Barkai N. Iterative signature algorithm for the analysis of large-scale gene expression data. Phys. Rev. E. Stat. Nonlin. Soft. Matter Phys. 2003;67:031902. [PubMed]
16. Ihmels J, Bergmann S, Barkai N. Defining transcription modules using large-scale gene expression data. Bioinformatics. 2004;20:1993–2003. [PubMed]
17. Ihmels J, Friedlander G, Bergmann S, et al. Revealing modular organization in the yeast transcriptional network. Nat. Genet. 2002;31:370–377. [PubMed]
18. Csárdi G, Kutalik Z, Bergmann S. Modular analysis of gene expression data with R. Bioinformatics. 2010;26:1376–1377. [PubMed]
19. Wang J, Nygaard V, Smith-Sørensen B, et al. MArray: analysing single, replicated or reversed microarray experiments. Bioinformatics. 2002;18:1139–1140. [PubMed]
20. Gautier L, Cope L, Bolstad BM, et al. affy: analysis of Affymetrix GeneChip data at the probe level. Bioinformatics. 2004;20:307–315. [PubMed]
21. Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA. 2003;100:9440–9445. [PubMed]
22. Hong F, Breitling R, McEntee CW, et al. RankProd: a bioconductor package for detecting differentially expressed genes in meta-analysis. Bioinformatics. 2006;22:2825–2827. [PubMed]
23. Falcon S, Gentleman R. Using GOstats to test gene lists for GO term association. Bioinformatics. 2007;23:257–258. [PubMed]
24. Brazma A, Hingamp P, Quackenbush J, et al. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat. Genet. 2001;29:365–371. [PubMed]
25. Mazzarelli JM, Brestelli J, Gorski RK, et al. EPConDB: a web resource for gene expression related to pancreatic development, beta-cell function and diabetes. Nucleic Acids Res. 2007;35:D751–D755. [PMC free article] [PubMed]
26. Cardozo AK, Kruhøffer M, Leeman R, et al. Identification of novel cytokine-induced genes in pancreatic beta-cells by high-density oligonucleotide arrays. Diabetes. 2001;50:909–920. [PubMed]
27. Rasschaert J, Liu D, Kutlu B, et al. Global profiling of double stranded RNA- and IFN-gamma-induced genes in rat pancreatic beta cells. Diabetologia. 2003;46:1641–1657. [PubMed]
28. Cardozo AK, Heimberg H, Heremans Y, et al. A comprehensive analysis of cytokine-induced and nuclear factor-kappa B-dependent genes in primary rat pancreatic beta-cells. J. Biol. Chem. 2001;276:48879–48886. [PubMed]
29. Ortis F, Naamane N, Flamez D, et al. Cytokines interleukin-1beta and tumor necrosis factor-alpha regulate different transcriptional and alternative splicing networks in primary beta-cells. Diabetes. 2010;59:358–374. [PMC free article] [PubMed]
30. Servitja J-M, Pignatelli M, Maestro MA, et al. Hnf1alpha (MODY3) controls tissue-specific transcriptional programs and exerts opposed effects on cell growth in pancreatic islets and liver. Mol. Cell Biol. 2009;29:2945–2959. [PMC free article] [PubMed]
31. Boj SF, Petrov D, Ferrer J. Epistasis of transcriptomes reveals synergism between transcriptional activators Hnf1alpha and Hnf4alpha. PLoS Genet. 2010;6:e1000970. [PMC free article] [PubMed]
32. Boj SF, Servitja JM, Martin D, et al. Functional targets of the monogenic diabetes transcription factors HNF-1alpha and HNF-4alpha are highly conserved between mice and humans. Diabetes. 2009;58:1245–1253. [PMC free article] [PubMed]
33. Bugliani M, Masini M, Liechti R, et al. The direct effects of tacrolimus and cyclosporin A on isolated human islets: a functional, survival and gene expression study. Islets. 2009;1:106–110. [PubMed]
34. Marselli L, Thorne J, Dahiya S, et al. Gene expression profiles of beta-cell enriched tissue obtained by laser capture microdissection from subjects with type 2 diabetes. PLoS ONE. 2010;5:e11499. [PMC free article] [PubMed]
35. Scheuner D, Vander Mierde D, Song B, et al. Control of mRNA translation preserves endoplasmic reticulum function in beta cells and maintains glucose homeostasis. Nat. Med. 2005;11:757–764. [PubMed]
36. Cornu M, Modi H, Kawamori D, et al. Glucagon-like peptide-1 increases beta-cell glucose competence and proliferation by translational induction of insulin-like growth factor-1 receptor expression. J. Biol. Chem. 2010;285:10538–10545. [PubMed]
37. Cornu M, Yang J-Y, Jaccard E, et al. Glucagon-like peptide-1 protects beta-cells against apoptosis by increasing the activity of an IGF-2/IGF-1 receptor autocrine loop. Diabetes. 2009;58:1816–1825. [PMC free article] [PubMed]
38. Klinger S, Poussin C, Debril M-B, et al. Increasing GLP-1-induced beta-cell proliferation by silencing the negative regulators of signaling cAMP response element modulator-alpha and DUSP14. Diabetes. 2008;57:584–593. [PubMed]
Articles from Database: The Journal of Biological Databases and Curation are provided here courtesy of
Oxford University Press