Type 2 diabetes mellitus (T2DM) is a major disease affecting nearly 280 million people worldwide. Whilst the pathophysiological mechanisms leading to disease are poorly understood, dysfunction of the insulin-producing pancreatic beta-cells is a key event in disease development. Monitoring the gene expression profiles of pancreatic beta-cells under several genetic or chemical perturbations has shed light on genes and pathways involved in T2DM. The EuroDia database has been established to build a unique collection of gene expression measurements performed on beta-cells of three organisms, namely human, mouse and rat. The Gene Expression Data Analysis Interface (GEDAI) has been developed to support this database. The quality of each dataset is assessed by a series of quality control procedures to detect putative hybridization outliers. The system integrates a web interface to several standard analysis functions from R/Bioconductor to identify differentially expressed genes and pathways. It also allows the combination of multiple experiments performed on different array platforms of the same technology. The design of this system enables each user to rapidly build a custom analysis pipeline and thus produce their own list of genes and pathways. Raw and normalized data can be downloaded for each experiment. The flexible engine of this database (GEDAI) is currently used to handle gene expression data from several laboratory-run projects dealing with different organisms and platforms.
Database URL: http://eurodia.vital-it.ch
Glucose homeostasis is maintained through the efficient modulation of insulin production and release by the pancreatic beta-cells, coupled to a correct response of insulin-sensitive cells to the hormone. Failure of the beta-cells to produce adequate amounts of insulin triggers progressive glucose intolerance and eventually overt type 2 diabetes mellitus (T2DM) (1). T2DM is a global public health problem, affecting nearly 285 million individuals; prevalence of diabetes is projected to rise to 435 million by 2030 (International Diabetes Federation, Diabetes Atlas. Available at http://www.diabetesatlas.org/content/diabetes-and-impaired-glucose-tolerance). This imposes a huge burden on health-care systems. Of concern, the pathophysiological mechanisms underlying beta-cell failure remain poorly understood, limiting the availability of novel approaches to treat or prevent T2DM.
Monitoring the transcriptome of functional and disturbed beta-cells might reveal genes and pathways involved in the maintenance of normal beta-cell functional capacity. In March 2006, a consortium of recognized European experts in the field of T2DM initiated EuroDia, an integrated project devoted to understanding the biology of the pancreatic beta-cell. Several transcriptomics experiments were planned using two different technologies, custom spotted arrays and Affymetrix chips, on three organisms: human, mouse and rat. The EuroDia database has been developed as a tool to integrate heterogeneous gene expression datasets, to enable sharing of data and to provide efficient analysis methods to mine the information content. Several public datasets from ArrayExpress (2), NCBI Gene Expression Omnibus (3) and the BetaCell Gene Bank (4,5) were first integrated into the system and, as the project evolved, new unpublished experiments were added and combined with public data for analysis. To stimulate collaboration, once published in the database, these experiments were shared freely between members of the consortium.
At the time of publication, the EuroDia database contains 38 curated experiments (441 hybridizations), 13 of which were produced by members of the EuroDia project. To ensure continuous access to this valuable data collection after the formal end of the project, the EuroDia database has now been opened to the whole T2DM research community for both consultation and contribution.
The EuroDia database has been built using the Gene Expression Data Analysis Interface (GEDAI), a flexible framework for storing, analyzing and sharing gene expression data and results. GEDAI was originally developed for EuroDia, and is currently being used as a gene expression data storage and analysis pipeline for several other research projects.
The EuroDia database is a web-accessible resource for storing and analyzing gene expression data from pancreatic beta-cells. Raw and processed data files quantified from individual hybridization scans are grouped into experiments, each of which is briefly described with a name, description, type (one or more per experiment) and ownership. Experiments are grouped into projects and are related to an organism (human, mouse or rat) whose genome is annotated with NCBI Entrez Gene data (6). Orthologous genes are identified using the NCBI HomoloGene ID annotation (6). These genome annotations enable comparisons between experiments studying either the same organism but using different array designs, or experiments studying different organisms. Additional annotations for the Affymetrix mouse 430_2 array based on probe exon mapping, signal intensity and uniqueness on the mouse genome have also been included (7).
At the time of publication, the EuroDia database contains two projects, ‘EuroDia’ and ‘Public’. The EuroDia project contains experiments performed by members of this European consortium and the Public project contains experiments imported from public repositories. All EuroDia experiments have both raw and normalized data available (Table 1), whereas some of the public experiments have only normalized data.
To upload an expression dataset, the user provides experiment annotation in a predefined Microsoft Excel template and uploads this file together with the raw data files (CEL files for Affymetrix; GenePix or ImaGene files for spotted arrays) zipped into an archive. Once uploaded, the raw data are normalized using RMA (8) for one-color arrays or loess (9) for two-channel arrays, and several quality control plots are generated to assist the user in identifying poor-quality hybridizations.
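To give a feel for what normalization does, the following is a minimal sketch of quantile normalization, which is one of the steps of RMA (the full method also includes background correction and probe-set summarization, neither of which is shown). It is an illustration in plain Python rather than the R/Bioconductor code the database actually runs; the function name and the toy intensities are assumptions, and ties are handled naively.

```python
def quantile_normalize(arrays):
    """Force every array to share the same intensity distribution.

    arrays: list of equal-length lists, one list of probe intensities
    per hybridization (hypothetical input layout).
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must contain the same number of probes")

    # Reference distribution: mean intensity across arrays at each rank.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [
        sum(s[rank] for s in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]

    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


# Two toy hybridizations with three probes each.
normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```

After this step each array contains the same set of values, so remaining differences between hybridizations reflect probe ranking rather than overall intensity scale.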
Raw data can be downloaded either as binary (Affymetrix) or text (spotted) files. Normalized data annotated at the probe level with the manufacturer provided annotations can be downloaded as text files. The quality control graphs are arranged in a PDF report. Additionally, both raw and processed data can be downloaded as a Bioconductor (10) ExpressionSet object, which can be easily loaded into an R session to perform analyses that are not included in the EuroDia database.
The browse page of the EuroDia database (Figure 1) provides a convenient way to access an experiment. From this page the user can visualize quality control graphs, download data or perform statistical analysis for a dataset. Experiments are grouped by experiment type (for example: time series, dose response, compound treatment or genetic variation), project, laboratory affiliation, organism or array design, thus providing multiple entry points for a user to access a particular experiment. Users can also find an experiment by keyword using a search field.
In addition to providing a data repository for a variety of pancreatic beta-cell experiments, the EuroDia database contains several analysis tools for mining the data. Through the web interface, a user can identify differentially expressed genes by fitting a linear model for each gene and evaluating the fold change and moderated t-statistics P-values (11). Several web forms are available to describe common experimental set-ups: (i) group designs, which compare two or more conditions, (ii) two-by-two factorial designs, which assess the combined effect of two conditions or treatments (e.g. the combined effect of a treatment and a mutant background) and (iii) paired samples, where hybridizations are compared in pairs. The latter may be used, for example, to identify differentially expressed genes between alpha and beta cells of six pancreas samples, each alpha cell sample being paired with the beta cell sample from the same organ. Radio buttons and checkboxes are used to assign each hybridization to a condition. A few additional options can be set to correct the P-values for multiple testing, using the Holm method or the Benjamini-Hochberg or Storey-Tibshirani false discovery rate (FDR) methods, and to exclude probes showing low expression and/or variance. Results are presented as tables that can be sorted, filtered and downloaded together with probe annotations provided by the array manufacturer (Figure 2).
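The multiple-testing corrections mentioned above can be illustrated with a short sketch. The code below computes Holm and Benjamini-Hochberg adjusted P-values in plain Python; it is not the R code used by the database, the function names are assumptions, and the P-values are invented for the example. The Storey-Tibshirani method is not shown.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted P-values (family-wise error rate control)."""
    m = len(pvalues)
    ascending = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(ascending):
        value = min(1.0, (m - rank) * pvalues[index])
        # Enforce monotonicity: an adjusted value never drops below
        # the one assigned to a smaller raw P-value.
        running_max = max(running_max, value)
        adjusted[index] = running_max
    return adjusted


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values (false discovery rate control)."""
    m = len(pvalues)
    descending = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, index in enumerate(descending):
        rank = m - position  # 1-based rank in ascending order
        value = min(1.0, pvalues[index] * m / rank)
        running_min = min(running_min, value)
        adjusted[index] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # made-up P-values for four probes
holm = holm_adjust(raw)
bh = bh_adjust(raw)
```

Holm is the more conservative of the two, so its adjusted values are never smaller than the Benjamini-Hochberg ones for the same input.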
Once a set of differentially expressed genes has been identified, the next step is often to explore the biology around these genes. The EuroDia database provides several tools to help extract valuable information and knowledge from gene expression data. From the web interface a user can evaluate whether the differentially expressed genes are enriched for particular gene ontology (GO) (12) categories, KEGG pathways (13) or Reactome metabolic maps (14). The results are presented as a table of significant categories or pathways with the relevant P-values for enrichment. In addition to these enrichment analyses, a user can also perform Gene Set Enrichment Analysis (GSEA) (14) using ordered gene lists to identify enriched pathways or functionally related groups of genes. The gene lists for GSEA can either be selected from MSigDB (14) or imported by the user.
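As an illustration of how a category enrichment P-value of this kind can be obtained, the sketch below uses a one-sided hypergeometric test, a common choice for over-representation analysis. The text does not state which test the database applies, so this is an assumption, as are the function name and the toy numbers.

```python
from math import comb


def enrichment_pvalue(universe_size, category_size, selected_size, overlap):
    """Probability of observing at least `overlap` category genes
    among `selected_size` genes drawn without replacement from a
    universe containing `category_size` category genes."""
    total = comb(universe_size, selected_size)
    upper = min(category_size, selected_size)
    tail = sum(
        comb(category_size, k)
        * comb(universe_size - category_size, selected_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy case: 10 genes on the array, 4 annotated to the category,
# 3 differentially expressed, 2 of which fall in the category.
p = enrichment_pvalue(10, 4, 3, 2)
```

In practice one such P-value is computed per category or pathway, and the resulting list is itself corrected for multiple testing.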
As an alternative, for some experiments a user might be interested in identifying subgroups of genes and conditions that share similar expression profiles (expression modules). The EuroDia database interface offers the possibility to identify expression modules using the iterative signature algorithm (15–17). For each expression module, a GO category, KEGG pathway and chromosomal location enrichment is computed (18).
The EuroDia database contains datasets from different organisms and different microarray platforms. It is possible to combine multiple experiments of the same platform type (Affymetrix or spotted) by merging probes using either their unique probe identifier (to combine data from two versions of the same platform), NCBI gene index (to combine data from the same organism on different platforms) or NCBI HomoloGene ID (to combine data from different organisms). To address the problem of variability between measurements originating from different laboratories, expression ratios between conditions are not calculated for merged experiments. Instead, the rank products algorithm (20) is used to assess how consistently a gene appears amongst the most up- or down-regulated genes across all the compared hybridizations.
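The core idea of merging on a shared identifier and then working with ranks rather than raw values can be sketched as follows. Each dataset is represented as a hypothetical mapping from a shared gene identifier to a per-gene score; genes missing from any dataset are dropped, each dataset is ranked separately, and the rank product is taken as the geometric mean of the ranks. The permutation step that the rank products method uses to assess significance is omitted, and all names and values are assumptions.

```python
from math import exp, log


def rank_descending(scores):
    """Rank genes within one dataset (1 = highest score)."""
    ordered = sorted(sorted(scores), key=scores.get, reverse=True)
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}


def rank_products(datasets):
    """Geometric mean of per-dataset ranks for genes shared by all datasets."""
    shared = set(datasets[0])
    for scores in datasets[1:]:
        shared &= set(scores)

    # Rank only the shared genes so ranks are comparable across datasets.
    ranked = [
        rank_descending({gene: scores[gene] for gene in shared})
        for scores in datasets
    ]
    k = len(datasets)
    return {
        gene: exp(sum(log(ranks[gene]) for ranks in ranked) / k)
        for gene in shared
    }


# Two toy datasets keyed by a shared identifier; "X" is absent from
# the second dataset and is therefore excluded from the merge.
dataset_1 = {"A": 2.0, "B": 1.0, "C": 0.5, "D": -0.3, "X": 3.0}
dataset_2 = {"A": 1.8, "D": 1.1, "B": 0.4, "C": -0.2}
rp = rank_products([dataset_1, dataset_2])
top_to_bottom = sorted(rp, key=rp.get)
```

A small rank product means the gene is near the top of the list in every dataset, which does not require the datasets to share a common intensity scale.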
Finally, to provide a more global view of the data, tools have been incorporated to display the expression profile of a particular gene across the whole database and to measure the correlation of the global expression profiles of two genes.
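A correlation between two gene profiles could be computed along the following lines. The text does not specify the correlation measure, so the Pearson coefficient is assumed here, and the profiles are invented values standing in for the expression of two genes across the same set of hybridizations.

```python
from math import sqrt


def pearson(profile_x, profile_y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(profile_x)
    if n != len(profile_y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")

    mean_x = sum(profile_x) / n
    mean_y = sum(profile_y) / n
    covariance = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(profile_x, profile_y)
    )
    spread_x = sum((x - mean_x) ** 2 for x in profile_x)
    spread_y = sum((y - mean_y) ** 2 for y in profile_y)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical profiles of two genes over four hybridizations.
co_regulated = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
anti_regulated = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```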
The EuroDia database is a unique collection of beta-cell gene expression datasets generated by a consortium of European experts. Relevant datasets have also been imported from public repositories. Because of the need to provide users in the different laboratories with a way of uploading and sharing data that is both time-efficient and user-friendly, we opted not to include all experimental annotations that would be required to make the database MIAME (24) compliant. However, by offering a user-friendly interface to several well-accepted R/Bioconductor packages, the EuroDia database enables users to rapidly evaluate the quality of a dataset, to identify differentially expressed genes and to reveal potentially altered biological pathways or molecular functions. Having the data repository integrated with the analysis tools also avoids the cumbersome steps of data extraction, reformatting and loading of data into an external tool. A unique feature of our database is its ability to combine studies performed using different array platforms, or even different organisms, in a single analysis. Other similar diabetes resources such as T1DB (4,5) or EPConDB (25), whilst being valuable data repositories, often lack the flexibility of EuroDia, such as the integration of quality controls and the ability to perform re-analysis.
The EuroDia database was built using the Gene Expression Data Analysis Interface (GEDAI) framework, which supports gene expression data measured with Affymetrix gene chips, Agilent arrays, Illumina gene chips and custom spotted arrays. Currently GEDAI is used for several independent research projects and handles gene expression data from Arabidopsis, ant, mouse, rat and human. The EuroDia database will be maintained in the coming years and will evolve to integrate new gene expression data types such as RNA-seq. Scientists interested in depositing their array data are welcome to contact us at firstname.lastname@example.org.
Supported by the European Union (Integrated Project EuroDia LSHM-CT-2006-518153 in the Framework Programme 6 [FP6] of the European Community). Funding for open access charge: European Union (Integrated Project EuroDia LSHM-CT-2006-518153 in the Framework Programme 6 [FP6] of the European Community).
Conflict of interest statement. None declared.
The authors thank Mark Ibberson for careful reading of the article.