], with enhancements to accommodate data from different platforms, and complies with the Minimum Information About a Microarray Experiment (MIAME) standard [16]. It is based on a relational schema in which "realexp" acts as the central table linking expression data with an experiment, sample and array type. This schema helps the system manage data efficiently and speeds up retrieval.
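The role of a central linking table can be illustrated with a minimal sketch. Only the table name "realexp" comes from the text; the other table names, all column names, the demo rows and the use of SQLite (rather than the database engine actually behind RETINOBASE) are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: only the table name "realexp" is taken from the text.
SCHEMA = """
CREATE TABLE experiment (exp_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE sample (
    sample_id INTEGER PRIMARY KEY,
    exp_id INTEGER NOT NULL REFERENCES experiment(exp_id),
    description TEXT NOT NULL
);
CREATE TABLE array_type (array_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE realexp (
    exp_id INTEGER NOT NULL REFERENCES experiment(exp_id),
    sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
    array_id INTEGER NOT NULL REFERENCES array_type(array_id),
    probe_set TEXT NOT NULL,
    signal REAL NOT NULL
);
CREATE INDEX realexp_probe ON realexp (probe_set);
"""


def open_demo_db():
    """Create an in-memory database filled with made-up demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO experiment VALUES (1, 'demo experiment')")
    conn.executemany(
        "INSERT INTO sample VALUES (?, ?, ?)",
        [(1, 1, "control"), (2, 1, "treated")],
    )
    conn.execute("INSERT INTO array_type VALUES (1, 'demo array')")
    conn.executemany(
        "INSERT INTO realexp VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, 1, "probe_A", 120.0),
            (1, 2, 1, "probe_A", 240.0),
            (1, 1, 1, "probe_B", 50.0),
        ],
    )
    return conn


def signals_for_probe(conn, probe_set):
    """Resolve experiment, sample and array details through realexp."""
    rows = conn.execute(
        """
        SELECT e.title, s.description, a.name, r.signal
        FROM realexp AS r
        JOIN experiment AS e ON e.exp_id = r.exp_id
        JOIN sample AS s ON s.sample_id = r.sample_id
        JOIN array_type AS a ON a.array_id = r.array_id
        WHERE r.probe_set = ?
        ORDER BY s.sample_id
        """,
        (probe_set,),
    )
    return rows.fetchall()
```

Because every expression value carries keys to its experiment, sample and array type, a single indexed lookup on the central table is enough to reach all the associated metadata.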
RETINOBASE is designed to store gene expression profiles from microarray experiments. We downloaded all publicly available retina-related expression profiles from the Gene Expression Omnibus (GEO), yielding 21 experiments [17], GEO datasets (GSE 1816, 4756, 1835, 3791, 2868). In addition, 8 proprietary experiments have been incorporated; these can be accessed with permission from the owner of the experiment. The experiments were performed under different conditions, including knockout models, treatments and time series, on different organisms: Drosophila, zebrafish, rat, mouse and human. All experiments have complete data except one [19], which has only partial data at the level of fold change because neither raw data (.CEL) nor signal intensity data were available. Currently, RETINOBASE contains approximately 27 million gene expression values resulting from 509 hybridizations. In future releases of the database, we plan to include data from other retina-related studies, including SAGE data [33], datasets from Diehn and coworkers [34], who used cDNA arrays to study human eye tissues, and/or datasets from Blackshaw and coworkers [35], who used SAGE to study mouse retinal development.
In RETINOBASE, the gene annotation information obtained from Affymetrix [36] is linked to information about genes and loci causing inherited retinal diseases, obtained from the Retinal Information Network (RETNET) [37]. RETINOBASE also provides literature-derived information about the expression of approximately 200 retinal genes specific to certain cell types, such as photoreceptors, Müller cells or retinal sphere cells.
Raw data were obtained in two different formats, either as .CEL files (20 experiments) or at the level of signal intensities (8 experiments). Data obtained as .CEL files are first analysed with three different normalization programs, RMA [10], dChip [11] and MAS5 [12], and then processed using the R statistical package [38] and Bioconductor [39]; after preprocessing, the resulting background-corrected and normalized signal intensities are automatically uploaded to RETINOBASE using SQL scripts via pgAdminIII.
Identifying the control samples in an experiment made it possible to incorporate data at the level of fold change in RETINOBASE. The fold changes in gene expression were calculated as the ratio between the signal intensities of a given gene in the treated (or knockout) model and the control. For experiments performed in replicate, signal intensities were averaged before the ratios were calculated. All the experiments in RETINOBASE were clustered using three independent methods: (i) the density of points clustering (DPC) method [40], which is implemented in the in-house FASABI (Functional And Statistical Analysis of Biological Data) software; (ii) the dot product K-means method [41] used in the TM4 MultiExperiment Viewer (MeV), a free, open-source system for microarray data management and analysis [42]; and (iii) the mixture model method implemented in FASABI. Although cluster analyses often provide useful insights into the data, biological interpretation of the results is recommended, since alternative algorithms generally produce different cluster outputs and no single clustering algorithm is best suited for clustering genes into functional groups for all data sets [43]. We chose the DPC, K-means and mixture model methods because of their robustness in clustering large datasets. Although the K-means method generally requires the user to choose the number of clusters to be calculated, MeV uses figure of merit (FOM) graphs [44] to suggest an appropriate number. Other clustering algorithms, such as a graph-theoretic approach [45] and the neural network based self-organizing map (SOM) method [46], as well as different parameter options, will be incorporated in future releases of the database. Storing both the normalized and analyzed data in our relational model allows flexible comparisons across different chips at the level of individual genes.
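The fold-change calculation described at the start of this paragraph (replicate signal intensities averaged first, then the treated-to-control ratio taken) can be sketched as follows. The function name and the input layout are hypothetical, and the handling of empty or zero-valued controls is an assumption, since the text does not say how such cases are treated.

```python
from statistics import mean


def fold_change(treated_signals, control_signals):
    """Ratio of the mean treated signal to the mean control signal.

    Each argument holds the replicate signal intensities of one Probe Set
    under one condition. Replicates are averaged before the ratio is taken.
    """
    if not treated_signals or not control_signals:
        raise ValueError("each condition needs at least one signal value")
    control_mean = mean(control_signals)
    if control_mean == 0:
        # Assumption: a zero control mean is reported as an error.
        raise ValueError("control mean is zero; fold change is undefined")
    return mean(treated_signals) / control_mean
```

For example, treated replicates of 200 and 400 against control replicates of 100 and 200 give means of 300 and 150, and therefore a fold change of 2.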
Quality control reports are generated for all experiments for which .CEL files are available, using affyQCReport, an R package that generates quality control reports for Affymetrix array data [47], and RReportGenerator [48]. In addition, we calculate a coefficient of variation for individual Probe Sets between the replicates, which provides a direct estimate of replicate quality.
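As an illustration of the replicate check, the coefficient of variation of one Probe Set is the standard deviation of its replicate signals divided by their mean. The sketch below assumes the sample standard deviation (n - 1 denominator); the text does not state which estimator is used, and the function name is hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(replicate_signals):
    """Standard deviation of the replicate signals divided by their mean."""
    if len(replicate_signals) < 2:
        raise ValueError("at least two replicates are required")
    average = mean(replicate_signals)
    if average == 0:
        raise ValueError("mean signal is zero; CV is undefined")
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return stdev(replicate_signals) / average
```

A value close to zero indicates that the replicates agree closely for that Probe Set; larger values flag Probe Sets whose replicates diverge.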
Experiment and sample details
The RETINOBASE home page lists all experiments available to the user and provides access to experimental details such as the title and a short description. The "Sample details" option (Figure 1) gives the sample description, organism, tissue, treatment, strain-specific information and the array used for hybridisation for a given experiment.
Figure 1 RETINOBASE home page. The home page of RETINOBASE, which provides general information such as experiment and sample details. Specific query options are shown as they appear in the database.
Querying the database
RETINOBASE has three different querying modules: "Gene Information", "Raw Data System Analysis" and "Fold change system Analysis".
Gene information module
The "Gene Information" module offers three query options: "Gene Query", "Ortholog Query" and "Blast Query". Using these, one can access information such as chromosomal location, linked retinal diseases, cellular localization and gene ontologies for a given gene. Furthermore, gene details returned from these queries are linked to external databases that provide further information, such as GeneCards [49], NCBI [50] (specifically UniGene [51]), the ADAPT mapping viewer [52] and the UCSC genome browser [53] (Figure 2).
Figure 2 RETINOBASE Queries. A "Gene Query" yields information such as UniGene ID, chromosomal location, Entrez Gene, expression pattern, linked diseases and gene ontology. The thick black arrow indicates that raw data and cluster information can be accessed directly ...
"Gene Query" and "Ortholog Query" accept as input the gene name, symbol, Affymetrix Probe Set ID, RefSeq ID or UniGene ID, whereas "Blast Query" accepts sequences in FASTA format. "Ortholog Query" is useful for cross-referencing Probe Sets between different Affymetrix GeneChip arrays; the orthology data, based on reference sequence similarity, are taken from HomoloGene and cross-referenced. In addition, the raw data and cluster information for a given gene (cluster number, software used for clustering and information about other genes present in the same cluster) can be obtained for all experiments through the "Gene Query" (Figure 2).
Raw data system analysis module
This module has "Data Query" and "Cluster Query" options, plus "Data visualization", which is both a query and a visualization option. "Data Query" (Figure 3) provides gene expression information at the level of signal intensities for single or multiple genes in all experiments. "Cluster Query" (Figure 3), which is unique to RETINOBASE, provides information about the expression patterns of related genes across various conditions and genetic backgrounds. It can also determine whether any two given genes fall in the same cluster in one or more experiments. In addition to these query options, RETINOBASE provides a user-friendly transcriptomic data visualization tool, developed to allow retinal biologists to analyse gene expression profiles graphically across all the experiments. The user chooses the experiment, chip, gene and analysis software in a step-by-step process, after which the related samples can be labelled and organized for easy comparison through histogram or radar-graph representations (Figure 4). This step-by-step process narrows each query, which speeds up retrieval of specific data from large volumes of gene expression information. Additional information is also provided: the number of Probe Sets for a gene on a given chip, the normalization software used to obtain the signal intensities, and the quality control report of the experiment.
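The co-clustering check offered by "Cluster Query" can be sketched as a lookup over stored cluster assignments. The data layout below (a mapping from an experiment number and clustering method to per-gene cluster numbers) and the function name are hypothetical; they are not taken from the RETINOBASE schema.

```python
def shared_clusters(assignments, gene_a, gene_b):
    """List the (experiment, method, cluster) entries where both genes co-cluster.

    `assignments` maps (experiment, method) to a dict of gene -> cluster number.
    This layout is an assumption made for illustration only.
    """
    hits = []
    for (experiment, method), clusters in sorted(assignments.items()):
        cluster_a = clusters.get(gene_a)
        cluster_b = clusters.get(gene_b)
        # Both genes must be present and carry the same cluster number.
        if cluster_a is not None and cluster_a == cluster_b:
            hits.append((experiment, method, cluster_a))
    return hits
```

Keeping the clustering method in the key matters because, as noted earlier, different algorithms generally produce different cluster outputs, so two genes may co-cluster under one method and not under another.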
Figure 3 Data and Cluster Query options. Data and cluster query results for the NRL gene in experiment 7: "Targeting GFP to new born by NRL promoter and temporal expression profiling of flow-sorted photoreceptors". The user can subsequently obtain all genes ...
Figure 4 Data visualization. Expression profile of two Probe Sets of the cone-rod homeobox containing gene (CRX) in experiment 7: "Targeting GFP to new born by NRL promoter and temporal expression profiling of flow-sorted photoreceptors". Data is represented ...
Fold change system analysis module
Gene expression information at the level of fold change is provided for single or multiple genes in one or more experiments. In addition, "Ratio Query" supports a specialized query that retrieves all genes from one or more experiments having a fold change greater and/or less than a given threshold.
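The threshold filter behind a query of this kind can be sketched as below. The function name, the dictionary input and the reading of "greater and/or less than" as a union of the two conditions are assumptions; the actual "Ratio Query" implementation is not described in the text.

```python
def ratio_query(fold_changes, greater_than=None, less_than=None):
    """Return gene names, sorted, whose fold change passes either threshold.

    `fold_changes` maps gene name -> fold change. When both thresholds are
    given, a gene is kept if it exceeds `greater_than` OR falls below
    `less_than` (an assumed reading of "greater and/or less than").
    """
    if greater_than is None and less_than is None:
        raise ValueError("give at least one threshold")
    selected = []
    for gene, value in fold_changes.items():
        above = greater_than is not None and value > greater_than
        below = less_than is not None and value < less_than
        if above or below:
            selected.append(gene)
    return sorted(selected)
```

With thresholds of 2 and 0.5, for instance, such a filter would return genes that are at least two-fold up-regulated together with those at least two-fold down-regulated.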
Downloading results and user manual
To allow users to compare and interpret data further, the results from all querying modules in RETINOBASE can be downloaded in comma-separated values (.CSV) format using the "Download results" option.
A user manual, which provides a detailed description of the utilities, is also available on the RETINOBASE home page.