The explosion of available microarray data on human cancer makes it increasingly urgent to develop methods for effectively sharing these data among clinical cancer investigators. The lack of a smooth interface between databases and statistical analysis tools limits the potential benefits of sharing publicly available microarray data. To facilitate the efficient sharing and use of publicly available microarray data among cancer investigators, we have built a BRB-ArrayTools Data Archive that includes over one hundred human cancer microarray projects covering 28 cancer types. Expression array data and clinical descriptors have been imported into BRB-ArrayTools and are stored as BRB-ArrayTools project folders in the archive. The data archive can be accessed at http://linus.nci.nih.gov/~brb/DataArchive.html. Our BRB-ArrayTools data archive and GEO importer represent ongoing efforts to provide effective tools for sharing and using human cancer microarray data.
Helicobacter pylori infection reprograms host gene expression and influences various cellular processes, which have been investigated by cDNA microarray analysis of in vitro cultured cells and of in vivo gastric biopsies from patients with chronic abdominal complaints. To further explore the effects of H. pylori infection on host gene expression, we collected gastric antral mucosa samples from 6 untreated patients with gastroscopically and pathologically confirmed chronic superficial gastritis. Three of these patients were infected with H. pylori and three were not. The samples were analyzed on a microarray chip containing 14,112 cloned cDNAs, and the microarray data were analyzed with BRB ArrayTools software and the Ingenuity Pathways Analysis (IPA) website. Of the 38 genes differentially expressed in response to H. pylori infection, 34 had been annotated. The annotated genes were involved in protein metabolism, inflammatory and immunological reactions, signal transduction, gene transcription, trace element metabolism and other processes. Of these genes, 82% (28/34) were assigned to three molecular interaction networks involved in gene expression, cancer progression, antigen presentation and inflammatory response. The array hybridization data were confirmed by quantitative real-time PCR assays. Taken together, these data indicate that H. pylori infection could alter cellular gene expression processes, evade host defense mechanisms, increase inflammatory and immune responses, activate the NF-κB and Wnt/β-catenin signaling pathways, disturb metal ion homeostasis and induce carcinogenesis. These findings may help to explain the pathogenic mechanism of H. pylori and the gastroduodenal pathogenesis induced by H. pylori infection.
Identification of gene expression profiles of cancer stem cells may have significant implications for understanding tumor biology and for designing novel treatments targeted toward these cells. Here we report a potential ovarian cancer stem cell gene expression profile from the side population isolated from fresh ascites of women with high-grade, advanced-stage papillary serous ovarian adenocarcinoma. Affymetrix U133 Plus 2.0 microarrays were used to identify genes differentially expressed between the side population (SP) and the main population (MP), and the results were analyzed by paired t-test using BRB-ArrayTools. We identified 138 up-regulated and 302 down-regulated genes that were differentially expressed across all 10 SP/MP pairs. The microarray data were validated by qRT-PCR, and 17/19 (89.5%) genes showed robust correlations between microarray and qRT-PCR expression data. Pathway Studio analysis identified several genes involved in cell survival, differentiation, proliferation and apoptosis that are unique to SP cells, and identified a mechanism for the activation of Notch signaling. To validate these findings, we identified and isolated SP cells enriched for cancer stem cells from human ovarian cancer cell lines. The SP populations had a higher colony-forming efficiency than their MP counterparts and were also capable of sustained expansion and of differentiation into SP and MP phenotypes. Injection of 50,000 SP cells produced tumors in nude mice, whereas the same number of MP cells had produced no tumors at 8 weeks after injection. The SP cells showed dose-dependent sensitivity to specific γ-secretase inhibitors, implicating the Notch signaling pathway in SP cell survival. Furthermore, the SP gene list was enriched in recurrent ovarian cancer tumors.
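The paired design compares each patient's SP sample with that same patient's MP sample. As a minimal sketch, assuming log2-transformed expression values and one matched SP/MP pair per patient, the per-gene paired t-statistic is the mean of the within-pair differences divided by its standard error. The function name `paired_t` is hypothetical, and the sketch computes only the statistic, not a p-value or any multiple-testing control.

```python
import math
from statistics import mean, stdev


def paired_t(sp, mp):
    """Paired t-statistic for one gene measured in matched SP/MP samples.

    sp, mp: log2 expression values, one per patient, in the same patient order.
    Returns (t, degrees_of_freedom). No p-value or multiple-testing control.
    """
    if len(sp) != len(mp) or len(sp) < 2:
        raise ValueError("need at least two matched SP/MP pairs")
    diffs = [s - m for s, m in zip(sp, mp)]  # within-patient SP minus MP
    n = len(diffs)
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("within-pair differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(n))  # mean difference over its standard error
    return t, n - 1


# Hypothetical log2 values for one gene in three patients.
t, df = paired_t([8.0, 7.5, 9.0], [7.0, 7.0, 8.0])
```

With 10 SP/MP pairs, as in the study, the statistic would have 9 degrees of freedom.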
Krüppel-like factor KLF4 plays a crucial role in the development and maintenance of the mouse cornea. Here, we have compared the wild type (WT) and Klf4-conditional null (Klf4CN) corneal gene expression patterns to understand the molecular basis of the Klf4CN corneal phenotype.
Expression of more than 22,000 genes in 10 WT and Klf4CN corneas was compared using microarrays, analyzed with BRB ArrayTools and validated by Q-RT-PCR. Transient cotransfections were used to test whether KLF4 activates the aquaporin-3, Aldh3a1 and TKT promoters.
Scatter plot analysis identified 740 genes up-regulated and 529 genes down-regulated by more than 2-fold in the Klf4CN corneas. Cell cycle activators were upregulated and inhibitors were downregulated, consistent with the increased proliferation of Klf4CN corneal epithelial cells. Desmosomal components were downregulated, consistent with the fragility of the Klf4CN corneal epithelium. Downregulation of aquaporin-3, detected by microarray, was confirmed by immunoblot and immunohistochemistry. Aquaporin-3 promoter activity was stimulated 7- to 10-fold by cotransfection with pCI-KLF4. The corneal crystallins Aldh3A1 and TKT were downregulated in the Klf4CN cornea, and their promoter activities were upregulated 16- and 9-fold, respectively, by pCI-KLF4 in cotransfections. Expression of epidermal keratinocyte differentiation markers was altered in the Klf4CN cornea. While the cornea-specific keratin-12 was downregulated, most other keratins were upregulated, suggesting hyperkeratosis.
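The "more than 2-fold" criterion is equivalent to requiring an absolute log2 ratio greater than 1. The sketch below assumes per-gene mean expression values on a linear scale for each genotype; the function name, gene names and values are hypothetical, and it illustrates the threshold rule rather than the authors' scatter plot pipeline.

```python
import math


def split_by_fold_change(wt, cn, fold=2.0):
    """Return (up, down) gene lists for changes larger than `fold` in either direction.

    wt, cn: dicts mapping gene name -> mean linear-scale expression in WT and Klf4CN.
    'up' means higher in Klf4CN; 'down' means lower in Klf4CN.
    """
    cutoff = math.log2(fold)  # 2-fold corresponds to |log2 ratio| > 1
    up, down = [], []
    for gene, wt_value in wt.items():
        log_ratio = math.log2(cn[gene] / wt_value)
        if log_ratio > cutoff:
            up.append(gene)
        elif log_ratio < -cutoff:
            down.append(gene)
    return up, down


# Hypothetical genes and expression values.
wt = {"gene_a": 800.0, "gene_b": 1200.0, "gene_c": 100.0, "gene_d": 50.0}
cn = {"gene_a": 200.0, "gene_b": 300.0, "gene_c": 110.0, "gene_d": 150.0}
up, down = split_by_fold_change(wt, cn)
```

Here gene_d (3-fold higher) is called up, gene_a and gene_b (4-fold lower) are called down, and gene_c (1.1-fold) is not called.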
We have identified functionally diverse candidate KLF4 target genes, revealing the molecular basis of diverse aspects of the Klf4CN corneal phenotype. These results establish KLF4 as an important node in the genetic network of transcription factors regulating corneal homeostasis.
Cornea; Development; KLF4; Microarray
DAPfinder and DAPview are novel BRB-ArrayTools plug-ins to construct gene coexpression networks and identify significant differences in pairwise gene-gene coexpression between two phenotypes.
Each significant difference in gene-gene association represents a Differentially Associated Pair (DAP). Our tools include several choices of filtering methods, gene-gene association metrics, statistical testing methods and multiple comparison adjustments. Network results are easily displayed in Cytoscape. Analyses of glioma experiments and microarray simulations demonstrate the utility of these tools.
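One common way to test whether the association between two genes differs between two phenotypes is to compare their Pearson correlations using Fisher's z-transformation. The sketch below assumes that metric and test; DAPfinder offers several association metrics and testing methods, so this is only one plausible choice, and all function names are hypothetical. Correlations are clipped slightly inside (-1, 1) so that atanh stays finite.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_test(r1, n1, r2, n2):
    """Two-sided test of H0: rho1 == rho2 for correlations from independent groups."""
    clip = 1 - 1e-9  # keep atanh finite when |r| == 1
    z1 = math.atanh(max(-clip, min(clip, r1)))
    z2 = math.atanh(max(-clip, min(clip, r2)))
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided standard normal tail probability
    return z, p


def differential_association(a1, b1, a2, b2):
    """Compare the A-B correlation in phenotype 1 (a1, b1) with phenotype 2 (a2, b2)."""
    r1 = pearson(a1, b1)
    r2 = pearson(a2, b2)
    z, p = fisher_z_test(r1, len(a1), r2, len(a2))
    return r1, r2, z, p
```

A small p-value would flag the gene pair as a candidate Differentially Associated Pair; across all gene pairs these p-values would still need a multiple-comparison adjustment.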
DAPfinder is a new, user-friendly tool for reconstructing and comparing biological networks.
Numerous microarray analysis programs have been created through the efforts of Open Source software development projects. Providing browser-based interfaces that allow these programs to be executed over the Internet enhances the applicability and utility of these analytic software tools.
Here we present ArrayQuest, a web-based DNA microarray analysis process controller. Key features of ArrayQuest are that (1) it can execute numerous analysis programs, such as those written in R, BioPerl and C++; (2) new analysis programs can be added to the ArrayQuest Methods Library at the request of users or developers; (3) input DNA microarray data can be selected from public databases (namely, the Medical University of South Carolina (MUSC) DNA Microarray Database or Gene Expression Omnibus (GEO)) or uploaded to a password-protected area on the ArrayQuest center-point web server; and (4) analysis jobs are distributed across computers configured as a backend cluster. To demonstrate the utility of ArrayQuest, we have populated the Methods Library with methods for analyzing Affymetrix DNA microarray data.
ArrayQuest enables browser-based implementation of DNA microarray data analysis programs that can be executed on a Linux-based platform. Importantly, ArrayQuest is a platform that will facilitate the distribution and implementation of new analysis algorithms and is therefore useful to both developers and users of analysis applications. ArrayQuest is freely available for use at .
Clear cell ovarian cancer is an epithelial ovarian cancer histotype that is less responsive to chemotherapy and carries a poorer prognosis than the serous and endometrioid histotypes. Despite this, patients with these tumors are treated in the same way as patients with all other ovarian cancers. Previous genomic analysis has suggested that clear cell cancers represent a unique tumor subtype. Here we generated the first whole-genome expression profiles of the epithelial component of clear cell ovarian cancers and of normal ovarian surface specimens, isolated by laser capture microdissection. All arrays were analyzed with BRB ArrayTools and PathwayStudio software to identify signaling pathways. The identified pathways were validated in serous and clear cell cancer cell lines using RNAi technology. In vivo validation was carried out using an orthotopic mouse model and liposome-encapsulated siRNA. Patient-derived clear cell and serous ovarian tumors were grafted under the renal capsule of NOD-SCID mice to evaluate the therapeutic potential of the identified pathway. We identified major activated pathways in clear cell cancers, involved in hypoxic cell growth, angiogenesis and glucose metabolism, that are not seen in other histotypes. Knockdown of key genes in these pathways sensitized clear cell ovarian cancer cell lines to hypoxia/glucose deprivation. In vivo experiments using patient-derived tumors demonstrated that clear cell tumors are exquisitely sensitive to antiangiogenic therapy (i.e., sunitinib) compared with serous tumors. We generated a histotype-specific gene signature associated with clear cell ovarian cancer that identifies important activated pathways critical for its clinicopathologic characteristics. These results provide a rational basis for a radically different treatment of patients with ovarian clear cell cancer.
Idiopathic Pulmonary Fibrosis (IPF) is characterized by profound changes in the lung phenotype including excessive extracellular matrix deposition, myofibroblast foci, alveolar epithelial cell hyperplasia and extensive remodeling. The role of epigenetic changes in determining the lung phenotype in IPF is unknown. In this study we determine whether IPF lungs exhibit an altered global methylation profile.
Immunoprecipitated methylated DNA from 12 IPF lungs, 10 lung adenocarcinomas and 10 normal histology lungs was hybridized to Agilent human CpG Islands Microarrays, and data analysis was performed using the BRB-ArrayTools and DAVID Bioinformatics Resources software packages. Array results were validated for 3 CpG islands using the EpiTYPER MassARRAY platform. In total, 625 CpG islands were differentially methylated between IPF and control lungs, with an estimated false discovery rate below 5%. The genes associated with the differentially methylated CpG islands are involved in regulation of apoptosis, morphogenesis and cellular biosynthetic processes. The expression of three genes with hypomethylated promoters (STK17B, STK3 and HIST1H2AH) was increased in IPF lungs. Comparison of IPF methylation patterns with those of lung cancer or control samples revealed that IPF lungs display an intermediate methylation profile, partly similar to lung cancer and partly similar to control, with 402 differentially methylated CpG islands overlapping between IPF and cancer. Despite their similarity to cancer, IPF lungs did not exhibit hypomethylation of the long interspersed nuclear element 1 (LINE-1) retrotransposon, whereas lung cancer samples did, suggesting that the global hypomethylation observed in cancer is not typical of IPF.
Our results provide evidence that epigenetic changes in IPF are widespread and potentially important. The partial similarity to cancer may signify similar pathogenetic mechanisms while the differences constitute IPF or cancer specific changes. Elucidating the role of these specific changes will potentially allow better understanding of the pathogenesis of IPF.
The high-density oligonucleotide microarray (GeneChip) is an important tool for molecular biological research aimed at large-scale detection of single nucleotide polymorphisms in DNA and genome-wide analysis of mRNA concentrations. Local array data management solutions are instrumental for efficient processing of the results and for subsequent uploading of data and annotations to a global certified data repository at the EBI (ArrayExpress) or the NCBI (Gene Expression Omnibus).
To facilitate and accelerate annotation of high-throughput expression profiling experiments, the Microarray Information Management and Annotation System (MIMAS) was developed. The system is fully compliant with the Minimum Information About a Microarray Experiment (MIAME) convention. MIMAS provides life scientists with a highly flexible and focused GeneChip data storage and annotation platform essential for subsequent analysis and interpretation of experimental results with clustering and mining tools. The system software can be downloaded for academic use upon request.
MIMAS implements a novel concept for nation-wide GeneChip data management whereby a network of facilities is centered on one data node directly connected to the European certified public microarray data repository located at the EBI. The solution proposed may serve as a prototype approach to array data management between research institutes organized in a consortium.
The web application D-Maps provides a user-friendly interface for researchers performing microarray-based studies. The program was developed to manage and process one- or two-color microarray data obtained from several platforms (currently GeneTAC, ScanArray, CodeLink, NimbleGen and Affymetrix). Although many algorithms and software programs for performing microarray analysis on the internet are available, they usually require sophisticated knowledge of mathematics, statistics and computation. D-Maps was developed to remove the need for high-performance computers or programming experience. D-Maps performs raw data processing, normalization and statistical analysis, and provides access to the analyzed data in text or graphical format. An original feature of D-Maps is a service that formats data for submission to GEO (Gene Expression Omnibus). D-Maps has already been used to analyze oligonucleotide microarrays and PCR-spotted arrays (one- and two-color, laser and light scanners). In conclusion, D-Maps is a valuable tool for the microarray research community, especially for groups without a bioinformatics core.
microarray; web service; software; affymetrix and nimblegen
Cervical cancer is the most common cancer among Indian women. Current recommendations are to treat stages IIB, IIIA, IIIB and IVA with radical radiotherapy and weekly cisplatin-based chemotherapy. However, radiotherapy alone can help cure more than 60% of stage IIB and up to 40% of stage IIIB patients.
Archival RNA samples from 15 patients who had achieved complete remission and remained disease-free for more than 36 months (No Evidence of Disease, or NED, group) and 10 patients in whom radical radiotherapy had failed (Failed group) were included in the study. The RNA samples were amplified, labelled, hybridized to Stanford microarray chips and analyzed using BRB ArrayTools software and Significance Analysis of Microarrays (SAM). Twenty genes were selected for further validation by Relative Quantitation (RQ) TaqMan assays in a TaqMan Low-Density Array (TLDA) format. RQ values were calculated using each NED sample in turn as the calibrator. A scoring system was developed based on the RQ values of these genes.
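TaqMan relative quantitation is commonly computed with the comparative Ct (2^-ΔΔCt) method. As a sketch, assuming that method, roughly equal amplification efficiencies for the target and the endogenous control, and Ct values already averaged over replicates, the RQ of a sample relative to a calibrator can be computed as below. The endogenous control and the exact RQ formula are not specified here, and the function names are hypothetical; the second function mirrors the idea of using each NED sample in turn as the calibrator.

```python
def relative_quantity(ct_target, ct_control, cal_ct_target, cal_ct_control):
    """Comparative Ct relative quantity of a target gene versus a calibrator sample.

    dCt = Ct(target) - Ct(endogenous control), computed for sample and calibrator;
    ddCt = dCt(sample) - dCt(calibrator); RQ = 2 ** (-ddCt).
    """
    d_ct_sample = ct_target - ct_control
    d_ct_calibrator = cal_ct_target - cal_ct_control
    return 2.0 ** -(d_ct_sample - d_ct_calibrator)


def rq_against_each_calibrator(sample, calibrators):
    """RQ of one sample against every calibrator; each item is (Ct target, Ct control)."""
    return [relative_quantity(sample[0], sample[1], cal[0], cal[1]) for cal in calibrators]
```

A sample whose target Ct is two cycles lower than the calibrator's, relative to the same control, gets RQ = 4.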
Using a seven-gene scoring system, it was possible to distinguish tumours likely to respond to radiotherapy from those likely to fail. Using the mean score ± 2 SE (standard error of the mean) and a cut-off score of greater than 5.60, the sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) for the low-risk group were 0.64, 1.0, 1.0 and 0.67, respectively.
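The four reported measures follow directly from a 2×2 table of predicted versus observed outcome. A minimal sketch, assuming that a score above the cut-off predicts the low-risk (NED-like) outcome and that NED patients form the positive class; the function name and the scores in the usage are hypothetical and are not the study's data.

```python
def _ratio(num, den):
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def diagnostic_metrics(scores, is_low_risk, cutoff):
    """Sensitivity, specificity, PPV and NPV for calling 'low risk' when score > cutoff.

    scores: one signature score per patient.
    is_low_risk: True for patients who truly stayed disease-free (the positive class).
    """
    tp = fp = tn = fn = 0
    for score, truth in zip(scores, is_low_risk):
        predicted = score > cutoff
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif not predicted and truth:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Hypothetical scores: four truly low-risk patients followed by three failures.
metrics = diagnostic_metrics(
    [6.1, 7.0, 5.9, 4.8, 3.2, 5.0, 2.5],
    [True, True, True, True, False, False, False],
    cutoff=5.60,
)
```

With these made-up scores one low-risk patient falls below the cut-off, giving sensitivity 0.75, specificity 1.0, PPV 1.0 and NPV 0.75; a perfect specificity together with an imperfect sensitivity is the same pattern as the reported result.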
We have identified a seven-gene signature that could help identify patients with cervical cancer who can be treated with radiotherapy alone. However, this signature needs to be validated in a larger patient population.
Many efforts in microarray data analysis are focused on providing tools and methods for the qualitative analysis of microarray data. HDBStat! (High-Dimensional Biology-Statistics) is a software package designed for the analysis of high-dimensional biology data such as microarray data. It was initially developed for the analysis of microarray gene expression data, but it can also be used for some applications in proteomics and other areas of genomics. HDBStat! provides statisticians and biologists with a flexible, easy-to-use interface for analyzing complex microarray data using a variety of methods for data preprocessing, quality control analysis and hypothesis testing.
Results from data preprocessing, quality control analysis and hypothesis testing are output as Excel CSV tables, graphs and an HTML report summarizing the data analysis.
HDBStat! is platform-independent software that is freely available to academic institutions and non-profit organizations. It can be downloaded from our website .
The major public microarray repositories, Gene Expression Omnibus and ArrayExpress, are growing rapidly. This enables meta-analysis studies, in which expression data from multiple individual studies are combined. To facilitate such studies, we developed Microarray Retriever for searching and retrieving data from GEO and ArrayExpress. The tool allows users to access both repositories simultaneously, search them using complex queries, retrieve microarray data for published articles and download data in one structured archive. The tool is available on the web at http://www.lgtc.nl/MaRe/
The use of high-density DNA microarrays and oligonucleotide chips in modern biomedical research provides complex, high-dimensional data that have been shown to convey crucial information about gene expression levels and to play an important role in disease diagnosis. Therefore, there is a need for new, robust statistical techniques to analyze these data.
depthTools is an R package for a robust statistical analysis of gene expression data, based on an efficient implementation of a feasible notion of depth, the Modified Band Depth. This software includes several visualization and inference tools successfully applied to high dimensional gene expression data. A user-friendly interface is also provided via an R-commander plugin.
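As an illustration of the Modified Band Depth, the sketch below uses the usual definition with bands formed by pairs of curves: for each pair of samples, it counts the fraction of coordinates (genes) at which a given curve lies between the two, then averages over all pairs. It is a plain, cubic-time sketch written for clarity, not the efficient implementation in depthTools, and the function name is hypothetical. Deeper curves are more central, so the deepest sample acts as a robust "median" profile.

```python
from itertools import combinations


def modified_band_depth(curves):
    """Modified band depth (bands from pairs of curves) of each curve in the sample.

    curves: list of equal-length numeric sequences, e.g. one expression profile per sample.
    Each curve's depth is the fraction of coordinates lying inside the band spanned by
    a pair of curves, averaged over all pairs (the curve itself included).
    """
    n, p = len(curves), len(curves[0])
    pairs = list(combinations(range(n), 2))
    depths = []
    for x in curves:
        total = 0.0
        for i, j in pairs:
            inside = sum(
                1
                for k in range(p)
                if min(curves[i][k], curves[j][k]) <= x[k] <= max(curves[i][k], curves[j][k])
            )
            total += inside / p  # proportion of coordinates inside this band
        depths.append(total / len(pairs))
    return depths


depths = modified_band_depth([[0, 0, 0], [1, 1, 1], [2, 2, 2]])
```

In this toy sample the middle curve lies inside every band and gets depth 1.0, while each outer curve gets 2/3.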
We illustrate the utility of the depthTools package, which could be used, for instance, to gain a better understanding of genome-level variation between tumors and to facilitate the development of personalized treatments.
Data depth; Robustness; R package; R commander plug-in
The increasing number of methodologies and tools currently available to analyse gene expression microarray data can be confusing for non-specialist users.
Based on the experience of biostatisticians at Institut Curie, we propose both a clear analysis strategy and a selection of tools for investigating microarray gene expression data. The most commonly used and relevant existing R functions were discussed, validated and gathered in an easy-to-use R package (EMA) devoted to gene expression microarray analysis. These functions were improved for ease of use, enhanced visualisation and better interpretation of results.
The strategy and tools proposed in the EMA R package could provide a useful starting point for many microarray users. EMA is part of the Comprehensive R Archive Network (CRAN) and is freely available at http://bioinfo.curie.fr/projects/ema/.
There are several isolated tools for partial analysis of microarray expression data. To provide an integrative, easy-to-use and automated toolkit for the analysis of Affymetrix microarray expression data we have developed Array2BIO, an application that couples several analytical methods into a single web based utility.
Array2BIO converts raw intensities into probe expression values, automatically maps those to genes, and subsequently identifies groups of co-expressed genes using two complementary approaches: (1) comparative analysis of signal versus control and (2) clustering analysis of gene expression across different conditions. The identified genes are assigned to functional categories based on Gene Ontology classification and KEGG protein interaction pathways. Array2BIO reliably handles low-expressor genes and provides a set of statistical methods for quantifying expression levels, including Benjamini-Hochberg and Bonferroni multiple testing corrections. An automated interface with the ECR Browser provides evolutionary conservation analysis for the identified gene loci while the interconnection with Crème allows prediction of gene regulatory elements that underlie observed expression patterns.
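The two multiple-testing corrections named above can be sketched in a few lines. Bonferroni multiplies each p-value by the number of tests m; the Benjamini-Hochberg step-up procedure scales the p-value of rank i (in ascending order) by m/i and then enforces monotonicity from the largest p-value downward. This is a generic sketch of the standard procedures, not Array2BIO's code; genes whose adjusted value falls below a chosen level (e.g., 0.05) would be called significant.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]
bh = benjamini_hochberg(pvals)
bonf = bonferroni(pvals)
```

Here the smallest p-value (0.01) is adjusted to 0.04 by both methods, while for the other tests BH stays much closer to the raw values than Bonferroni does.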
We have developed Array2BIO – a web based tool for rapid comprehensive analysis of Affymetrix microarray expression data, which also allows users to link expression data to Dcode.org comparative genomics tools and integrates a system for translating co-expression data into mechanisms of gene co-regulation. Array2BIO is publicly available at
Gene expression studies greatly contribute to our understanding of complex relationships in gene regulatory networks. However, the complexity of array design, production and handling limits data quality. The use of customized DNA microarrays improves overall data quality in many situations, but only if analysis tools are available for these specifically designed microarrays.
The IronChip Evaluation Package (ICEP) is a collection of Perl utilities and an easy-to-use data evaluation pipeline for the analysis of microarray data, with a focus on the data quality of custom-designed microarrays. The package was developed for the statistical and bioinformatic analysis of the custom cDNA microarray IronChip but can easily be adapted to other custom-designed cDNA or oligonucleotide-based microarray platforms. ICEP uses decision tree-based algorithms to assign quality flags and performs robust analysis based on chip design properties regarding multiple repetitions, ratio cut-off, background and negative controls.
ICEP is a stand-alone Windows application to obtain optimal data quality from custom-designed microarrays and is freely available here (see "Additional Files" section) and at: http://www.alice-dsl.net/evgeniy.vainshtein/ICEP/
Summary: Microarrays are commonly used to detect changes in gene expression between different biological samples. For this purpose, many analysis tools have been developed that offer visualization, statistical analysis and more sophisticated analysis methods. Most of these tools are designed specifically for messenger RNA microarrays. Today, however, an increasing variety of microarray platforms is available. Changes in DNA methylation, microRNA expression or even protein phosphorylation states can be detected with specialized arrays. For these microarray technologies, the number of available tools is small compared with that for mRNA analysis. In particular, joint analysis of data from different microarray platforms applied to the same set of biological samples is poorly supported by most microarray analysis tools. Here, we present InCroMAP, a tool for the analysis and visualization of high-level microarray data from individual or multiple different platforms. Currently, InCroMAP supports mRNA, microRNA, DNA methylation and protein modification datasets. Several methods are offered that allow integrated analysis of data from these platforms. The features of InCroMAP range from visualization of DNA methylation data, annotation of microRNA targets and integrated gene set enrichment analysis to joint visualization of data from all platforms in the context of metabolic or signalling pathways.
Availability: InCroMAP is freely available as a Java™ application at www.cogsys.cs.uni-tuebingen.de/software/InCroMAP, including a comprehensive user’s guide and example files.
DNA microarrays provide data on genome-wide patterns of expression between observation classes. However, microarray studies often have small sample sizes because of cost constraints or limited specimen availability. This can lead to poor random error estimates and inaccurate statistical tests of differential expression. We compare the performance of the standard t-test, fold change and four small-n statistical test methods designed to circumvent these problems. We report results of various normalization methods for empirical microarray data and of various random error models for simulated data.
Three Empirical Bayes methods (CyberT, BRB, and limma t-statistics) were the most effective statistical tests across simulated and both 2-colour cDNA and Affymetrix experimental data. The CyberT regularized t-statistic in particular was able to maintain expected false positive rates with simulated data showing high variances at low gene intensities, although at the cost of low true positive rates. The Local Pooled Error (LPE) test introduced a bias that lowered false positive rates below theoretically expected values and had lower power relative to the top performers. The standard two-sample t-test and fold change were also found to be sub-optimal for detecting differentially expressed genes. The generalized log transformation was shown to be beneficial in improving results with certain data sets, in particular high variance cDNA data.
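The Empirical Bayes tests share one idea: each gene's variance estimate is shrunk toward a common prior value, which stabilizes the t-statistic when only a few replicates are available. The sketch below uses the shrinkage form of limma's moderated t-statistic for a two-group comparison; unlike limma, it takes the prior variance `s0_sq` and prior degrees of freedom `d0` as inputs instead of estimating them from all genes, and it omits the p-value. CyberT and the BRB random-variance test use related but not identical formulations. With `d0 = 0` it reduces to the ordinary pooled two-sample t-statistic; the function name is hypothetical.

```python
import math
from statistics import mean, variance


def moderated_t(group1, group2, s0_sq, d0):
    """Two-sample t-statistic with an empirical-Bayes-style shrunken variance.

    The pooled variance s^2 (d = n1 + n2 - 2 degrees of freedom) is shrunk toward a
    prior variance s0_sq carrying d0 prior degrees of freedom:
        s_tilde^2 = (d0 * s0_sq + d * s^2) / (d0 + d)
    Here s0_sq and d0 are supplied by the caller rather than estimated from all genes.
    """
    n1, n2 = len(group1), len(group2)
    d = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / d
    s_tilde_sq = (d0 * s0_sq + d * pooled) / (d0 + d)
    se = math.sqrt(s_tilde_sq * (1.0 / n1 + 1.0 / n2))
    return (mean(group1) - mean(group2)) / se


g1, g2 = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
t_plain = moderated_t(g1, g2, s0_sq=1.0, d0=0)   # ordinary pooled t
t_shrunk = moderated_t(g1, g2, s0_sq=4.0, d0=4)  # pulled toward a larger prior variance
```

Shrinkage works in both directions: a gene whose sample variance is implausibly small is pulled up toward `s0_sq`, which guards small-n studies against spuriously large t-statistics.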
Pre-processing of data influences performance and the proper combination of pre-processing and statistical testing is necessary for obtaining the best results. All three Empirical Bayes methods assessed in our study are good choices for statistical tests for small n microarray studies for both Affymetrix and cDNA data. Choice of method for a particular study will depend on software and normalization preferences.
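For reference, one form of the generalized log transformation, chosen here so that it coincides with log2(x) when c = 0, is glog(x) = log2((x + sqrt(x^2 + c)) / 2). For intensities much larger than sqrt(c) it behaves like log2(x), but it stays finite near zero and for negative background-corrected values, which dampens the high variance of low-intensity spots. In practice c would be estimated from the data; here it is simply a parameter, and published variants differ by constants, so treat this as one illustrative variant with a hypothetical function name.

```python
import math


def glog2(x, c):
    """Generalized log (base 2): log2((x + sqrt(x**2 + c)) / 2).

    Equals log2(x) when c == 0 and x > 0; requires c > 0 when x <= 0.
    """
    root = math.sqrt(x * x + c)
    # For negative x, use the algebraically equal form c / (root - x) to avoid cancellation.
    inner = x + root if x >= 0 else c / (root - x)
    return math.log2(inner / 2.0)
```

For example, `glog2(8.0, 0.0)` is exactly 3.0, while `glog2(-5.0, 100.0)` is still a finite number.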
The ArrayExpress Archive of Functional Genomics Data (http://www.ebi.ac.uk/arrayexpress) is one of three international functional genomics public data repositories, alongside the Gene Expression Omnibus at NCBI and the DDBJ Omics Archive, supporting peer-reviewed publications. It accepts data generated by sequencing or array-based technologies and currently contains data from almost a million assays, from over 30 000 experiments. The proportion of sequencing-based submissions has grown significantly over the last 2 years and in 2012 reached 15% of all new data. All data are available from ArrayExpress in MAGE-TAB format, which allows robust linking to data analysis and visualization tools, including Bioconductor and GenomeSpace. Additionally, R objects (for microarray data) and binary alignment format files (for sequencing data) have been generated for a significant proportion of ArrayExpress data.
The aim of this study was to identify key genes and novel potential therapeutic targets related to gastric cancer (GC) by comparing cancer tissue samples and healthy control samples using DNA microarray analysis.
Microarray data set GSE19804 was downloaded from Gene Expression Omnibus. Preprocessing and differential analysis were conducted with R statistical software packages, yielding a set of differentially expressed genes (DEGs). Cluster analysis was also performed on the gene expression values. Functional enrichment analysis was performed for all DEGs with DAVID tools. The markedly up- and downregulated genes were selected, and their interactors were retrieved with STRING and HitPredict, followed by network construction. For all genes in the two networks, GeneCodis was used for gene function annotation.
A total of 638 DEGs were identified, and we found that SPP1 and FABP4 were the markedly up- and downregulated genes, respectively. Cell cycle and regulation of proliferation were the most significantly overrepresented functional terms in up- and downregulated genes. In addition, extracellular matrix–receptor interaction was found to be significant in the SPP1-included interaction network.
A range of DEGs were identified for GC. These genes not only provide insights into the pathogenesis of GC but could also be developed into biomarkers for diagnosis or treatment.
Differentially expressed gene; Functional enrichment analysis; Gastric cancer; Interaction network; Pathway analysis
The incorporation of statistical models that account for experimental variability provides a necessary framework for the interpretation of microarray data. A robust experimental design coupled with an analysis of variance (ANOVA) incorporating a model that accounts for known sources of experimental variability can significantly improve the determination of differences in gene expression and estimations of their significance.
To realize the full benefits of performing analysis of variance on microarray data we have developed CARMA, a microarray analysis platform that reads data files generated by most microarray image processing software packages, performs ANOVA using a user-defined linear model, and produces easily interpretable graphical and numeric results. No pre-processing of the data is required and user-specified parameters control most aspects of the analysis including statistical significance criterion. The software also performs location and intensity dependent lowess normalization, automatic outlier detection and removal, and accommodates missing data.
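Intensity-dependent lowess normalization of two-color data typically works on the log-ratio M = log2(R/G) and the average log-intensity A = (1/2)·log2(R·G): a local regression of M on A is fitted and subtracted, removing intensity-dependent dye bias. The sketch below is a simplified version, assuming background-corrected positive intensities: it uses tricube-weighted local linear fits over the nearest `span` fraction of spots, and it omits both the robustness iterations of full lowess and the location-dependent (spatial) component that CARMA also applies. The function names are hypothetical.

```python
import math


def _local_linear_fit(xs, ys, x0, k):
    """Tricube-weighted least-squares line through the k nearest points, evaluated at x0."""
    h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12  # distance to k-th nearest point
    sw = swx = swy = swxx = swxy = 0.0
    for x, y in zip(xs, ys):
        u = abs(x - x0) / h
        if u >= 1.0:
            continue  # outside the neighbourhood: zero weight
        w = (1.0 - u ** 3) ** 3  # tricube weight
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    denom = sw * swxx - swx * swx
    if abs(denom) < 1e-12:
        return swy / sw  # degenerate neighbourhood: fall back to the weighted mean
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    return intercept + slope * x0


def lowess_normalize(red, green, span=0.3):
    """Return intensity-normalized log-ratios M for positive two-colour intensities."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    k = min(len(a), max(3, math.ceil(span * len(a))))
    fitted = [_local_linear_fit(a, m, a0, k) for a0 in a]
    return [mi - fi for mi, fi in zip(m, fitted)]
```

If every spot carries only a dye bias that varies smoothly with A and no true differential expression, the normalized M values come out close to zero.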
CARMA provides a clear quantitative and statistical characterization of each measured gene that can be used to assess marginally acceptable measures and improve confidence in the interpretation of microarray results. Overall, applying CARMA to microarray datasets incorporating repeated measures effectively reduces the number of genes incorrectly identified as differentially expressed and results in a more robust and reliable analysis.
Many cutting-edge microarray analysis tools and algorithms, including the commonly used limma and affy packages in Bioconductor, require sophisticated knowledge of mathematics, statistics and computing to implement. Commercially available software can provide a user-friendly interface, but at considerable cost. To facilitate the use of these tools for microarray data analysis on an open platform, we developed WebArray, an online microarray data analysis platform that lets bench biologists use these tools to explore data from single- and dual-color microarray experiments.
The currently implemented functions are based on the limma and affy packages from Bioconductor, the spacings LOESS histogram (SPLOSH) method, a PCA-assisted normalization method and a genome mapping method. WebArray incorporates these packages and provides a user-friendly interface to a wide range of key functions of limma and the other methods, such as spot quality weighting, background correction, graphical plotting, normalization, linear modeling, empirical Bayes statistical analysis, false discovery rate (FDR) estimation and chromosomal mapping for genome comparison.
WebArray offers a convenient platform for bench biologists to access several cutting-edge microarray data analysis tools. The website is freely available at . It runs on a Linux server with Apache and MySQL.
The Remote Analysis Computation for gene Expression data (RACE) suite is a collection of bioinformatics web tools designed for the analysis of DNA microarray data. RACE performs probe-level data preprocessing, extensive quality checks, data visualization and data normalization for Affymetrix GeneChips. In addition, it offers differential expression analysis on normalized expression levels from any array platform. RACE estimates the false discovery rates of lists of potentially regulated genes and provides a Gene Ontology-term analysis tool for GeneChip data to support the biological interpretation and annotation of results. The analysis is fully automated but can be customized by flexible parameter settings. To offer a convenient starting point for subsequent analyses, and to provide maximum transparency, the R scripts used to generate the results can be downloaded along with the output files. RACE is freely available for use at .
Analysis of DNA microarray data takes as input spot intensity measurements from scanner software and returns differential expression of genes between two conditions, together with a statistical significance assessment. This process typically consists of two steps: data normalization and identification of differentially expressed genes through statistical analysis. The Expresso microarray experiment management system implements these steps with a two-stage, log-linear ANOVA mixed model technique, tailored to individual experimental designs. The complement of tools in TM4, on the other hand, is based on a number of preset design choices that limit its flexibility. In the TM4 microarray analysis suite, normalization, filter, and analysis methods form an analysis pipeline. TM4 computes integrated intensity values (IIV) from the average intensities and spot pixel counts returned by the scanner software as input to its normalization steps. By contrast, Expresso can use either IIV data or median intensity values (MIV). Here, we compare Expresso and TM4 analysis of two experiments and assess the results against qRT-PCR data.
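The difference between the two inputs can be made concrete. Following the description above, an integrated intensity value multiplies a spot's average pixel intensity by its pixel count (so it equals the sum of the pixel intensities), whereas a median intensity value is the median pixel intensity. The sketch below, with hypothetical pixel values and function names, shows two properties that follow from these definitions: IIV grows with spot size, and a single very bright pixel moves IIV far more than MIV. Background subtraction and scanner-specific details are omitted.

```python
from statistics import median


def integrated_intensity(pixels):
    """IIV: average pixel intensity multiplied by the spot's pixel count."""
    average = sum(pixels) / len(pixels)
    return average * len(pixels)


def median_intensity(pixels):
    """MIV: median pixel intensity of the spot."""
    return median(pixels)


# Hypothetical pixel intensities.
small_spot = [100, 102, 98, 101, 99]
large_spot = small_spot * 2                     # same brightness, twice the pixels
spot_with_artifact = [100, 102, 98, 101, 5000]  # one very bright pixel
```

Here IIV doubles from the small to the large spot (500 to 1000) and jumps to about 5401 with one bright pixel, while MIV stays at 100 or 101.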
The Expresso analysis using MIV data consistently identifies more genes as differentially expressed, when compared to Expresso analysis with IIV data. The typical TM4 normalization and filtering pipeline corrects systematic intensity-specific bias on a per microarray basis. Subsequent statistical analysis with Expresso or a TM4 t-test can effectively identify differentially expressed genes. The best agreement with qRT-PCR data is obtained through the use of Expresso analysis and MIV data.
The results of this research are of practical value to biologists who analyze microarray data sets. The TM4 normalization and filtering pipeline corrects microarray-specific systematic bias and complements the normalization stage in Expresso analysis. The results of Expresso using MIV data agree best with qRT-PCR results. In one experiment, MIV is a better choice than IIV as input to data normalization and statistical analysis methods, as it yields a greater number of statistically significant differentially expressed genes; TM4 does not support MIV as input data. Overall, the more flexible and extensive statistical models of Expresso achieve more accurate analytical results, when judged by the yardstick of qRT-PCR data, in the context of an experimental design of modest complexity.