EzArray is a web-based Affymetrix expression array data management and analysis system implemented in an open source environment. Because these open source technologies are also widely used to build database-powered websites, EzArray can be easily integrated with users' existing websites. In summary, EzArray takes advantage of modern web technologies, supports multiple users, has group-based data sharing capabilities, contains tools for highly automated data analysis, and has user-friendly interfaces. These features distinguish EzArray from most other standalone and web-based microarray programs.
Most microarray data analysis tools have been implemented as Bioconductor R packages that run from the command line or have simple point-and-click graphical interfaces. The R packages limma and affy both offer R users a command-line interface to state-of-the-art microarray data analysis techniques. The R packages affylmGUI and webbioc offer simple point-and-click interfaces to many of the limma functions. These programs appear to focus on data analysis rather than providing comprehensive data management capabilities.
Recently, a growing number of web-based microarray systems have been developed. MAGMA [4] is a Java-based web application that provides a simple and intuitive interface to identify differentially expressed genes from two-channel microarray data. MAGMA does not support databases, and its results are file-based. Though MAGMA provides each user with a separate workspace for storing and analyzing microarray data, it lacks tools for data sharing among users. Similar to EzArray, MAGMA automatically generates R scripts that document all data processing steps. However, EzArray takes this further by allowing users to download all input and output files together with the R scripts, so that they can regenerate all results in their local R installation. In terms of data analysis, MAGMA does not contain a gene annotation step, and its results are tab-delimited text files and graphic plot files. The RepA program in EzArray, by contrast, generates HTML webpages with hyperlinks to public life science databases. In addition, MAGMA does not include algorithms to automatically select data analysis methods and parameters, and its analysis process is therefore less automated than that of EzArray. GEPAS [5] has been designed to provide an intuitive web-based interface that offers diverse analysis options from data preprocessing to gene selection, gene clustering, gene annotation, and more. Instead of taking advantage of existing R and Bioconductor packages, GEPAS has incorporated many newly developed programs written in C. The web interfaces of GEPAS are Perl CGIs. The most recent version of GEPAS (v4.0) includes very simple tools for user registration and data file browsing. In addition, due to the abundance of novel programs and the low level of automation in data analysis, using GEPAS requires in-depth knowledge of the system and of many microarray data analysis algorithms. Asterias [6] is an open source, web-based suite for the analysis of gene expression and aCGH data. Asterias is the only web-based application that uses parallel computing. Asterias also takes advantage of many R and Bioconductor packages, including limma. The web interfaces of Asterias are mostly written in Python. Though a few applications in Asterias support a MySQL database, Asterias does not contain any tools for user or data management. The input data to all applications are plain text files that are uploaded "on the fly" during analysis. The web application CARMAweb [3] was implemented in Java based on J2EE (Java 2 Enterprise Edition) software technology. It supports Affymetrix GeneChips, spotted two-color microarrays, and Applied Biosystems (ABI) microarrays. CARMAweb has a simple user management tool that provides password-protected access to the user's data and analysis results. All user data are stored as files in the user data directory. Currently, CARMAweb does not support databases or group-based data sharing. WebArray [10] is another microarray system implemented with technologies similar to those used in EzArray (WebArray uses Python instead of PHP). WebArray provides a user-friendly interface for accessing a wide range of key functions of limma and other Bioconductor packages. WebArray is an excellent free, open source software system for microarray analysis that can be used by an average biologist after moderate training. Nevertheless, WebArray has limited capabilities in data management and data sharing. WebArray is not project-oriented, and all data are stored as files in one user data directory. Though WebArray allows users to download output files (tab-delimited text files and graphic plots), it does not allow downloading of the executed R scripts. Compared with these web-based microarray systems, EzArray features more intuitive user interfaces, more powerful data management capabilities, and significantly higher levels of automation in the analysis processes.
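To illustrate how a web application can emit an R script that documents an analysis run, the following is a minimal Python sketch. It is not EzArray's or MAGMA's actual implementation: the template layout, the function name `build_r_script`, and the default output file name are assumptions made for illustration. Only the affy functions named in the template (`ReadAffy`, `rma`, `mas5`, `exprs`) are existing Bioconductor calls.

```python
from string import Template

# Hypothetical script layout. The affy calls are real Bioconductor functions,
# but this template is an illustrative assumption, not EzArray's output.
R_TEMPLATE = Template("""\
# Auto-generated analysis script
library(affy)
raw <- ReadAffy(filenames = c($cel_files))
eset <- $method(raw)
write.table(exprs(eset), file = "$output", sep = "\\t", quote = FALSE)
""")

# Restrict the method to known affy functions so that user input
# cannot inject arbitrary R code into the generated script.
ALLOWED_METHODS = {"rma", "mas5"}


def build_r_script(cel_files, method="rma", output="expression.txt"):
    """Return the text of an R script that reproduces one preprocessing run."""
    if method not in ALLOWED_METHODS:
        raise ValueError("unsupported preprocessing method: " + method)
    quoted = ", ".join('"{}"'.format(name) for name in cel_files)
    return R_TEMPLATE.substitute(cel_files=quoted, method=method, output=output)


if __name__ == "__main__":
    print(build_r_script(["a.CEL", "b.CEL"]))
```

Saving the generated text alongside the input and output files is what would let a user rerun the same steps in a local R installation.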
EzArray was designed to be operating system-independent, relying on the cross-platform features of Apache and PHP. EzArray is also expected to be database platform-independent because it adopts the database abstraction library ADOdb [26], which supports most SQL-based databases. This gives end users the flexibility to choose a convenient operating system and database server. So far, we have fully tested EzArray on the Linux operating system (Fedora 7) with a MySQL database, and we plan to test EzArray on other operating systems and databases.
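The idea behind a database abstraction layer can be sketched as follows. This is a Python analogue built on the standard DB-API 2.0 interface, not ADOdb itself (which is a PHP library) and not EzArray's code. The class `ArrayStore`, the table `arrays`, and its columns are hypothetical names chosen for illustration, and the in-memory SQLite database only stands in for a production server such as MySQL.

```python
import sqlite3


class ArrayStore:
    """Data-access layer that depends only on a DB-API 2.0 connection.

    Hypothetical example: application code talks to this class and never
    to a specific database driver, so the backend can be swapped.
    """

    def __init__(self, connection, placeholder="?"):
        self.conn = connection
        # Parameter markers differ between drivers: sqlite3 uses "?",
        # while many MySQL and PostgreSQL drivers use "%s".
        self.ph = placeholder

    def create_schema(self):
        # Kept to plain SQL types; real DDL often needs per-backend tweaks.
        cur = self.conn.cursor()
        cur.execute(
            "CREATE TABLE arrays (name VARCHAR(64), project VARCHAR(64))"
        )
        self.conn.commit()

    def add_array(self, name, project):
        cur = self.conn.cursor()
        sql = "INSERT INTO arrays (name, project) VALUES ({0}, {0})".format(self.ph)
        cur.execute(sql, (name, project))
        self.conn.commit()

    def arrays_in_project(self, project):
        cur = self.conn.cursor()
        sql = "SELECT name FROM arrays WHERE project = {} ORDER BY name".format(self.ph)
        cur.execute(sql, (project,))
        return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    store = ArrayStore(sqlite3.connect(":memory:"))
    store.create_schema()
    store.add_array("chip_02.CEL", "liver")
    store.add_array("chip_01.CEL", "liver")
    print(store.arrays_in_project("liver"))
```

Switching to another database server would then mean passing a different connection object and placeholder style, while the calling code stays unchanged.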
The current version of EzArray stores only minimal experimental information. We are planning to develop new database tables and corresponding web interfaces for storing MIAME [27]-compliant microarray data.
Due to the modular structure and open source nature of EzArray, extensions or new functionalities can be rapidly implemented on top of it. We have already started designing web-based tools for analyzing Agilent and Nimblegen microarray data. Even for Affymetrix expression data, our analysis procedures can be further improved. For example, for data with two sample groups and just a few replicates per group, the current version of EzArray simply uses fold changes to select differentially expressed genes. In the next version of EzArray, we plan to enhance the data analysis procedures with more established algorithms and programs, such as limma, SAM [28], and EBArrays [31