During the past decade, many software packages have been developed for analyzing and visualizing data from gene expression, comparative genomic hybridization, and tiling microarrays. Among these packages are Cluster/TreeView, GenePattern, GenMAPP, BRB ArrayTools, GSEA, SAM, TM4, GeneSpring and dChip. In addition, Bioconductor [1] builds on the R environment to provide programming capabilities and libraries for analyzing microarray and genomics data. These software packages have greatly contributed to translating large, raw datasets into testable biological hypotheses. In particular, we have developed and maintained dChip as a microarray software package that is accessible to both biologists and data analysts through a user-friendly interface [2,3]. dChip has been widely used for expression and SNP (single nucleotide polymorphism) microarray data analysis because of its many data-processing functions and its interactive views for exploring probe-level, clustering, and chromosome-level data.
From our experience with dChip software development, user support, and data analysis, we noticed several related issues. First, users or microarray core consultants often share their dChip analysis procedures with colleagues. However, the recipient must manually repeat the exact menu steps and analysis parameters to obtain the same results or see the same data view as the first user, a process that is time-consuming and error-prone. Second, when dChip analysis errors or potential bugs occur, the information provided by users is often incomplete and lacks context. To provide solutions or debug the code, we need more information, which is often obtained only through several rounds of messages before we can recreate the error scenario using our own data. Third, routine dChip operations on new datasets would benefit from automation with minimal user intervention.
We report here the implementation of a dChip automation module to meet these challenges. Using this module, dChip automation files can be created interactively while a user performs data analyses. These files contain menu steps, with their parameters stored in a set of parameter files, as well as viewpoints that record particular positions and image sizes in clustering or chromosome views. A data-packaging function lets a user conveniently send microarray data, automation parameter files, and the dChip executable file to a recipient. On the recipient's computer, dChip can automatically follow the exact menu steps and parameters to recreate the entire analysis session, for example generating an analysis output or going to particular chromosome positions. An analysis report file can also be generated during an automated run, including an analysis log, user comments, and viewpoint screenshots.
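To make the record-and-replay idea concrete, the following is a minimal Python sketch, assuming a single JSON automation file. dChip's actual file format, menu names, and parameters are not described here, so `Step`, `Recorder`, `replay`, and every menu name and parameter in the example are hypothetical. For simplicity, the sketch keeps menu steps and viewpoints in one ordered list rather than in a set of separate parameter files, and it returns a plain-text log in place of a full analysis report.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    """One recorded action (hypothetical schema, not dChip's real format)."""
    kind: str                                   # "menu" or "viewpoint"
    name: str                                   # menu item or view name
    params: Dict[str, Any] = field(default_factory=dict)


class Recorder:
    """Appends steps while the user works, then serializes them to text."""

    def __init__(self) -> None:
        self.steps: List[Step] = []

    def menu(self, name: str, **params: Any) -> None:
        # Record a menu command together with the parameters the user chose.
        self.steps.append(Step("menu", name, params))

    def viewpoint(self, view: str, position: int, width: int, height: int) -> None:
        # Record where the user was looking and at what image size.
        self.steps.append(Step("viewpoint", view,
                               {"position": position, "width": width, "height": height}))

    def dumps(self) -> str:
        # Serialize all steps, in order, as a JSON automation file.
        return json.dumps([asdict(s) for s in self.steps], indent=2)


def replay(automation_text: str,
           handlers: Dict[str, Callable[..., str]]) -> List[str]:
    """Re-executes recorded steps in order and returns an analysis log."""
    log: List[str] = []
    for raw in json.loads(automation_text):
        step = Step(**raw)
        # Menu steps dispatch on their menu name; all viewpoints share one handler.
        key = step.name if step.kind == "menu" else "viewpoint"
        handler = handlers.get(key)
        if handler is None:
            # Log unknown steps instead of skipping them silently, so the
            # recipient can see exactly where the replay diverged.
            log.append(f"ERROR: no handler for {step.kind} '{step.name}'")
            continue
        if step.kind == "viewpoint":
            log.append(handler(step.name, **step.params))
        else:
            log.append(handler(**step.params))
    return log


if __name__ == "__main__":
    # Hypothetical session: one menu step and one chromosome viewpoint.
    rec = Recorder()
    rec.menu("Normalize", method="invariant_set")
    rec.viewpoint("chromosome", position=100, width=800, height=600)
    handlers = {
        "Normalize": lambda method: f"normalized with {method}",
        "viewpoint": lambda view, position, width, height:
            f"{view} view at {position} ({width}x{height})",
    }
    for line in replay(rec.dumps(), handlers):
        print(line)
```

Dispatching each recorded step name to a handler means that a step the recipient's copy cannot execute appears in the log rather than halting silently, which supplies the missing context described in the debugging scenario above.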
There are other automation approaches in software packages or web services for genomics analysis. Taverna is a software tool for designing and executing workflows [4]; it allows users to integrate various tools or web services such as NCBI or Bioconductor. Taverna conveniently organizes and visualizes analysis steps in a flow chart, and it also provides a repository of workflows that can be shared among users. Using the R statistical environment, Gentleman and colleagues have proposed reproducible microarray data analysis via a compendium consisting of code, data, and manuscript text [5]. In addition, microarray and proteomics data standards such as MIAME, MAGE and MIAPE have greatly facilitated the sharing of datasets among researchers using different software packages. Compared with these tools, the dChip automation module has a minimal learning curve (existing dChip users can easily automate and share their analyses), does not require user programming, and emphasizes the reproducible sharing of analysis data and procedures among dChip users. Through the dChip automation module, we also intend to illustrate simple implementation principles that other genomics software packages can adopt; once automated in this way, such packages could be used within a workflow environment such as Taverna.