Microarray analysis using WebArray is carried out in three steps: 1) uploading and managing files; 2) selecting datasets and methods for analysis; 3) browsing results. Part of the web page for dual color array data analysis is shown in the accompanying figure. A help document with detailed annotation of all WebArray functions is available online.
Snapshot of WebArray web page for dual color array data analysis.
As the first step of an analysis, users upload their microarray intensity files, gene list file and any other supporting files. Files are deleted from the server six months after submission. Users can also view and manually delete these files. WebArray uses the following files for analysis:
1) Intensity files. Text files exported by a variety of image analysis programs such as Affymetrix, Agilent, ArrayVision, Genepix, ImaGene, QuantArray, SMD and SPOT. Files exported from other programs have to be uploaded in a specified format.
2) Targets file. A tab-delimited text file listing the targets hybridized to each channel of each array.
3) Gene list file, such as a gene allocation list (GAL) file. A specified format is also accepted.
4) Design file. A tab-delimited text file containing the design matrix for the linear model.
5) Spot type file (STF). The STF uses regular expressions to distinguish different types of spots in the gene list, including control spots such as positive and negative controls.
6) Genome/chromosome location file. A tab-delimited text file containing array spots sorted by genome/chromosome location.
7) Control genes file. A text file containing the print-order indices of housekeeping genes for composite normalization.
Intensity files are required for all analyses, a gene list file is required for dual color array data analysis, and all other files are optional.
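To illustrate how a spot type file can separate control spots from ordinary genes, the following is a minimal Python sketch, not WebArray's own code. The rule table, the patterns and the function names are hypothetical, and the "first matching rule wins" policy is an assumption made for simplicity.

```python
import re

# Hypothetical spot-type rules: (spot type, regular expression on the gene ID).
# These patterns are illustrative only and are not WebArray defaults.
SPOT_TYPE_RULES = [
    ("negative_control", re.compile(r"^(blank|buffer|empty)", re.IGNORECASE)),
    ("positive_control", re.compile(r"^(gapdh|actb)", re.IGNORECASE)),
    ("gene", re.compile(r".*")),  # catch-all rule; must stay last
]


def classify_spot(gene_id):
    """Return the spot type of the first rule whose pattern matches gene_id."""
    for spot_type, pattern in SPOT_TYPE_RULES:
        if pattern.search(gene_id):
            return spot_type
    return "gene"


def classify_gene_list(gene_ids):
    """Map every gene ID in a gene list to its spot type."""
    return {gene_id: classify_spot(gene_id) for gene_id in gene_ids}


if __name__ == "__main__":
    for gene_id, spot_type in classify_gene_list(["Blank_01", "GAPDH", "TP53"]).items():
        print(gene_id, spot_type)
```

Keeping the catch-all rule last means that any spot not recognized as a control is treated as an ordinary gene.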
In the "submit requests" page, users select data for analysis from their own uploaded files. WebArray includes most of the functions that limma provides, such as spot quality weighting, background subtraction, normalization and empirical Bayes statistical analysis. In addition, a principal component analysis assisted normalization method is incorporated [7], FDR can be estimated using SPLOSH, and chromosomal mapping can be plotted if desired.
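As a rough illustration of what background subtraction produces for a dual color spot, the sketch below computes the log-ratio M and the average log-intensity A from foreground and background intensities. It is a simplified Python example, not the limma implementation. The function name and the `floor` parameter are assumptions, and flooring non-positive corrected intensities is only one of several possible ways to handle them.

```python
import math


def ma_values(red_fg, red_bg, green_fg, green_bg, floor=0.5):
    """Return (M, A) for one spot after simple background subtraction.

    M = log2(R / G) and A = (log2(R) + log2(G)) / 2, where R and G are the
    background-corrected red and green intensities.
    """
    # Assumption: corrected intensities at or below zero are raised to
    # `floor` so that the logarithm is defined.
    red = max(red_fg - red_bg, floor)
    green = max(green_fg - green_bg, floor)
    m = math.log2(red / green)
    a = 0.5 * (math.log2(red) + math.log2(green))
    return m, a


if __name__ == "__main__":
    # Red = 1100 - 100 = 1000, green = 350 - 100 = 250, so M = log2(4) = 2.
    print(ma_values(1100, 100, 350, 100))
```

A spot with equal corrected intensities in both channels has M = 0, which is why M values are expected to centre on zero for most genes after normalization.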
The limma package uses linear models to analyze designed microarray experiments. For Affymetrix array data and simple dual color experiments, such as a two-sample comparison with dye swap or a two-sample comparison with a common reference, users can specify the design simply by selecting sample types in the columns corresponding to each microarray intensity file. For multi-sample comparisons or more complex experimental designs, users need more statistical knowledge to create the design matrix and contrast matrix.
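For the simplest case described above, a two-sample comparison with dye swap, the design reduces to a single column of +1 and -1 entries. The following Python sketch shows how such a design can be derived from the sample labeled with Cy5 on each array and how the log-ratio is then estimated by ordinary least squares. The function names and sample labels are hypothetical, and the sketch omits the empirical Bayes moderation that limma applies.

```python
def dye_swap_design(cy5_samples, treatment):
    """Return +1 for arrays with the treatment in Cy5 and -1 for swapped arrays."""
    return [1 if sample == treatment else -1 for sample in cy5_samples]


def fit_single_coefficient(design, m_values):
    """Least-squares estimate for one coefficient: sum(x*y) / sum(x*x)."""
    if len(design) != len(m_values):
        raise ValueError("design and m_values must have the same length")
    sxx = sum(x * x for x in design)
    if sxx == 0:
        raise ValueError("design column must not be all zeros")
    return sum(x * y for x, y in zip(design, m_values)) / sxx


if __name__ == "__main__":
    # Hypothetical experiment: four arrays, dyes swapped on arrays 2 and 4.
    cy5 = ["mutant", "wildtype", "mutant", "wildtype"]
    design = dye_swap_design(cy5, treatment="mutant")
    # M values of one gene on the four arrays (illustrative numbers).
    m_values = [1.2, -0.8, 1.0, -1.0]
    print(design, fit_single_coefficient(design, m_values))
```

With a +1/-1 design the estimate is simply the mean of the M values after the sign of the dye-swapped arrays has been flipped, so dye-specific bias tends to cancel.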
WebArray allows the user to name each request; otherwise, a name is assigned automatically. Submitted requests are placed on a waiting list. The results page lets users browse their own list of requests, which can be edited or removed. Because microarray data sets are usually large, a computation may take a few minutes to complete. For data sets with within-array duplicates, processing takes much longer, possibly hours.
The output files include tab-delimited text files and graphic plots that can be downloaded separately or viewed online. All files are also archived in a single ZIP file that is available for download. Depending on the analysis options, the text file may contain the gene information, M (log2 ratio), the moderated t statistic and its corresponding p-value, the B statistic, FDR, FP (false positives), FN (false negatives) and the CGH fitted value. The table of genes can be either unranked or ranked by M, p-value or B statistic. Graphic plots include array image plots, density plots, histograms, an RNA degradation plot, an M-A plot for each array (before and after within-array normalization), a print-tip loess plot for each array, box plots (before and after between-array normalization), chromosomal location mapping plots in which M is plotted against chromosomal location, and a results plot that includes an M-A plot, an M-B plot, an M histogram and a B statistic histogram (see Figure 4).
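Because the result tables are tab-delimited text, they can be re-ranked outside WebArray with a few lines of code. The Python sketch below is one way to do so. The column names `ID`, `M`, `P.Value` and `B` are assumptions about the header of the downloaded file, and ranking M by absolute value is a choice made here rather than documented WebArray behavior.

```python
import csv
import io

# Sort keys for each ranking criterion (smaller key ranks first).
RANK_KEYS = {
    "M": lambda row: -abs(float(row["M"])),      # largest |M| first (assumption)
    "P.Value": lambda row: float(row["P.Value"]),  # smallest p-value first
    "B": lambda row: -float(row["B"]),           # largest B statistic first
}


def read_results(text):
    """Parse tab-delimited result text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def rank_genes(rows, by="B"):
    """Return the rows sorted by the chosen criterion: 'M', 'P.Value' or 'B'."""
    if by not in RANK_KEYS:
        raise ValueError("by must be one of: " + ", ".join(RANK_KEYS))
    return sorted(rows, key=RANK_KEYS[by])


if __name__ == "__main__":
    # Illustrative values only, not real results.
    sample = (
        "ID\tM\tP.Value\tB\n"
        "g1\t0.5\t0.20\t-3.1\n"
        "g2\t-2.0\t0.001\t4.2\n"
        "g3\t2.5\t0.01\t1.0\n"
    )
    rows = read_results(sample)
    print([row["ID"] for row in rank_genes(rows, by="M")])
    print([row["ID"] for row in rank_genes(rows, by="P.Value")])
```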
Figure 4. Statistical analysis result plot. The result plot includes an M-A plot, an M-B plot, an M histogram and a B statistic histogram. M: the log-differential expression ratio. A: the log-intensity of the spot, a measure of the overall brightness of the spot. B: the B statistic.
While more sophisticated programs are available commercially, WebArray is an excellent free, open-source software package for microarray analysis that an average biologist can use after moderate training. To help biologists understand the underlying statistical methods, we provide detailed explanations and references for most WebArray functions in the help document.