The web-based interface of the microarray analysis suite is divided into three major sections: (i) data management, (ii) statistical analysis and (iii) interpretation. All three sections share a consistent user interface with multiple tools for exporting data. To ease integration with third-party tools, results may be downloaded as tab-delimited text files or as XML documents.
Screenshots of the VAMPIRE web interface. The intuitive design allows users to easily manage data sets, perform VAMPIRE statistical analysis, and interpret gene expression results with GOby.
Management of microarray data sets and their corresponding analyses becomes increasingly complex as the number of treatment groups grows. Users must manage not only individual samples, but also all subsequent analyses. This is a particular problem for users with large data sets who wish to begin interpreting gene expression data before all of the samples have been completely processed. To address these issues, we have created a data management system that can be used to quickly load, annotate and associate microarray measurements. Summary measures of gene expression, such as those obtained from Affymetrix MAS 5.0 or Agilent processed signal intensities, can be imported into the server as tab-delimited text files, a file format that is easily accessible to most biologists.
Once data have been loaded into the web application, the user may associate related microarray samples. Characterizing these relationships is crucial, as they help to describe the analyses that will later be performed. For example, in a two-channel tutorial provided on the web site, users create sample groups for (i) replicates of LPS-treated macrophages, (ii) replicates of control-treated macrophages and (iii) paired samples obtained from the same chips. These groups can be further combined to create ‘categories’ of related groups. Once these relationships are recorded, they can be used to compare gene expression across different treatment conditions. Since further changes to each sample group are recorded by the analysis suite, VAMPIRE can subsequently inform users when analyses need to be re-executed.
Statistical analysis by VAMPIRE requires two distinct steps: (i) modeling of the error structure of sample groups and (ii) significance testing with a priori-defined significance thresholds. This approach to microarray analysis is considerably different from the approaches taken by other analytical methods (3). Normalization methods are commonly applied to average out the error structure prior to performing statistical analysis. Statistical tests are then left to address the significance of expression differences found in the remaining data. In contrast, VAMPIRE studies the underlying error structure, without perturbing it, and uses this knowledge to distinguish signal from noise. The cutoff for statistical significance is then defined by the significance threshold and by the magnitude of the variance model coefficients. Because of the additional variance modeling step, however, this kind of analysis can be quite challenging without a robust accounting system. In the web application that we present here, users can easily keep track of all variance models and the data sets to which they can be applied. We have initially incorporated two variants of VAMPIRE: classical unpaired analysis (1) and paired analysis (A. Hsiao and S. Subramaniam, manuscript submitted), both of which use variance models to detect significant changes in gene expression.
When users submit a request to model the variance structure of a group of samples, a new ‘processing job’ is immediately placed in a processing queue. Individual jobs require an average of 5–10 min to compute on an Intel Xeon 3.06 GHz processor. In the meantime, users may continue to use the remainder of the site without waiting for each job to complete. An estimate of the date and time of completion is prominently displayed. Similarly, users may request that specific statistical tests be performed. Since these tests rely on variance model results, they will not be executed until the models on which they depend have been completed. As data can be continually added to the analysis platform, outdated models and tests are automatically flagged by the system to allow users to re-execute analyses with updated data. This particular feature facilitates ‘on-the-fly’ analysis. Users can monitor the results as they collect data, which may help them to decide which analyses require additional replication.
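The queueing behaviour described above (tests wait for their variance models, and outdated results are flagged when data change) can be illustrated with a minimal sketch. This is not the VAMPIRE implementation; the `Job` and `JobQueue` classes, their method names and the job labels are hypothetical and serve only to show the dependency logic.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """One queued unit of work: a variance model or a statistical test (hypothetical)."""
    name: str
    depends_on: list = field(default_factory=list)  # names of prerequisite jobs
    done: bool = False
    stale: bool = False  # True when inputs changed after the job last completed


class JobQueue:
    """Toy dependency-aware queue; all names here are illustrative assumptions."""

    def __init__(self):
        self.jobs = {}

    def submit(self, name, depends_on=()):
        self.jobs[name] = Job(name, list(depends_on))

    def runnable(self):
        """Jobs needing (re-)execution whose prerequisites are complete and current."""
        ready = []
        for job in self.jobs.values():
            if job.done and not job.stale:
                continue  # up to date, nothing to do
            if all(self.jobs[d].done and not self.jobs[d].stale
                   for d in job.depends_on):
                ready.append(job.name)
        return ready

    def complete(self, name):
        job = self.jobs[name]
        job.done, job.stale = True, False

    def mark_stale(self, name):
        """Flag a job, and every job that depends on it, as outdated."""
        self.jobs[name].stale = True
        for job in self.jobs.values():
            if name in job.depends_on and not job.stale:
                self.mark_stale(job.name)


queue = JobQueue()
queue.submit("model:LPS")
queue.submit("model:control")
queue.submit("test:LPS-vs-control", depends_on=["model:LPS", "model:control"])
print(queue.runnable())  # the test is withheld until both models complete
```

Calling `mark_stale` on a model after new samples are added to its group would flag the dependent test as well, which mirrors the automatic flagging of outdated models and tests.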
Differentially regulated features obtained from any statistical test must be interpreted biologically. We have developed a novel tool, known as GOby, to initiate biological interpretation, independent of whether VAMPIRE itself was used to derive the feature lists. This database-driven application curates annotation data from several sources: National Center for Biotechnology Information (NCBI), GO, Kyoto Encyclopedia of Genes and Genomes (KEGG) (7), TRANSFAC (8), Biocarta and Superarray. In addition, it can be readily updated with additional user-defined annotation lists.
GOby primarily uses its annotation database to identify overrepresented annotation groups. It does so by comparing a ‘selected’ list to a ‘background’. In our experience, using the comprehensive feature list for each microarray as the background gives quite meaningful results. In a manner similar to other recently published tools (9), GOby uses exact probabilities to compute enrichment likelihoods, and displays the enrichment likelihood as a P-value. GOby reports as its P-value the probability of finding no more than s features annotated with a given term among k ‘selected’ features. The calculation depends on four quantities: the number of ‘background’ features annotated with the term; s, the number of ‘selected’ features annotated with the term; N, the total number of ‘background’ features; and k, the total number of ‘selected’ features.
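An exact enrichment probability built from these four quantities is conventionally a hypergeometric tail, and the following sketch illustrates that calculation under stated assumptions. The symbol `n` for the number of annotated ‘background’ features is our own label, since the original symbol is not shown in the text, and the function names are hypothetical. The text describes the lower tail (‘no more than s’); because over-representation tools commonly report the upper tail (‘at least s’), both are shown, and which convention GOby applies should be checked against the published formula.

```python
from math import comb


def hypergeom_pmf(i, N, n, k):
    """P(exactly i annotated features among k drawn from N, of which n are annotated)."""
    if i < max(0, k - (N - n)) or i > min(n, k):
        return 0.0  # outside the support of the distribution
    return comb(n, i) * comb(N - n, k - i) / comb(N, k)


def lower_tail(s, N, n, k):
    """P(X <= s): 'no more than s' annotated features in the selected list."""
    return sum(hypergeom_pmf(i, N, n, k) for i in range(0, s + 1))


def upper_tail(s, N, n, k):
    """P(X >= s): 'at least s' annotated features in the selected list."""
    return sum(hypergeom_pmf(i, N, n, k) for i in range(s, min(n, k) + 1))


# Illustrative numbers only: 2000 background features, 40 annotated with the
# term, 100 selected features, 8 of which carry the annotation.
print(lower_tail(8, N=2000, n=40, k=100))
print(upper_tail(8, N=2000, n=40, k=100))
```

The two tails overlap at exactly `s`, so `lower_tail(s, ...) + upper_tail(s + 1, ...)` sums to one.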
Unlike similar tools, however, GOby is also able to compute a ‘conditional-enrichment likelihood’, or Q-value, for each term in a hierarchical annotation system. This Q-value is based on the idea that truly meaningful enrichment in a hierarchical system like GO will occur at specific nodes in the annotation tree. The Q-value is the enrichment likelihood for a particular term conditioned on the enrichment of its parent terms. In other words, instead of using the entire set of array features as the ‘background’, we use only the subset of features that are annotated with one of the parent terms. Unlike the P-value previously described, the Q-value prevents annotation terms from reaching significance simply because they lie in an area of the tree near other terms that are enriched. It can therefore narrow the user's focus by reporting only the optimal level of functional detail while excluding both more general and more specific terms (unless these terms fall into a second area of functional enrichment independent of the first). Since both methods have their own advantages, both are displayed in the results.
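The restriction of the background to parent-annotated features can be sketched as follows. This is an illustration of the idea rather than GOby's code: the function names, the dictionary layout for `annotations` and `parents`, and the use of an upper-tail (‘at least s’) hypergeometric probability are all assumptions made for the example.

```python
from math import comb


def enrichment_tail(s, N, n, k):
    """P(X >= s) when k features are drawn from N, of which n are annotated.

    The choice of the upper tail is an assumption of this sketch.
    """
    favourable = sum(comb(n, i) * comb(N - n, k - i)
                     for i in range(s, min(n, k) + 1))
    return favourable / comb(N, k)


def conditional_enrichment(term, selected, annotations, parents):
    """Enrichment of `term` against only those features annotated with a parent term.

    selected    : set of selected feature identifiers
    annotations : dict mapping each term to the set of features annotated with it
    parents     : dict mapping each term to a list of its parent terms
    """
    background = set()
    for parent in parents[term]:
        background |= annotations[parent]  # union over all parent terms
    term_features = annotations[term] & background
    selected_in_background = selected & background
    return enrichment_tail(
        s=len(selected_in_background & term_features),
        N=len(background),
        n=len(term_features),
        k=len(selected_in_background),
    )


# Toy hierarchy: the child term covers half of its parent's features.
annotations = {"parent": {1, 2, 3, 4, 5, 6}, "child": {1, 2, 3}}
parents = {"child": ["parent"], "parent": []}
print(conditional_enrichment("child", {1, 2, 3, 7}, annotations, parents))
```

Because the selected list is also restricted to the parent-annotated subset, a child term scores well only when it is enriched beyond what its parents already explain.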
Figure 3. GOby-rendered report pages. Three types of pages are automatically rendered by GOby for navigation of GOby results. The term table (A) displays annotation terms that are enriched among differentially expressed features. The term pages (B) show differentially ...