Data visualization techniques for the pharmaceutical sciences have not been extensively investigated. The purpose of this study was to evaluate the usefulness of VizStruct, a multidimensional visualization tool, for applications in pharmacokinetics, pharmacodynamics, and pharmacogenomics.
The VizStruct tool uses the first harmonic of the discrete Fourier transform to map multidimensional data to two dimensions for visualization. The mapping was used to visualize several published pharmacokinetic, pharmacodynamic, and pharmacogenomic data sets. The VizStruct approach was evaluated using simulated population pharmacokinetic data sets, the data from Dalen and colleagues (Clin. Pharmacol. Ther. 63:444−452, 1998) on the kinetics of nortriptyline and its 10-hydroxynortriptyline metabolite in subjects with differing numbers of copies of the CYP2D6 gene, and the gene expression profiling data of Bohen and colleagues (Proc. Natl. Acad. Sci. USA 100:1926−1930, 2003) on follicular lymphoma patients responsive and nonresponsive to rituximab.
The VizStruct mapping preserves the key characteristics of multidimensional data in two dimensions in a manner that facilitates visualization. The mapping is computationally efficient and can be used for cluster detection and class prediction in pharmaceutical data sets. The VizStruct visualization succinctly summarized the salient similarities and differences in the nortriptyline and 10-hydroxynortriptyline pharmacokinetic profiles in subjects with increasing number of CYP2D6 gene copies. In the simulated population pharmacokinetic data sets, it was capable of discriminating the subtle differences between pharmacokinetic profiles derived from 1- and 2-compartment models with the same area under the curve. The two-dimensional VizStruct mapping computed from a subset of 102 informative genes from the Bohen and colleagues data set effectively separated the rituximab responder, rituximab nonresponder, and control subject groups.
The VizStruct approach is a computationally efficient and effective approach for visualizing complex, multidimensional data sets. It could have many useful applications in the pharmaceutical sciences.
microarray; pharmacodynamics; pharmacogenomic modeling; pharmacokinetics; visualization algorithms
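The first-harmonic mapping described in this abstract can be sketched in a few lines. This is a minimal illustration, not the VizStruct implementation: the function names are hypothetical, and any normalization or scaling the tool applies before the transform is not covered by the abstract and is left out here.

```python
import cmath
import math


def first_harmonic(point):
    """Map one N-dimensional point to the complex plane.

    Computes F1 = sum_k x_k * exp(-2*pi*i*k / N), the first harmonic of the
    discrete Fourier transform of the coordinate sequence.
    """
    n = len(point)
    return sum(x * cmath.exp(-2j * math.pi * k / n) for k, x in enumerate(point))


def map_to_2d(points):
    """Return (real, imaginary) pairs for a list of equal-length profiles."""
    return [(z.real, z.imag) for z in map(first_harmonic, points)]


if __name__ == "__main__":
    # Hypothetical concentration-time profiles sampled at four time points.
    profiles = [
        [1.0, 1.0, 1.0, 1.0],  # a flat profile lands (numerically) at the origin
        [4.0, 3.0, 2.0, 1.0],  # decreasing profile
        [1.0, 2.0, 3.0, 4.0],  # increasing profile
    ]
    for x, y in map_to_2d(profiles):
        print(f"{x:.3f} {y:.3f}")
```

Each profile costs one pass over its coordinates, which is consistent with the abstract's description of the mapping as computationally efficient.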
The size, dimensionality and limited range of the data values make visualization of single nucleotide polymorphism (SNP) datasets challenging. The purpose of this study is to evaluate the usefulness of 3D VizStruct, a novel multi-dimensional data visualization technique for SNP datasets capable of identifying informative SNPs in genome-wide association studies. VizStruct is an interactive visualization technique that reduces multi-dimensional data to three dimensions using a combination of the discrete Fourier transform and the Kullback–Leibler divergence. The performance of 3D VizStruct was challenged with several diverse, biologically relevant published datasets including the human lipoprotein lipase (LPL) gene locus, the human Y-chromosome in several populations and a multi-locus genotype dataset of coral samples from four populations. In every case, the SNPs and/or polymorphic markers identified by the 3D VizStruct mapping were predictive of the underlying biology.
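The abstract does not say how 3D VizStruct combines the Kullback–Leibler divergence with the Fourier mapping. The sketch below shows only the divergence itself, applied to genotype frequencies of a SNP in two populations as one plausible way to score how informative a marker is. The genotype coding (0, 1, 2), the pseudocount, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def genotype_freqs(genotypes, pseudocount=0.5):
    """Smoothed frequencies of genotype codes 0, 1, 2 (assumed coding)."""
    counts = Counter(genotypes)
    total = len(genotypes) + 3 * pseudocount
    return [(counts[g] + pseudocount) / total for g in (0, 1, 2)]


def kl_divergence(p, q):
    """D(p || q) = sum_i p_i * log(p_i / q_i), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def rank_snps(pop_a, pop_b):
    """Order SNP names by decreasing divergence between two populations.

    pop_a and pop_b map SNP name -> list of genotype codes, one per individual.
    """
    scores = {
        snp: kl_divergence(genotype_freqs(pop_a[snp]), genotype_freqs(pop_b[snp]))
        for snp in pop_a
    }
    return sorted(scores, key=scores.get, reverse=True)
```

The pseudocount keeps every frequency strictly positive, so the logarithm is always defined even when a genotype is absent from one population.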
Microarray data provide information on the expression levels of thousands of genes in a cell in a single experiment. Numerous efforts have been made to use gene expression profiles to improve the precision of tumor classification. In the present study we used the benchmark colon cancer data set for analysis. Feature selection was done using the t-statistic. A comparative study of the class prediction accuracy of three different classifiers, viz. support vector machine (SVM), neural nets and logistic regression, was performed using the top 10 genes ranked by the t-statistic. SVM turned out to be the best classifier for this dataset based on area under the receiver operating characteristic curve (AUC) and total accuracy. Logistic regression ranks as the next best classifier, followed by the multilayer perceptron (MLP). The top 10 genes selected by us for classification are all well documented for their variable expression in colon cancer. We conclude that SVM together with t-statistic based feature selection is an efficient and viable alternative to popular techniques.
gene expression; tumor classification; t-statistic; feature selection; SVM; neural network; logistic regression
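The feature-selection step in this abstract, ranking genes by a two-class t-statistic and keeping the top 10, can be sketched as follows. The abstract does not say which variant of the t-statistic was used. Welch's unequal-variance form is assumed here, and the function names and data layout are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t-statistic for two lists of expression values.

    Each list needs at least two values, and at least one must have nonzero
    sample variance, otherwise the standard error is zero.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_genes(expr, labels, top=10):
    """Return the `top` gene names with the largest |t| between two classes.

    expr maps gene name -> list of expression values (one per sample);
    labels holds the class of each sample, coded 0 or 1.
    """
    scores = {}
    for gene, values in expr.items():
        class0 = [v for v, y in zip(values, labels) if y == 0]
        class1 = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(welch_t(class0, class1))
    return sorted(scores, key=scores.get, reverse=True)[:top]
```

The selected gene names would then index the reduced expression matrix passed to whichever classifier is being compared.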
Microarray compendia profile the expression of genes in a number of experimental conditions. Such data compendia are useful not only to group genes and conditions based on their similarity in overall expression over profiles but also to gain information on more subtle relations between genes and conditions. Getting a clear visual overview of all these patterns in a single easy-to-grasp representation is a useful preliminary analysis step: We propose to use for this purpose an advanced exploratory method, called multidimensional unfolding.
We present a novel algorithm for multidimensional unfolding that overcomes both general problems and problems that are specific for the analysis of gene expression data sets. Applying the algorithm to two publicly available microarray compendia illustrates its power as a tool for exploratory data analysis: The unfolding analysis of a first data set resulted in a two-dimensional representation which clearly reveals temporal regulation patterns for the genes and a meaningful structure for the time points, while the analysis of a second data set showed the algorithm's ability to go beyond a mere identification of those genes that discriminate between different patient or tissue types.
Multidimensional unfolding offers a useful tool for preliminary explorations of microarray data: By relying on an easy-to-grasp low-dimensional geometric framework, relations among genes, among conditions and between genes and conditions are simultaneously represented in an accessible way which may reveal interesting patterns in the data. An additional advantage of the method is that it can be applied to the raw data without necessitating the choice of suitable genewise transformations of the data.
In translational cancer research, gene expression data is collected together with clinical data and genomic data arising from other chip based high throughput technologies. Software tools for the joint analysis of such high dimensional data sets together with clinical data are required.
We have developed an open source software tool which provides interactive visualization capability for the integrated analysis of high-dimensional gene expression data together with associated clinical data, array CGH data and SNP array data. The different data types are organized by a comprehensive data manager. Interactive tools are provided for all graphics: heatmaps, dendrograms, barcharts, histograms, eventcharts and a chromosome browser, which displays genetic variations along the genome. All graphics are dynamic and fully linked so that any object selected in a graphic will be highlighted in all other graphics. For exploratory data analysis the software provides unsupervised data analytics like clustering, seriation algorithms and biclustering algorithms.
The SEURAT software meets the growing needs of researchers to perform joint analysis of gene expression, genomic and clinical data.
Conventionally, the first step in analyzing the large and high-dimensional data sets measured by microarrays is visual exploration. Dendrograms of hierarchical clustering, self-organizing maps (SOMs), and multidimensional scaling have been used to visualize similarity relationships of data samples. We address two central properties of the methods: (i) Are the visualizations trustworthy, i.e., if two samples are visualized to be similar, are they really similar? (ii) The metric. The measure of similarity determines the result; we propose using a new learning metrics principle to derive a metric from interrelationships among data sets.
The trustworthiness of hierarchical clustering, multidimensional scaling, and the self-organizing map were compared in visualizing similarity relationships among gene expression profiles. The self-organizing map was the best except that hierarchical clustering was the most trustworthy for the most similar profiles. Trustworthiness can be further increased by treating separately those genes for which the visualization is least trustworthy. We then proceed to improve the metric. The distance measure between the expression profiles is adjusted to measure differences relevant to functional classes of the genes. The genes for which the new metric is the most different from the usual correlation metric are listed and visualized with one of the visualization methods, the self-organizing map, computed in the new metric.
The conjecture from the methodological results is that the self-organizing map can be recommended to complement the usual hierarchical clustering for visualizing and exploring gene expression data. Discarding the least trustworthy samples and improving the metric still improves it.
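The question posed above, whether samples shown as similar are really similar, can be turned into a number. The sketch below uses one commonly cited rank-based formalization of trustworthiness; the abstract does not give a formula, so treating this as the measure used in the study is an assumption. Function names are hypothetical, distances are Euclidean, and the normalization assumes k is smaller than half the number of samples.

```python
import math


def _ranks(points, i):
    """Rank every other point by distance from point i (1 = nearest)."""
    order = sorted(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )
    return {j: r for r, j in enumerate(order, start=1)}


def trustworthiness(original, embedded, k):
    """Score in [0, 1]; 1 means no false neighbors among the k nearest.

    A point j is penalized when it is among the k nearest neighbors of i in
    the visualization but not in the original space; the penalty grows with
    how far down j sits in the original-space ranking.
    """
    n = len(original)
    penalty = 0
    for i in range(n):
        rank_orig = _ranks(original, i)
        rank_emb = _ranks(embedded, i)
        for j, r in rank_emb.items():
            if r <= k and rank_orig[j] > k:
                penalty += rank_orig[j] - k
    return 1.0 - 2.0 * penalty / (n * k * (2 * n - 3 * k - 1))
```

Evaluating the score over a range of k values shows whether a method is trustworthy for only the closest neighbors or for broader neighborhoods as well.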
Information obtained by DNA microarray technology gives a rough snapshot of the transcriptome state, i.e., the expression level of all the genes expressed in a cell population at any given time. One of the challenging questions raised by the tremendous amount of microarray data is to identify groups of co-regulated genes and to understand their role in cell functions.
MiCoViTo (Microarray Comparison Visualization Tool) is a set of biologists' tools for exploring, comparing and visualizing changes in the yeast transcriptome by a gene-centric approach. A relational database includes data linked to genome expression and graphical output makes it easy to visualize clusters of co-expressed genes in the context of available biological information. To this aim, upload of personal data is possible and microarray data from fifty publications dedicated to S. cerevisiae are provided on-line. A web interface guides the biologist during the usage of this tool and is freely accessible at .
MiCoViTo offers an easy-to-read picture of local transcriptional changes connected to current biological knowledge. This should help biologists to mine yeast microarray data and better understand the underlying biology. We plan to add functional annotations from other organisms. That would allow inter-species comparison of transcriptomes via orthology tables.
microarray; functional categories; Saccharomyces cerevisiae; clustering
DNA arrays permit rapid, large-scale screening for patterns of gene expression and simultaneously yield the expression levels of thousands of genes for samples. The number of samples is usually limited, and such datasets are very sparse in high-dimensional gene space. Furthermore, most of the genes collected may not necessarily be of interest and uncertainty about which genes are relevant makes it difficult to construct an informative gene space. Unsupervised empirical sample pattern discovery and informative genes identification of such sparse high-dimensional datasets present interesting but challenging problems.
A new model called empirical sample pattern detection (ESPD) is proposed to delineate pattern quality with informative genes. By integrating statistical metrics, data mining and machine learning techniques, this model dynamically measures and manipulates the relationship between samples and genes while conducting an iterative detection of informative space and the empirical pattern. The performance of the proposed method with various array datasets is illustrated.
With the increasing application of various genomic technologies in biomedical research, there is a need to integrate these data to correlate candidate genes/regions that are identified by different genomic platforms. Although there are tools that can analyze data from individual platforms, essential software for integration of genomic data is still lacking. Here, we present a novel Java-based program called CGI (Cytogenetics-Genomics Integrator) that matches the BAC clones from array-based comparative genomic hybridization (aCGH) to genes from RNA expression profiling datasets. The matching is computed via a fast, backend MySQL database containing UCSC Genome Browser annotations. This program also provides an easy-to-use graphical user interface for visualizing and summarizing the correlation of DNA copy number changes and RNA expression patterns from a set of experiments. In addition, CGI uses a Java applet to display the copy number values of a specific BAC clone in aCGH experiments side by side with the expression levels of genes that are mapped back to that BAC clone from the microarray experiments. The CGI program is built on top of extensible, reusable graphic components specifically designed for biologists. It is cross-platform compatible and the source code is freely available under the General Public License.
aCGH; expression profiling; visualization; correlation; data integration
A major challenge in biomedical studies in recent years has been the classification of gene expression profiles into categories, such as cases and controls. This is done by first training a classifier on a training set containing labeled samples from the two populations, and then using that classifier to predict the labels of new samples. Such predictions have recently been shown to improve the diagnosis and treatment selection practices for several diseases. This procedure is complicated, however, by the high dimensionality of the data. While microarrays can measure the levels of thousands of genes per sample, case-control microarray studies usually involve no more than several dozen samples. Standard classifiers do not work well in these situations where the number of features (gene expression levels measured in these microarrays) far exceeds the number of samples. Selecting only the features that are most relevant for discriminating between the two categories can help construct better classifiers, in terms of both accuracy and efficiency. In this work we developed a novel method for multivariate feature selection based on the Partial Least Squares algorithm. We compared the method's variants with common feature selection techniques across a large number of real case-control datasets, using several classifiers. We demonstrate the advantages of the method and the preferable combinations of classifier and feature selection technique.
Motivation: Bioimaging techniques rapidly develop toward higher resolution and dimension. The increase in dimension is achieved by different techniques such as multitag fluorescence imaging, Matrix Assisted Laser Desorption / Ionization (MALDI) imaging or Raman imaging, which record for each pixel an N-dimensional intensity array, representing local abundances of molecules, residues or interaction patterns. The analysis of such multivariate bioimages (MBIs) calls for new approaches to support users in the analysis of both feature domains: space (i.e. sample morphology) and molecular colocation or interaction. In this article, we present our approach WHIDE (Web-based Hyperbolic Image Data Explorer) that combines principles from computational learning, dimension reduction and visualization in a free web application.
Results: We applied WHIDE to a set of MBIs recorded using the multitag fluorescence imaging Toponome Imaging System. The MBIs show fields of view in tissue sections from a colon cancer study, and we compare tissue from normal/healthy colon with tissue classified as tumor. Our results show that WHIDE efficiently reduces the complexity of the data by mapping each of the pixels to a cluster, referred to as Molecular Co-Expression Phenotypes, and provides a structural basis for a sophisticated multimodal visualization, which combines topology preserving pseudocoloring with information visualization. The wide range of WHIDE's applicability is demonstrated with examples from toponome imaging, high content screens and MALDI imaging (shown in the Supplementary Material).
Availability and implementation: The WHIDE tool can be accessed via the BioIMAX website http://ani.cebitec.uni-bielefeld.de/BioIMAX/; Login: whidetestuser; Password: whidetest.
Supplementary data are available at Bioinformatics online.
The Allen Brain Atlas (ABA) project systematically profiles three-dimensional high-resolution gene expression in postnatal mouse brains for thousands of genes. By unveiling gene behaviors at both the cellular and molecular levels, ABA is becoming a unique and comprehensive neuroscience data source for decoding enigmatic biological processes in the brain. Given the unprecedented volume and complexity of the in situ hybridization image data, data mining in this area is extremely challenging. Currently, the ABA database mainly serves as an online reference for visual inspection of individual genes; the underlying rich information of this large data set is yet to be explored by novel computational tools. In this proof-of-concept study, we studied the hypothesis that genes sharing similar three-dimensional expression profiles in the mouse brain are likely to share similar biological functions.
In order to address the pattern comparison challenge when analyzing the ABA database, we developed a robust image filtering method, dubbed histogram-row-column (HRC) algorithm. We demonstrated how the HRC algorithm offers the sensitivity of identifying a manageable number of gene pairs based on automatic pattern searching from an original large brain image collection. This tool enables us to quickly identify genes of similar in situ hybridization patterns in a semi-automatic fashion and consequently allows us to discover several gene expression patterns with expression neighborhoods containing genes of similar functional categories.
Given a query brain image, HRC is a fully automated algorithm that is able to quickly mine a vast number of brain images and identify a manageable subset of genes that potentially share similar spatial co-distribution patterns for further visual inspection. A three-dimensional in situ hybridization pattern, if statistically significant, could serve as a fingerprint of certain gene function. Databases such as ABA provide a valuable data source for characterizing brain-related gene functions when armed with powerful image querying tools like HRC.
Analysis of gene expression data for tumor classification is an important application of bioinformatics methods. However, it is hard to analyse gene expression data from DNA microarray experiments with commonly used classifiers, because the data set contains only a few observations but thousands of measured genes. Dimension reduction is often used to handle such a high dimensional problem, but it is hampered by the large number of redundant features in the microarray data set.
Dimension reduction is performed by combining feature extraction with redundant gene elimination for tumor classification. A novel metric of redundancy based on DIScriminative Contribution (DISC) is proposed which estimates the feature similarity by explicitly building a linear classifier on each gene. Compared with the standard linear correlation metric, DISC takes the label information into account and directly estimates the redundancy of the discriminative ability of two given features. Based on the DISC metric, a novel algorithm named REDISC (Redundancy Elimination based on Discriminative Contribution) is proposed, which eliminates redundant genes before feature extraction and improves the performance of dimension reduction. Experimental results on two microarray data sets show that the REDISC algorithm is effective and reliable in improving the generalization performance of dimension reduction and hence of the classifier used.
Dimension reduction by performing redundant gene elimination before feature extraction is better than that with only feature extraction for tumor classification, and redundant gene elimination in a supervised way is superior to the commonly used unsupervised method like linear correlation coefficients.
We developed a user-friendly, interactive program to simultaneously cluster and visualize omics data, such as DNA and protein array profiles. This program provides diverse algorithms for the hierarchical clustering of two-dimensional data. The clustering results can be interactively visualized and optimized on a heatmap. The present tool does not require any prior knowledge of scripting languages to carry out the data clustering and visualization. Furthermore, the heatmaps allow the selective display of data points satisfying user-defined criteria. For example, a clustered heatmap of experimental values can be differentially visualized based on statistical values, such as p-values. Including diverse menu-based display options, QCanvas provides a convenient graphical user interface for pattern analysis and visualization with high-quality graphics.
data clustering; genomics; heatmap visualization; microarray analysis; pattern recognition
Gene expression data extracted from microarray experiments have been used to study the difference between mRNA abundance of genes under different conditions. In one such experiment, thousands of genes are measured simultaneously, which provides a high-dimensional feature space for discriminating between different sample classes. However, most of these dimensions are not informative about the between-class difference, and add noise to the discriminant analysis.
In this paper we propose and study feature selection methods that evaluate the "informativeness" of a set of genes. Two measures of information based on multigene expression profiles are considered for a backward information-driven screening approach for selecting important gene features. By considering multigene expression profiles, we are able to utilize interaction information among these genes. Using a breast cancer data set, we illustrate our methods and compare their performance with that of existing methods.
We illustrate in this paper that methods considering gene-gene interactions have better classification power in gene expression analysis. In our results, we identify important genes with relatively large p-values from single gene tests. This indicates that these are genes with weak marginal information but strong interaction information, which will be overlooked by strategies that only examine individual genes.
The crystal structure of the title compound, C7H14NO2+·Cl−, was reported previously [Chacko, Srinivasan & Zand (1975). J. Cryst. Mol. Struct. 5, 353–357] from Weissenberg photographic data with R = 0.113. It has now been redetermined, providing a significant increase in the precision of the derived geometric parameters, viz. mean σ(C–C) = 0.003 Å in the present work compared with 0.021 Å for the previous work. The complete cation is generated by crystallographic mirror symmetry, with three C atoms, two O atoms and the N atom lying on the reflecting plane; the chloride anion also has m site symmetry. The crystal structure is established by a two-dimensional network of O–H⋯Cl and N–H⋯Cl hydrogen bonds, generating C2(4) and C2(7) chains, and R4(8) and R
The complexity of gene expression data generated from microarrays and high-throughput sequencing makes their analysis challenging. One goal of these analyses is to define sets of co-regulated genes and identify patterns of gene expression. To date, however, there is a lack of easily implemented methods that allow an investigator to visualize and interact with the data in an intuitive and flexible manner. Here, we show that combining a nonlinear dimensionality reduction method, t-distributed Stochastic Neighbor Embedding (t-SNE), with a novel visualization technique provides a graphical mapping that allows the intuitive investigation of transcriptome data. This approach performs better than commonly used methods, offering insight into underlying patterns of gene expression at both global and local scales and identifying clusters of similarly expressed genes. A freely available MATLAB-implemented graphical user interface to perform t-SNE and nearest neighbour plots on genomic data sets is available at www.nimr.mrc.ac.uk/research/james-briscoe/visgenex.
Molecular profiling generates abundance measurements for thousands of gene transcripts in biological samples such as normal and tumor tissues (data points). Given such two-class high-dimensional data, many methods have been proposed for classifying data points into one of the two classes. However, finding very small sets of features able to correctly classify the data is problematic as the fundamental mathematical proposition is hard. Existing methods can find "small" feature sets, but give no hint how close this is to the true minimum size. Without fundamental mathematical advances, finding true minimum-size sets will remain elusive, and more importantly for the microarray community there will be no methods for finding them.
We use the brute force approach of exhaustive search through all genes, gene pairs (and for some data sets gene triples). Each unique gene combination is analyzed with a few-parameter linear-hyperplane classification method looking for those combinations that form training error-free classifiers. All 10 published data sets studied are found to contain predictive small feature sets. Four contain thousands of gene pairs and 6 have single genes that perfectly discriminate.
This technique discovered small sets of genes (3 or fewer) in published data that form accurate classifiers, yet were not reported in the prior publications. This could be a common characteristic of microarray data, which would make looking for them worth the computational cost. Such small gene sets could indicate biomarkers and portend simple medical diagnostic tests. We recommend checking for small gene sets routinely. We find 4 gene pairs and many gene triples in the large hepatocellular carcinoma (HCC, liver cancer) data set of Chen et al. The key component of these is the "placental gene of unknown function", PLAC8. Our HMM modeling indicates PLAC8 might have a domain like part of 1P59's crystal structure (a non-covalent endonuclease III-DNA complex). The previously identified HCC biomarker gene, glypican 3 (GPC3), is part of an accurate gene triple involving MT1E and ARHE. We also find small gene sets that distinguish leukemia subtypes in the large pediatric acute lymphoblastic leukemia cancer set of Yeoh et al.
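The simplest case of the exhaustive search described above, single genes that classify the training samples without error, can be sketched as follows. It uses a plain threshold test rather than the authors' few-parameter linear-hyperplane method, and the function name and data layout are hypothetical. Gene pairs and triples would additionally need a linear-separability test in two or three dimensions, which is not shown.

```python
def perfect_single_genes(expr, labels):
    """Return genes whose values separate the two classes with one threshold.

    expr maps gene name -> list of values (one per sample); labels are 0 or 1.
    A gene qualifies when every class-0 value lies strictly on one side of
    every class-1 value, so a threshold classifier has zero training error.
    """
    hits = []
    for gene, values in expr.items():
        class0 = [v for v, y in zip(values, labels) if y == 0]
        class1 = [v for v, y in zip(values, labels) if y == 1]
        if max(class0) < min(class1) or max(class1) < min(class0):
            hits.append(gene)
    return hits


if __name__ == "__main__":
    # Hypothetical toy data: four samples, two per class.
    expr = {"g1": [1, 2, 8, 9], "g2": [1, 9, 2, 8], "g3": [9, 8, 1, 2]}
    print(perfect_single_genes(expr, [0, 0, 1, 1]))
```

Zero training error on a few dozen samples does not by itself establish predictive value, so any hits from such a scan would still need validation on independent data.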
Application of phenetic methods to gene expression analysis has proved to be a successful approach. Visualizing the results in a 3-dimensional space may further enhance these techniques.
We designed and built TreeBuilder3D, an interactive viewer for visualizing the hierarchical relationships between expression profiles such as SAGE libraries or microarrays. The program allows loading expression data as plain text files and visualizing the relative differences of the analyzed datasets in 3-dimensional space using various distance metrics.
TreeBuilder3D provides a simple interface and has a small size. Written in Java, TreeBuilder3D is a platform-independent, open source application, which may be useful in analysis of large-scale gene expression data.
Developmental biology investigations have evolved from static studies of embryo anatomy into dynamic studies of the genetic and cellular mechanisms responsible for shaping the embryo anatomy. With the advancement of fluorescent protein fusions, the ability to visualize and comprehend how thousands to millions of cells interact with one another to form tissues and organs in three dimensions (xyz) over time (t) is just beginning to be realized and exploited. In this review, we explore recent advances utilizing confocal and multi-photon time-lapse microscopy to capture gene expression, cell behavior, and embryo development. From choosing the appropriate fluorophore, to labeling strategy, to experimental set-up, and data pipeline handling, this review covers the various aspects related to acquiring and analyzing multi-dimensional data sets. These innovative techniques in multi-dimensional imaging and analysis can be applied across a number of fields in time and space, from protein dynamics to cell biology to morphogenesis.
Confocal; Two-photon; Microscopy; Time-lapse imaging; Embryogenesis
Visualization of DNA microarray data in two or three dimensional spaces is an important exploratory analysis step in order to detect quality issues or to generate new hypotheses. Principal Component Analysis (PCA) is a widely used linear method to define the mapping between the high-dimensional data and its low-dimensional representation. During the last decade, many new nonlinear methods for dimension reduction have been proposed, but it is still unclear how well these methods capture the underlying structure of microarray gene expression data. In this study, we assessed the performance of the PCA approach and of six nonlinear dimension reduction methods, namely Kernel PCA, Locally Linear Embedding, Isomap, Diffusion Maps, Laplacian Eigenmaps and Maximum Variance Unfolding, in terms of visualization of microarray data.
A systematic benchmark, consisting of Support Vector Machine classification, cluster validation and noise evaluations, was applied to ten microarray and several simulated datasets. Significant differences between PCA and most of the nonlinear methods were observed in two and three dimensional target spaces. With an increasing number of dimensions and an increasing number of differentially expressed genes, all methods showed similar performance. PCA and Diffusion Maps were less sensitive to noise than the other nonlinear methods.
Locally Linear Embedding and Isomap showed a superior performance on all datasets. In very low-dimensional representations and with few differentially expressed genes, these two methods preserve more of the underlying structure of the data than PCA, and thus are favorable alternatives for the visualization of microarray data.
The quantity of microarray data available on the Internet has grown dramatically over the past years and now represents millions of Euros worth of underused information. One way to use this data is through co-expression analysis. To avoid a certain amount of bias, such data must often be analyzed at the genome scale, for example by network representation. The identification of co-expression networks is an important means to unravel gene to gene interactions and the underlying functional relationship between them. However, it is very difficult to explore and analyze a network of such dimensions. Several programs (Cytoscape, yEd) have already been developed for network analysis; however, to our knowledge, there are no available GraphML compatible programs.
We designed and developed gViz, a GraphML network visualization and exploration tool. gViz is built on clustering coefficient-based algorithms and is a novel tool to visualize and manipulate networks of co-expression interactions among a selection of probesets (each representing a single gene or transcript), based on a set of microarray co-expression data stored as an adjacency matrix.
We present here gViz, a software tool designed to visualize and explore large GraphML networks, combining network theory, biological annotation data, microarray data analysis and advanced graphical features.
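gViz is described above as being built on clustering coefficient-based algorithms operating on an adjacency matrix. The sketch below computes the standard local clustering coefficient for each node of an undirected network. It is not taken from gViz; the function name is hypothetical, and the matrix is assumed to be a symmetric 0/1 matrix obtained by thresholding the co-expression values.

```python
def local_clustering(adj):
    """Local clustering coefficient of each node of an undirected graph.

    adj is a symmetric 0/1 adjacency matrix given as a list of lists.
    For a node with k neighbors, the coefficient is the number of links
    among those neighbors divided by k * (k - 1) / 2; nodes with fewer
    than two neighbors get 0.0.
    """
    n = len(adj)
    coeffs = []
    for i in range(n):
        nbrs = [j for j in range(n) if j != i and adj[i][j]]
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        links = sum(adj[u][v] for a, u in enumerate(nbrs) for v in nbrs[a + 1:])
        coeffs.append(2.0 * links / (k * (k - 1)))
    return coeffs


if __name__ == "__main__":
    # Hypothetical network: probesets 0, 1, 2 form a triangle; 3 hangs off 0.
    adj = [
        [0, 1, 1, 1],
        [1, 0, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0],
    ]
    print(local_clustering(adj))
```

Probesets with a high coefficient sit in tightly interconnected neighborhoods, which is the kind of local structure a co-expression network viewer would want to surface.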
With the integration of the KEGG and Predictome databases as well as two search engines for coexpressed genes/proteins using data sets obtained from the Stanford Microarray Database (SMD) and Gene Expression Omnibus (GEO) database, VisANT 3.0 supports exploratory pathway analysis, which includes multi-scale visualization of multiple pathways, editing and annotating pathways using a KEGG compatible visual notation and visualization of expression data in the context of pathways. Expression levels are represented either by color intensity or by nodes with an embedded expression profile. Multiple experiments can be navigated or animated. Known KEGG pathways can be enriched by querying either coexpressed components of known pathway members or proteins with known physical interactions. Predicted pathways for genes/proteins with unknown functions can be inferred from coexpression or physical interaction data. Pathways produced in VisANT can be saved as computer-readable XML format (VisML), graphic images or high-resolution Scalable Vector Graphics (SVG). Pathways in the format of VisML can be securely shared within an interested group or published online using a simple Web link. VisANT is freely available at http://visant.bu.edu.
ArrayExpress is a public database for high throughput functional genomics data. ArrayExpress consists of two parts: the ArrayExpress Repository, which is a MIAME-supportive public archive of microarray data, and the ArrayExpress Data Warehouse, which is a database of gene expression profiles selected from the repository and consistently re-annotated. Archived experiments can be queried by experiment attributes, such as keywords, species, array platform, authors, journals or accession numbers. Gene expression profiles can be queried by gene names and properties, such as Gene Ontology terms, and can be visualized. ArrayExpress is a rapidly growing database; it currently contains data from >50 000 hybridizations and >1 500 000 individual expression profiles. ArrayExpress supports community standards, including MIAME, MAGE-ML and more recently the proposal for a spreadsheet based data exchange format: MAGE-TAB. Availability: .