Scientific research requires data visualization for exploration and discovery, as well as for written and oral presentations. Biological science has traditionally relied on manual analysis of curated data sets, an approach that is becoming increasingly challenging as data sets and the volume of published material grow exponentially. Custom software combined with manual intervention has become necessary to manage these data sets efficiently while still delivering the required analyses. Many methods of data management and analysis have been presented; however, the techniques for data representation have remained largely unchanged. As a result, many custom and private solutions have been developed to satisfy data visualization needs.
In our own work in the fields of computational biology and medicinal chemistry, color-grid representations have become increasingly useful for illustrating trends in data, whether correlations between features of pre-mRNA splice sites or structure-activity relationships (SAR) from large small-molecule libraries. Color-grids are two-dimensional data arrays in which data values are represented visually by associated color intensities (see Figure 1). We were inspired in part by heat maps, which effectively use color as an additional dimension, a technique successfully applied to microarray data visualization. However, the pioneering microarray data analysis software Cluster and TreeView [1] and recent extensions of this work [2] have some disadvantages when generalized to a wider variety of data sets: a need for computational data manipulation, a focus on data clustering, incompatibility with text values and data exceptions, and limited coloring capabilities. We remedy some of these shortcomings by designing an abstract color-grid graphical object and implementing the creation and rendering of such graphics in a program called JColorGrid. The software is applicable to a wide variety of input data while retaining ease of use.
Figure 1. JColorGrid parameters and graphical output. A. The JColorGrid graphical configuration menu or text configuration files (not shown) allow users to create custom color-grids. The configuration shown was used to generate the color-grid shown in B.
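To make the definition of a color-grid concrete, the following minimal Java sketch maps a two-dimensional array of values to color intensities by linear interpolation between two colors. It is an illustration only and is not taken from the JColorGrid source: the class name `ColorGridSketch`, the white-to-red palette, the gray color for missing cells, and the use of NaN to mark missing data are all assumptions made for this example.

```java
/** Hypothetical sketch of a color-grid mapping; not the JColorGrid implementation. */
final class ColorGridSketch {

    // Assumed palette: packed 0xRRGGBB values chosen only for illustration.
    static final int LOW_COLOR = 0xFFFFFF;     // white for the smallest value
    static final int HIGH_COLOR = 0xFF0000;    // red for the largest value
    static final int MISSING_COLOR = 0x808080; // gray for missing cells (NaN)

    /** Linearly interpolates one 8-bit channel (selected by shift) between two colors. */
    static int blendChannel(int low, int high, int shift, double t) {
        int a = (low >> shift) & 0xFF;
        int b = (high >> shift) & 0xFF;
        return ((int) Math.round(a + t * (b - a))) << shift;
    }

    /** Maps a value in [min, max] to a packed RGB color; NaN maps to MISSING_COLOR. */
    static int toColor(double value, double min, double max) {
        if (Double.isNaN(value)) return MISSING_COLOR;
        if (max <= min) return LOW_COLOR; // degenerate range: no contrast to show
        // Clamp the normalized position to [0, 1] so out-of-range values saturate.
        double t = Math.max(0.0, Math.min(1.0, (value - min) / (max - min)));
        return blendChannel(LOW_COLOR, HIGH_COLOR, 16, t)
             | blendChannel(LOW_COLOR, HIGH_COLOR, 8, t)
             | blendChannel(LOW_COLOR, HIGH_COLOR, 0, t);
    }

    /** Converts a 2-D data array into a grid of colors with the same shape. */
    static int[][] render(double[][] data) {
        // The color scale spans the observed range, ignoring missing cells.
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double[] row : data) {
            for (double v : row) {
                if (!Double.isNaN(v)) {
                    min = Math.min(min, v);
                    max = Math.max(max, v);
                }
            }
        }
        int[][] grid = new int[data.length][];
        for (int r = 0; r < data.length; r++) {
            grid[r] = new int[data[r].length];
            for (int c = 0; c < data[r].length; c++) {
                grid[r][c] = toColor(data[r][c], min, max);
            }
        }
        return grid;
    }

    public static void main(String[] args) {
        double[][] data = { {0.0, 0.5}, {1.0, Double.NaN} };
        for (int[] row : render(data)) {
            StringBuilder line = new StringBuilder();
            for (int rgb : row) {
                line.append(String.format("#%06X ", rgb));
            }
            System.out.println(line.toString().trim());
        }
    }
}
```

Each cell's color depends only on where its value falls within the observed range, which is what lets a reader scan the grid for trends. A real tool would additionally need to handle text values, user-defined color ranges, and rendering, none of which this sketch attempts.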
Data representation with a color-grid allows researchers and audiences to rapidly identify trends within large data sets. Color-grids follow the main graphical presentation tenets set forth by Edward Tufte: utilizing color to enhance information, facilitating micro and macro readings of data, graphical layering and separation, and use of "small multiple designs" for "graphical depictions of variable information that share context, but not content" [5]. Color-grids are data-dense and easily interpretable at different scales of analysis, which has made them increasingly popular in the scientific literature. While color-grid representations are available in some advanced statistical packages (e.g. MATLAB and R [6]), the commonly used free and commercial spreadsheet and statistical packages (e.g. StarOffice, MS Excel, SigmaPlot, KaleidaGraph, GnuPlot) do not offer color-grids as a graphical representation. Moreover, where color-grid output is available, its utility is limited by the level of expertise needed to work with these complex statistical packages. The absence of stand-alone software capable of automatically generating color-grids prompted us to develop JColorGrid, a Java application that serves as an engine for generating custom color-grid representations. Our motivation was to offer a novel, automated, general-purpose means of graphically representing complex data sets from various research disciplines in accordance with established graphical visualization guidelines.