We have developed a metric for evaluating the physical reliability of individual arrays in terms of the degree of spatial correlation among expression measurements. The proposed GEODEX metric was designed to estimate the degree of independence between gene expression measurements and their geographic locations on an array, which provides an equivalent measure of the spatiality-based physical reliability of the array. Through analyses of multiple real datasets, we found that GEODEX was useful and valid for assessing the quality of individual chips in terms of spatial correlation. Specifically: 1) GEODEX could be used to check the quality of an individual chip when multiple biological replicates are available but there are no technical replicates per biological case. In particular, it could identify seriously damaged chips or chips with small artifacts that lead to poor array quality. Because the faulty array detected in experiment A was so obviously damaged, one could argue that visual inspection of the arrays was at least as effective as GEODEX. In future studies we will therefore investigate how sensitive the method is to small artifacts that escape visual inspection but still degrade chip quality. 2) In the presence of sufficient spatial effects on arrays, GEODEX predicted spatial concordance between technical replicates, which could in turn be used as a predictor of the reproducibility of technical replicates. 3) If global GEODEX indicated that some kind of spatial artifact was present on an array, blockwise GEODEX could point to the suspect locations. Edge effects and other systematic biases were also detected by blockwise GEODEX analysis. 4) The method was easy to implement with a standard statistical analysis tool and is therefore readily accessible to laboratory scientists.
Therefore, because GEODEX offers a good prediction of physical array reliability in terms of spatial correlation, it can be installed in laboratories as a quality control monitor that allows investigators to determine whether arrays are of adequate quality to be retained as part of a larger data set.
Regarding model choice, we used a cubic polynomial response surface model of intermediate complexity as a compromise between under-parameterizing and over-parameterizing the model, with the aim of introducing the idea behind the proposed approach. However, our approach is not restricted to a particular model; it is flexible enough to incorporate any data-specific regressors needed to provide the best approximation of GEODEX. One can also develop other functional models that relate measurements to their geographic locations on an array. To enhance the practical utility of the method, more extensive, prescriptive procedures for searching for the best models will be developed in future studies.
To calculate GEODEX, we considered a number of robust models spanning local approaches (e.g., blockwise GEODEX) and global approaches (e.g., global GEODEX). These ranged from highly local approaches, in which measurements were regressed within a neighborhood, to completely global approaches, in which they were regressed across the whole array. A regressor-dependent model, such as model (2), appeared to predict a chip's quality reasonably well. Local approaches, on the other hand, tended to be more effective when spatiality varied across blocks. For example, a block-by-block analysis could not only detect outliers but could also explore the nature of irregular patterns in the neighborhood of outliers, such as edge effects. In contrast, when local GEODEX did not change dramatically across blocks, global approaches were sensitive enough to summarize the physical reliability of arrays.
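The contrast between global and blockwise analysis can be sketched as follows. The example again assumes, for illustration only, an index of the form 1 - R^2 from a cubic surface fit, and applies it separately to each rectangular block of a simulated chip. The block size, the function names, and the data are hypothetical; each block must contain many more spots than the 10 coefficients of the cubic surface for the fit to be meaningful.

```python
import numpy as np


def one_minus_r2(values):
    """Assumed index for a 2-D patch: 1 - R^2 of a cubic surface in (row, col)."""
    n_rows, n_cols = values.shape
    r, c = np.meshgrid(np.linspace(-1, 1, n_rows),
                       np.linspace(-1, 1, n_cols), indexing="ij")
    r, c, y = r.ravel(), c.ravel(), values.ravel().astype(float)
    X = np.column_stack([r**i * c**j for i in range(4) for j in range(4 - i)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ beta
    return float(residual @ residual / np.sum((y - y.mean()) ** 2))


def blockwise_index(values, block_rows, block_cols):
    """Index per block; trailing rows/columns that do not fill a block are ignored."""
    n_rows, n_cols = values.shape
    out = np.empty((n_rows // block_rows, n_cols // block_cols))
    for bi in range(out.shape[0]):
        for bj in range(out.shape[1]):
            patch = values[bi * block_rows:(bi + 1) * block_rows,
                           bj * block_cols:(bj + 1) * block_cols]
            out[bi, bj] = one_minus_r2(patch)
    return out


# Simulated 60 x 60 chip with an artifact confined to the top-left block.
rng = np.random.default_rng(0)
chip = rng.normal(size=(60, 60))
chip[:20, :20] += np.linspace(-3, 3, 20)[:, None]

print(round(one_minus_r2(chip), 2))                 # global summary
print(np.round(blockwise_index(chip, 20, 20), 2))   # 3 x 3 grid of block values
```

In this simulation the block containing the artifact has a much lower value than the other eight blocks, which is the kind of localization the blockwise analysis is meant to provide.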
Developing benchmark GEODEX threshold values for discriminating between good- and poor-quality arrays for different platforms and manufacturers will require application of the metric to much larger and more diverse data sets than those used in this study. The narrow range of GEODEX across all arrays and chips used in this feasibility study suggests that, in general, these arrays were of good quality. A larger and more complex set of array data spanning a wider range of quality would provide a test for such benchmarks. In practice, however, a threshold that determines whether a chip should be retained or discarded very likely varies from laboratory to laboratory because of factors such as different batches, technicians, chip types, production methods, and experiments. It is therefore difficult to judge whether a benchmark established for one experiment can be transferred to another, and the proposed metric should be interpreted relative to other chips obtained from a single experiment in a given laboratory. There are also uses of GEODEX that do not involve a threshold for discarding chips. For example, one may wish to assess trends in the quality of chips being produced in a facility. In such cases, the judgment of chip quality is based on the empirical distribution of all GEODEX values for chips from a single experiment or across multiple experiments. Alternatively, one might use the index to assess the level of training and development of a student or technician. Finally, the GEODEX metric does not appear to be able to distinguish variability due to positional effects created during fabrication of a chip or array from effects that arise during labeling and hybridization, and therefore it cannot by itself establish the validity of global claims of superiority of one platform over another in cross-platform comparisons.
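One way to interpret index values relative to other chips from the same experiment, rather than against a fixed benchmark, is sketched below. The robust z-score based on the median and the median absolute deviation is our illustrative choice and is not a procedure from this study; the cutoff of 3.5, the chip names, and the index values are hypothetical. The sketch assumes that lower values indicate stronger spatial dependence and hence lower reliability.

```python
from statistics import median


def flag_low_outliers(index_by_chip, cutoff=3.5):
    """Return chips whose index is unusually low relative to the experiment.

    Uses a robust z-score, 0.6745 * (value - median) / MAD. The factor 0.6745
    makes the MAD comparable to a standard deviation under normality.
    """
    values = list(index_by_chip.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread among chips, so nothing can be ranked as unusual
    return sorted(chip for chip, v in index_by_chip.items()
                  if 0.6745 * (v - med) / mad < -cutoff)


# Hypothetical index values for six chips from one experiment.
experiment = {
    "chip_01": 0.97, "chip_02": 0.96, "chip_03": 0.98,
    "chip_04": 0.97, "chip_05": 0.71, "chip_06": 0.95,
}
print(flag_low_outliers(experiment))
```

Because the reference distribution is recomputed for each experiment, the same chip could be flagged in one laboratory and not in another, which is consistent with the laboratory-specific interpretation described above.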
Normalization is the process of removing biases due to technical variation. However, some results show that biases remain in the data after normalization even when the normalization adjustments account for local spatial variation [7]. Smyth and Speed [16] noted that poor spot quality could contribute to such biases and suggested that a regression-based normalization method might be improved by incorporating quality weights for individual spots. GEODEX can be used as a quality assessment procedure for spots and chips before normalization, identifying the spatial positions of less reliable spots on the slide so that each spot can be assigned a weight according to its quality. Gene expression data adjusted by a combination of GEODEX and normalization can then be used for downstream analyses such as differential gene expression, and the resulting estimates are expected to be more precise because biases in the data have been reduced.