One variable in the HMF that might be optimized further, especially for larger or smaller MTP formats, is the kernel size. The 5 × 5 size was empirically determined to be optimal for 384-well MTP arrays; for denser formats (eg, 1,536-well), however, it would be useful to test larger kernels such as 7 × 7 and 9 × 9. Because local background estimation depends heavily on obtaining a representative sample population from the MTP, kernel size becomes a significant concern at the MTP edges. Although the number of elements sampled by the kernel is reduced at the MTP periphery, where a portion of the mask overhangs the MTP edge, the 5 × 5 HMF was robust enough to maintain a tight correlation between the obtained median value and the local sampling area in the 384-well MTP. Alternative ways to address array edges are well documented for image processing [13–15].
HMFs, like all automated methods for correcting systematic errors, require spatially random MTP data without clustering of wells that have high or low magnitudes. As shown in , the effectiveness of the filter diminishes as the hit density increases, with CVs significantly diminished at hit rates above 20%. High densities or clustering of positive wells, or MTP layouts with dose series or controls positioned in rows or columns, would cause the control values to be misrepresented as background in the kernel and would compromise filter function. One way to circumvent this problem is to nest controls or dilution series within the MTP so that low- or high-value clusters do not disturb the filter (ie, by randomizing well positions). Also, when controls are positioned in rows or columns, the filter can be programmed to exclude the control wells from analysis. The SLIMS interface used in the DFT corrections offers such an option [2,3]. Alternatively, the HMF kernel could be customized by excluding axial elements (column or row) that might contain control wells, an approach we are exploring further.
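The edge behavior and the control-well exclusion discussed above can be illustrated with a minimal sketch. It uses a plain neighborhood median rather than the hybrid median of the HMF, and the function name `local_median`, its parameters, and the plate values in the usage note are assumptions made for illustration only. The `shift` option shows one common image-processing alternative to truncation, in which the window slides inward so that it always spans k × k wells.

```python
import numpy as np

def local_median(plate, k=5, exclude=None, shift=False):
    """Median of the k x k neighborhood around every well (illustrative only).

    exclude: optional boolean array, True for wells (eg, controls) that are
             left out of every neighborhood.
    shift:   False truncates the window where it overhangs the plate edge;
             True slides the window inward so it always spans k x k wells.
    Returns (medians, counts); counts is the number of wells sampled per well.
    """
    plate = np.asarray(plate, dtype=float)
    rows, cols = plate.shape
    if shift and (rows < k or cols < k):
        raise ValueError("plate is smaller than the kernel")
    r = k // 2
    if exclude is None:
        skip = np.zeros(plate.shape, dtype=bool)
    else:
        skip = np.asarray(exclude, dtype=bool)
    medians = np.full(plate.shape, np.nan)
    counts = np.zeros(plate.shape, dtype=int)
    for i in range(rows):
        for j in range(cols):
            if shift:
                # Keep the window size fixed by moving it away from the edge.
                r0 = min(max(i - r, 0), rows - k)
                c0 = min(max(j - r, 0), cols - k)
                r1, c1 = r0 + k, c0 + k
            else:
                # Drop the part of the window that overhangs the plate.
                r0, r1 = max(i - r, 0), min(i + r + 1, rows)
                c0, c1 = max(j - r, 0), min(j + r + 1, cols)
            values = plate[r0:r1, c0:c1][~skip[r0:r1, c0:c1]]
            counts[i, j] = values.size
            if values.size:
                medians[i, j] = np.median(values)
    return medians, counts
```

On a 16 × 24 (384-well) array, the truncated 5 × 5 window samples 25 wells in the interior, 15 wells on a non-corner edge, and 9 wells in a corner; excluding a control column reduces these counts further. This shows why sampling is thinnest where edge effects are usually strongest.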
In comparing MTPs before and after correction with DFT, the correction largely restored background values to the mean value range (compare and ). With rare exceptions, however, the DFT corrections also reduced the amplitude of simulated hits (compare hits in vs. and vs. 5G). For example, in the corners of the Experimental4 DsRed array, the DFT method failed to preserve the simulated hits and instead reduced them to background levels. This tendency to blunt hits is also common to the averaging correction methods (compare with and ). In contrast, the 5 × 5 HMF retained all simulated hits in our datasets, scaling them in agreement with local wells (eg, compare region A-1 in ).
The case of Experimental4 DsRed illustrates an interesting point in the comparison of the DFT and median-based correction methods. The background CV measurements for Experimental4 DsRed () are higher in arrays corrected with the 3 × 3 and 5 × 5 median filters than in arrays corrected with the DFT. However, viewing reveals that although the DFT reduced the background CV more, it did so at the cost of blunting hits. The DFT correction is also accompanied by a reduction in dynamic range that is apparent in . The lower CVs, blunted hits, and reduced dynamic range after DFT treatment can be explained by a gross flattening of the array contour without regard to discrete hits. For example, the MTP edge correction appears extensive after treatment with the DFT-based method in the Experimental DsRed (compare data in the range 37.5%–50% in vs. ), but the corrections also reduced the magnitude of many of the hits (eg, well A-13).
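The point that a lower background CV can coincide with blunted hits can be made concrete with a toy calculation. The sketch below is not the DFT correction; the "flattened" array simply pulls every well halfway toward a nominal background of 100, and the function names and well values are hypothetical.

```python
import numpy as np

def background_cv(plate, hit_mask):
    """Coefficient of variation (SD / mean) of the non-hit wells."""
    background = np.asarray(plate, dtype=float)[~hit_mask]
    return background.std(ddof=1) / background.mean()

def hit_to_background(plate, hit_mask):
    """Mean hit signal divided by the median of the non-hit wells."""
    plate = np.asarray(plate, dtype=float)
    return plate[hit_mask].mean() / np.median(plate[~hit_mask])

# Hypothetical 2 x 3 array with one hit in the center of the second row.
raw = np.array([[100.0, 110.0, 90.0],
                [100.0, 500.0, 100.0]])
hits = np.zeros(raw.shape, dtype=bool)
hits[1, 1] = True

# Toy stand-in for an indiscriminate flattening of the array contour:
# every well, hit or not, is pulled halfway toward 100.
flattened = 100.0 + 0.5 * (raw - 100.0)
```

Here the flattening halves the background CV, but the hit-to-background ratio also falls from 5 to 3, so the two measures need to be read together when comparing correction methods.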
Further, the DFT corrected corner wells more aggressively than other edge-proximal wells, and it likewise blunted hits in the MTP corners more aggressively than hits in other edge-proximal wells (eg, compare corrections to A-1 and P-1 to M-1 and A-13 in and ). This correction is unusual and suggests a DFT correction artifact based on highly conserved array symmetry (4 corners). The correction made by the HMF to MTP corner wells, however, appears to be consistent with that made to other edge-proximal wells (eg, compare corrections to A-1 and A-13 in and ).
Although the DFT reduced the background CVs statistically over the entire MTP for both Synthetic2 and experimental arrays ( and ), it introduced waves or “ringing artifacts” in and (eg, area J-17) in the corrected Synthetic2 dataset. The DFT is based on continuous functions that require special treatment to deal with finite data arrays. That is, how does one model the region at the edge where the data end? For continuous functions to work at the edges, assumptions have to be made to create “data” outside the original array. For example, if one produces an artificially larger plate in which the “outside” values are zero, a step function is produced that generates distinctive ripple patterns [22]. Various “windowing functions” have been designed to reduce this ripple. Primary screening data have hits that are fundamentally discrete on a background that can be modeled as continuous (a single mean with noise). Because of this, it may be possible to remove the ripple by iteratively refining the estimate of the data “outside” the MTP array to match the mean and noise of the background.
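The effect of the “outside” values can be sketched in one dimension. The example below is a toy illustration under stated assumptions, not the DFT correction used here: a noise-free row of 24 wells is extended on both sides, low-pass filtered by zeroing high-frequency Fourier coefficients, and cropped back to the original wells. The helper names, the pad width, and the frequency cutoff are all arbitrary choices.

```python
import numpy as np

def lowpass(signal, keep):
    """Zero every Fourier coefficient above index `keep`, then invert."""
    spectrum = np.fft.rfft(signal)
    spectrum[keep + 1:] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

def padded_lowpass(row, pad, fill, keep):
    """Extend a row with `fill` on both sides, low-pass it, crop it back."""
    row = np.asarray(row, dtype=float)
    extended = np.pad(row, pad, mode="constant", constant_values=fill)
    return lowpass(extended, keep)[pad:pad + row.size]

# Idealized background row (assumed value, no noise, no hits).
row = np.full(24, 100.0)

# Zeros outside the plate create a step at each edge.
zero_fill = padded_lowpass(row, pad=12, fill=0.0, keep=6)
# Filling with the background mean removes the step.
mean_fill = padded_lowpass(row, pad=12, fill=row.mean(), keep=6)

ripple_zero = np.abs(zero_fill - row).max()
ripple_mean = np.abs(mean_fill - row).max()
```

With zero fill, the filtered row deviates substantially from the true background, most strongly near the edges, whereas filling with the background mean leaves the row essentially unchanged. Real plates also carry noise and hits, which is why an iterative estimate of the outside values would be needed in practice.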
Median filters (), which are nonlinear and natively spatially discrete, do not generate ringing artifacts. However, the HMF-corrected array exhibits symmetric artifacts in the corners of the plate because the HMF kernel samples few wells there. This resulted in insufficient sampling of the background and in turn reduced the efficacy of the correction. The corner correction failures in the HMF correction had a much smaller effect on the CVs than the DFT ringing (). One possible solution to this problem is to adaptively increase the kernel area at the corners. The kernel size could also be held constant by moving the target pixel. Alternatively, we have adjusted the kernel pattern to make it more or less sensitive to outliers as the sample size decreases [23]. In addition, we have found that the serial application of multiple discrete filters tuned to common MTP array patterns minimizes the introduction of artifacts at the MTP corner regions, while also having an additive beneficial effect on error correction [23].
The DFT method transforms MTP data into Fourier space by fitting sinusoidal functions to the data [2]. Because sinusoidal functions are by definition continuous, this transformation assumes that the data are also continuous. For MTPs, this means that the DFT correction expects a hit, which by definition has an extreme magnitude, to resemble its surrounding background wells. However, MTP screening uses discrete wells, often each with a different reagent (eg, library compounds) being tested, so the datasets are discontinuous. The magnitude of a signal from any one well is therefore unrelated to that of surrounding wells, and the piecewise continuity assumption of the DFT method is inappropriate. Fitting a continuous function to discontinuous data reduces the hits toward background levels, that is, blunts them. The HMF method, on the other hand, is based on nonlinear rank order calculations for each neighborhood (ie, finding the median) and does not assume spatial continuity in the dataset. This inherently discrete method essentially ignores rare extreme values (hits) in its estimation of the background.
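The contrast between a continuous fit and a rank-order estimate can be shown with a single simulated hit in one row of wells. This is a toy sketch, not the DFT or HMF implementation evaluated here: the “continuous” background is taken as the lowest few Fourier components of the row, the rank-order background is a running 5-well median, and the correction is a simple subtraction. All values and cutoffs are assumptions.

```python
import numpy as np

# One row of 24 wells: flat background of 100 with a single hit of 1,000.
row = np.full(24, 100.0)
row[10] = 1000.0

# Continuous model: background taken as the low-frequency part of the row
# (the mean plus the three lowest sinusoidal components).
spectrum = np.fft.rfft(row)
spectrum[4:] = 0.0
smooth_bg = np.fft.irfft(spectrum, n=row.size)

# Rank-order model: running 5-well median, truncated at the row ends.
median_bg = np.array([np.median(row[max(0, i - 2):i + 3])
                      for i in range(row.size)])

# Hit amplitude above the estimated background for each model.
fourier_hit = row[10] - smooth_bg[10]
median_hit = row[10] - median_bg[10]

# Residual left in a neighboring background well after subtraction.
fourier_neighbor = row[11] - smooth_bg[11]
median_neighbor = row[11] - median_bg[11]
```

In this example part of the hit leaks into the smooth background estimate, so the hit amplitude falls from 900 to about 637.5 and the adjacent well is pushed well below zero. The running median ignores the single extreme value, leaving the hit at 900 and the neighbor at 0.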
In summary, the 5 × 5 HMF performed best overall with regard to statistical improvement of the various datasets tested. The DFT method may benefit from case-by-case fine-tuning in the frequency domain. The bidirectional HMF might also be further tuned by optimizing the neighborhood size and subregions, but the easily implemented and computationally cheap 5 × 5 HMF performed well on all of the datasets tested here. We conclude that median-based array correction methods best reduced localized data distortion and assay noise while preserving hit amplitudes, and that discrete background smoothing approaches are superior to ones based on continuous functions for this data type, namely rare hits in data arrays.