By disaggregating the town-level data through the restricted and controlled Monte Carlo (RCMC) process, we generated maps of birth defects for New Hampshire at the pixel level rather than the town level. RCMC is essentially a dasymetric process that allocates the total amount for an areal unit to different places within the unit by taking into account ancillary information. However, conventional dasymetric mapping is deterministic, whereas RCMC is stochastic. Several advantages of such a mapping process can be identified:
(1) The disaggregation allows analytical processes designed for individual data to be applied, which avoids or mitigates the problems associated with aggregate data.
(2) The resulting raster maps have pixel-level resolution (100 m in this study) and thus present a more detailed spatial distribution of disease than the conventional polygon map. This detail gives the raster maps an advantage in detecting spatial associations between birth defects and certain environmental factors.
(3) The RCMC process maximizes the use of available spatial information. First, restricting the randomization to the smallest aggregate units maximizes the use of the spatial information represented by the polygons. Furthermore, controlling the randomization with the background data layer provides an open mechanism that can take into account any available information that helps reduce spatial uncertainty and improve analysis quality. In this study, the background data layer of females in a certain age category ultimately incorporates rich information from different sources, including the total number of people from the LandScan data and age and sex information from the Census data. The LandScan data are a product of a sophisticated model that incorporates information about population, land use, terrain, night lights, traffic, and other factors [39,40]. Other information, if available, can find its way into the background layer used by RCMC. For example, if a socioeconomic factor is known to be a confounding factor of a disease, and detailed information about its spatial distribution is available, it can be incorporated into the background layer.
(4) The RCMC process explicitly quantifies the spatial uncertainty caused by data aggregation. Little, if any, information about the spatial uncertainty in a polygon map can be conveyed to the user of the map. RCMC resolves this problem by running the randomization iteration many times. The variance in the results from these iterations represents the uncertainty caused by aggregation, which can be explicitly and easily quantified. Essentially, this is an approach based on the idea of sensitivity analysis that empirically models variance through intensive computation.
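To make the restriction, control, and uncertainty ideas in the list above concrete, the following is a minimal sketch and not the implementation used in this study. It assumes that each town's cases are placed by a multinomial draw over the town's pixels, with probabilities proportional to the background layer. The function name `rcmc_allocate`, the toy arrays, and the number of iterations are all hypothetical.

```python
import numpy as np

def rcmc_allocate(town_ids, background, town_counts, n_iter=100, seed=0):
    """Randomly place each town's cases on pixels inside that town.

    town_ids    : 2-D int array, town index of each pixel (the restriction)
    background  : 2-D float array, population at risk per pixel (the control)
    town_counts : dict {town index: number of cases reported for the town}
    Returns an int array of shape (n_iter, rows, cols) of simulated counts.
    """
    rng = np.random.default_rng(seed)
    sims = np.zeros((n_iter,) + town_ids.shape, dtype=int)
    for town, n_cases in town_counts.items():
        rows, cols = np.nonzero(town_ids == town)
        weights = background[rows, cols].astype(float)
        if n_cases == 0 or weights.sum() == 0:
            continue  # assumption: skip towns with no cases or no background
        probs = weights / weights.sum()
        # One multinomial draw per iteration: cases never leave the town and
        # fall on pixels in proportion to the background layer.
        sims[:, rows, cols] = rng.multinomial(n_cases, probs, size=n_iter)
    return sims

# Toy example: two towns on a 2 x 3 grid (hypothetical numbers).
town_ids = np.array([[0, 0, 1],
                     [0, 1, 1]])
background = np.array([[5.0, 0.0, 2.0],
                       [1.0, 3.0, 0.0]])
sims = rcmc_allocate(town_ids, background, {0: 4, 1: 2}, n_iter=200)

mean_map = sims.mean(axis=0)  # average allocation per pixel
var_map = sims.var(axis=0)    # spread across iterations = aggregation uncertainty
```

In this sketch the town totals are preserved in every iteration, pixels with no background population never receive a case, and the per-pixel variance across iterations serves as the explicit measure of uncertainty caused by aggregation. Any statistic computed downstream from each iteration could be summarized across iterations in the same way.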
It should be particularly noted that RCMC is a stochastic process, and therefore its results should not be interpreted in a deterministic way. Specifically, one should keep in mind that a value in map (a) is the mean of many possible p-values at that location and is not necessarily the true p-value.
It should also be noted that the spatial uncertainty represented and presented by RCMC is only the uncertainty resulting from spatial aggregation. There are other spatial uncertainties in the result, such as that from KRE. The bandwidth of the kernel is ultimately both an instrument and a representation of spatial uncertainty: the larger the bandwidth, the higher the spatial uncertainty. Specifically, if we consider a disease case to be a realization of a random variable in its support (i.e., the population at risk around the disease case), then the more extensively the support is distributed geographically, the more uncertain it is where that realization will occur. A background-adaptive bandwidth (the one used in this study) may become fairly large in a less populous area in order to enclose enough support to ensure the statistical stability of the estimated ratio value. In other words, in a less populous area, statistical stability is maintained at the cost of increased spatial uncertainty. More generally, spatial uncertainty and statistical stability usually form a tradeoff [19]. Therefore, a large dark patch in the hot spot map, like the one in northern NH in (c), does not necessarily mean that the entire area is a high-risk region. The proper interpretation of such large patches is that there are high-risk locations within these areas.
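The way a background-adaptive bandwidth grows in sparsely populated areas can be illustrated with a small sketch. The exact rule used in this study is not restated here, so the code assumes a simple one: the bandwidth at a location is the smallest radius that encloses at least `min_pop` people. The function name, the threshold, and the toy population profile are hypothetical.

```python
import numpy as np

def adaptive_bandwidth(focus_xy, pixel_xy, pixel_pop, min_pop):
    """Smallest radius around focus_xy enclosing at least min_pop people.

    pixel_xy  : (n, 2) array of pixel centre coordinates
    pixel_pop : (n,) array of non-negative background population per pixel
    """
    d = np.hypot(pixel_xy[:, 0] - focus_xy[0], pixel_xy[:, 1] - focus_xy[1])
    order = np.argsort(d)                    # nearest pixels first
    cum_pop = np.cumsum(pixel_pop[order])    # population enclosed so far
    idx = np.searchsorted(cum_pop, min_pop)  # first index with cum_pop >= min_pop
    if idx == len(d):
        raise ValueError("total population is below min_pop")
    return float(d[order][idx])

# Toy 1-D transect: a dense stretch (x <= 5) followed by a sparse one.
x = np.arange(21)
pixel_xy = np.column_stack([x, np.zeros_like(x)])
pixel_pop = np.where(x <= 5, 100.0, 5.0)

bw_dense = adaptive_bandwidth((2, 0), pixel_xy, pixel_pop, min_pop=150)
bw_sparse = adaptive_bandwidth((15, 0), pixel_xy, pixel_pop, min_pop=150)
```

Under these assumptions the location in the sparse stretch receives a much larger bandwidth than the one in the dense stretch. The estimate there is statistically more stable than a fixed small kernel would give, but it is spread over a wider area, which is the tradeoff described above.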
Like many previous methods of its kind, this process has an inherent problem of multiple testing. If the tests at the individual cells were independent of one another, then under α = 0.1 one would expect 10% of the cells to stand out as significant even if they bore no epidemiological meaning. In fact, the marked cells in our hot spot map account for only 1% of all the cells in NH, which makes it possible that they are simply an outcome of multiple testing. However, it should be considered that (1) the tests at the cells are not independent, as the kernel at a cell overlaps substantially with its nearby kernels [31]; and (2) the constraint applied to the RCMC output (i.e., the two-standard-deviation cut) sets a high bar for a cell to be marked as a hot-spot cell. Both should have considerably mitigated the problem of multiple testing, although quantitatively evaluating their effects is complicated and has yet to be explored. After all, the primary goal of disease mapping is to help form hypotheses and inform research design for further investigations, rather than to draw definitive conclusions. Even if it does not eliminate the problem of multiple testing, the proposed method is advantageous over conventional mapping methods based on aggregate data in serving this exploratory purpose.
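The contrast between independent tests and tests based on overlapping kernels can be explored with a small null simulation. This is an illustrative sketch only: it uses a one-dimensional row of cells, a moving average as a stand-in for overlapping kernels, and a one-sided test at α = 0.1. None of these choices come from the study, and all variable names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(42)
alpha = 0.1
n_cells = 100_000
z_crit = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value

# Case 1: independent tests. Each cell has its own standard normal statistic.
stat_indep = rng.normal(size=n_cells)
sig_indep = stat_indep > z_crit

# Case 2: overlapping kernels. Each cell's statistic is a moving average of
# shared noise, rescaled so that it is still standard normal under the null.
window = 21
noise = rng.normal(size=n_cells + window - 1)
stat_corr = np.convolve(noise, np.ones(window) / np.sqrt(window), mode="valid")
sig_corr = stat_corr > z_crit

def neighbor_rate(sig):
    """Share of significant cells whose next cell is also significant."""
    return float(np.mean(sig[1:][sig[:-1]]))

frac_indep = float(np.mean(sig_indep))
frac_corr = float(np.mean(sig_corr))
rate_indep = neighbor_rate(sig_indep)
rate_corr = neighbor_rate(sig_corr)
```

In both cases roughly a fraction α of the cells is flagged under the null, but with overlapping kernels the flagged cells arrive in contiguous runs rather than as isolated cells. The number of effectively independent tests is therefore smaller than the number of cells, which is one reason the independence-based expectation should be read with caution.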