A spatial smoothing procedure based on the head-banging algorithm described by Hansen is used to create a first baseline map (Figure
]. The head-banging algorithm is particularly useful for describing data with high local variation because it is median based (and hence not easily influenced by outliers). The algorithm is named after a child’s game in which a face is pressed against a board of pins protruding at various lengths, leaving a general impression of the child’s face while smoothing away any excessively varying local features that are more attributable to random chance. We prefer head-banging to classical Kriging smoothing techniques because the latter could be unduly influenced by a few counties with high heartworm prevalence.
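To make the median-based idea concrete, the following Python sketch performs a simplified, unweighted head-banging pass. It is not Hansen's exact procedure. The function name `head_bang_smooth`, the neighbor count `k`, the angle threshold `min_angle_deg`, and the rule of leaving a point unchanged when it has no usable triple are all assumptions made for illustration. No weighting by the number of tests is attempted, and point coordinates are assumed to be distinct.

```python
import math
from itertools import combinations
from statistics import median


def head_bang_smooth(points, values, k=8, min_angle_deg=120.0, n_iter=1):
    """Simplified, unweighted head-banging smoother (illustrative sketch only).

    points: list of distinct (x, y) coordinates, e.g. county centroids.
    values: list of values to smooth, one per point.
    k, min_angle_deg, n_iter: hypothetical tuning parameters.
    """
    cos_max = math.cos(math.radians(min_angle_deg))
    n = len(points)

    # k nearest neighbors of each point by Euclidean distance.
    neighbors = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        neighbors.append(others[:k])

    current = list(values)
    for _ in range(n_iter):
        updated = list(current)
        for i in range(n):
            lows, highs = [], []
            # A "triple" is a pair of neighbors lying on roughly opposite
            # sides of point i (angle at i of at least min_angle_deg).
            for a, b in combinations(neighbors[i], 2):
                ux = points[a][0] - points[i][0]
                uy = points[a][1] - points[i][1]
                vx = points[b][0] - points[i][0]
                vy = points[b][1] - points[i][1]
                cos_angle = (ux * vx + uy * vy) / (
                    math.hypot(ux, uy) * math.hypot(vx, vy))
                if cos_angle <= cos_max:
                    lows.append(min(current[a], current[b]))
                    highs.append(max(current[a], current[b]))
            if lows:
                # Clamp the point between the median low and median high screens.
                updated[i] = median([median(lows), current[i], median(highs)])
        current = updated
    return current
```

Because each point is clamped between medians taken over its neighbors, a single extreme value surrounded by ordinary values is pulled back toward them, while a broad regional trend shared by many neighbors is retained.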
A complication is that different numbers of dogs were tested in different counties. To handle this, the county data are converted to a common basis via standard normal Z-scores. Here, p(s) is the probability that a single tested dog is heartworm positive at location (county) s. If N(s) tests were conducted in this county, k(s) of which were positive, then p(s) is simply estimated as k(s)/N(s). The standard normal Z-score is this estimated probability divided by its standard error:
\[ Z(s) = \frac{\hat{p}(s)}{\sqrt{\hat{p}(s)\,[1-\hat{p}(s)]/N(s)}}, \qquad \hat{p}(s) = \frac{k(s)}{N(s)}. \]
Counties where no tests are performed do not influence the analysis. Our conventions take Z(s)=10,000 when all tests in the county are positive, and Z(s)=0 when all tests in the county are negative (these somewhat arbitrary conventions are needed to prevent division by zero).
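The Z-score computation and its two boundary conventions can be sketched in a few lines of Python. The sketch assumes the usual binomial standard error, sqrt(p(1-p)/N), for the estimated proportion. The function name `z_score` and the choice to return `None` for untested counties are illustrative assumptions.

```python
import math


def z_score(k, n, all_positive_z=10_000.0):
    """Z-score of the estimated positive-test probability k/n.

    Assumes the binomial standard error sqrt(p * (1 - p) / n).
    Returns None for counties with no tests, which are left out of the analysis.
    """
    if n == 0:
        return None            # no tests: county does not influence the analysis
    if k == 0:
        return 0.0             # convention: all tests negative
    if k == n:
        return all_positive_z  # convention: all tests positive
    p_hat = k / n
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat / std_err
```

For example, `z_score(10, 100)` uses an estimated probability of 0.1 and a standard error of 0.03.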
After Z-scores are computed for each county, the head-banging algorithm is applied to spatially smooth them. From the smoothed Z-scores and the county-by-county values of N, one can then convert back to a smoothed probability, representing the probability of a positive test in each county. Figure
is a geographic display of these smoothed probabilities using 20 nearest neighbors (these are viewed as adjacent counties).
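The conversion from a smoothed Z-score back to a probability can be written in closed form. Assume the Z-score is the estimated probability divided by the binomial standard error, Z = p / sqrt(p(1-p)/N). Squaring gives Z² = N p/(1-p), and therefore p = Z²/(N + Z²). The Python sketch below applies this inversion. The function name `z_to_probability` and the clamping of negative inputs to zero are assumptions for illustration.

```python
def z_to_probability(z, n):
    """Invert Z = p / sqrt(p * (1 - p) / n) to recover p = z^2 / (n + z^2).

    z: smoothed Z-score (non-negative by construction; negatives are clamped).
    n: number of tests conducted in the county (must be positive).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    z = max(z, 0.0)
    return z * z / (n + z * z)
```

Under this inversion, a Z-score of 0 maps back to a probability of 0, and the all-positive convention of Z = 10,000 maps back to a probability very close to 1 for any realistic number of tests.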
The smoothed estimate of p(s) for county s, denoted by p*(s), is key to our factor identification task. Specifically, after the smoothed probability estimates are computed, we consider logistic regression models of the form
\[ \mathrm{logit}\bigl(p^*(s)\bigr) = \mu + \sum_{j=1}^{L} \beta_j f_j(s) + \alpha(s), \]
where L is the number of factors, β1, …, βL are regression coefficients, f1(s), …, fL(s) are the values of the observed regression factors for county s, μ is an overall location parameter, and α(s) is a zero-mean random error for county s. Here, logit is defined for probabilities p in (0,1) via
\[ \mathrm{logit}(p) = \ln\!\left(\frac{p}{1-p}\right). \]
The factor fj is significant in the prediction of a positive heartworm test if βj ≠ 0. Standard forward and backward regression model factor selection routines can be used to determine which of the L factors are significant and how significant each factor is. The interested reader is referred to Casella and Berger (2002) for further elaboration.
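As a minimal sketch of how such a model might be fitted, the Python code below regresses the logit of the smoothed probabilities on the county factors, with an intercept playing the role of μ. Fitting by ordinary least squares on the logit scale is an assumption, since the estimation method is not stated here, and no forward or backward factor selection is performed. The names `logit`, `solve_linear`, and `fit_logit_regression`, and the clipping guard `eps`, are hypothetical.

```python
import math


def logit(p, eps=1e-6):
    """Log-odds of p, clipped away from 0 and 1 (eps is an assumed guard)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_logit_regression(p_smooth, factors):
    """Least-squares fit of logit(p*) = mu + sum_j beta_j * f_j.

    p_smooth: smoothed probabilities, one per county.
    factors: one row of L factor values per county.
    Returns (mu, [beta_1, ..., beta_L]).
    """
    y = [logit(p) for p in p_smooth]
    design = [[1.0] + list(row) for row in factors]
    d = len(design[0])
    # Normal equations: (X^T X) coef = X^T y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(d)]
    coef = solve_linear(xtx, xty)
    return coef[0], coef[1:]
```

A statistics package would normally be used instead, since it also reports the standard errors needed to judge whether each coefficient differs significantly from zero.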
Forecasts of future prevalence can be obtained from the above logistic regression model as various predictors and data from future years are considered. By inverting the logit transform of the regression model with forecasted predictor factors, an estimated value of p*(s) is obtained. For example, if annual temperature is an important factor, one could use historical temperature data to forecast next year’s annual average temperature. This forecasted factor is then used in the regression equation along with its accompanying estimated value of β. At present, any such forecasts are annual in nature, as no seasonality has been considered. However, after a few years of data are collected, it may be possible to quantify seasonal effects and make monthly forecasts. The CAPC data are updated monthly.
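The forecasting step amounts to evaluating the fitted linear predictor at the forecasted factor values and applying the inverse logit, p = 1/(1 + exp(-η)). A minimal Python sketch follows. The function name `forecast_probability` and its arguments are illustrative, and the random error term is set to its mean of zero.

```python
import math


def forecast_probability(mu, betas, forecast_factors):
    """Forecast p*(s) by inverting the logit of the linear predictor.

    mu: estimated location parameter.
    betas: estimated regression coefficients, one per factor.
    forecast_factors: forecasted factor values for the county, in the same order.
    """
    if len(betas) != len(forecast_factors):
        raise ValueError("betas and forecast_factors must have equal length")
    eta = mu + sum(b * f for b, f in zip(betas, forecast_factors))
    return 1.0 / (1.0 + math.exp(-eta))
```

For instance, with a single temperature factor, the forecasted annual average temperature would be passed as the only entry of `forecast_factors`, paired with its estimated coefficient.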