We have developed a predictive model of four categories of malaria prevalence in East Africa, driven by empirical data and exploiting the potential of currently available high-resolution satellite-derived climate surrogates and other digital environmental data. The significance of non-climatic determinants of malaria transmission, such as urbanization, in determining the intensity of malaria was demonstrated. This is consistent with evidence that urbanization consistently reduces P. falciparum EIRs and infection prevalence in Africa (Hay et al; Robert et al. 2003; Omumbo & Snow 2004). The application of ecological zone stratification increased OA by 6.1% and increased κ values from 0.394 to 0.477. Similar improvements have been seen with the use of ecozone stratification for other models of malaria in West Africa (Kleinschmidt et al. 2001), schistosomiasis (Brooker et al. 2001) and trypanosomiasis (Rogers & Robinson 2004).
We were interested in how our modelled output compared with historical maps of malaria in the sub-region (), which were largely based on expert opinion of climatic patterns as they affect the duration of transmission seasons. The map () compared favourably with the historical map () for areas of low malaria risk (; area A). However, there was a marked anomaly in Tanzania south of Lake Victoria, where the model appears to have under-predicted high transmission conditions (; area B). The reasons for this are not clear but may be due to the lack of training data from the region (; area C). Furthermore, climate-based models (Craig et al. 1999) suggest conditions in this area (; area A) are highly suitable for vector and parasite development.
Despite efforts to maximize the accuracy of the predictive map developed in this study, the measures of statistical accuracy derived suggest that there are still many transmission-modifying factors that are unaccounted for. A κ value of 0.477, though reasonable, is far from excellent. The highest prevalence categories were over-predicted (PA > 75% compared with CA < 43%), and the limitation in defining this prevalence class was principally the small number of training data pixels (n = 38). It is clear that clustering the training data according to appropriate ecological strata markedly improves predictions. The use of only two ecozones was dictated by the paucity of training data; further stratification over such a large and spatially diverse area of East Africa would have been more appropriate.
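The accuracy measures quoted above can all be derived from a single confusion matrix. The following is a minimal sketch, not the procedure used in this study: it assumes that PA and CA denote producer's and consumer's accuracy (the abbreviations are not expanded in this section), that rows hold observed classes and columns hold predicted classes, and it uses an invented two-class matrix rather than the study data. The function name `accuracy_summary` is hypothetical.

```python
from typing import Dict, List, Sequence, Union


def accuracy_summary(
    matrix: Sequence[Sequence[int]],
) -> Dict[str, Union[float, List[float]]]:
    """Summarise a square confusion matrix.

    Assumed layout: rows are observed (reference) classes,
    columns are predicted (mapped) classes.
    """
    k = len(matrix)
    if any(len(row) != k for row in matrix):
        raise ValueError("confusion matrix must be square")
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    diagonal = [matrix[i][i] for i in range(k)]

    # Overall accuracy (OA): proportion of pixels on the diagonal.
    overall = sum(diagonal) / total
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    # Cohen's kappa: agreement beyond chance, scaled to its maximum.
    kappa = (overall - expected) / (1 - expected) if expected < 1 else float("nan")

    # Producer's accuracy: correct / observed total for the class.
    producers = [d / r if r else float("nan") for d, r in zip(diagonal, row_totals)]
    # Consumer's accuracy: correct / predicted total for the class.
    consumers = [d / c if c else float("nan") for d, c in zip(diagonal, col_totals)]

    return {
        "overall_accuracy": overall,
        "kappa": kappa,
        "producers_accuracy": producers,
        "consumers_accuracy": consumers,
    }


# Invented example: class 0 is predicted for 60 pixels but observed in only 50.
example = [[40, 10],
           [20, 30]]
summary = accuracy_summary(example)
```

Under these assumptions, a class whose producer's accuracy is well above its consumer's accuracy (class 0 in the invented matrix) is one that the map assigns more often than it is observed, which is the pattern described for the highest prevalence category.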
As with all modelling exercises using opportunistic data, one major limitation is the amount, quality and distribution of the training data. We have been rigorous in ensuring quality but had no control over where and how often training data were sampled. What is clear from is the paucity of empirical infection prevalence data across wide swathes of East Africa. On the one hand, this paucity is the motivation for the spatial modelling of risk; on the other, it is a rate-limiting step in the ultimate predictive accuracy of model outputs across large areas without randomly selected spatial empirical data.
In using high spatial resolution environmental data we have also had to assume that the resolution of the PR data is comparable. None of the studies described the exact area from which samples were drawn. Where administrative units were used as a sampling frame, the distribution of populations within these samples was not considered. The result is that possibly very low spatial resolution malaria data have been used to describe transmission at high spatial resolution, resulting in errors that are most evident in areas where environmental conditions are most variable. We conclude that perhaps it is not possible to address this question at this spatial resolution (particularly due to the poor resolution of the PR data) and it may be better to develop maps on a smaller scale with sampling according to well-defined ecological zone strata. In this way local factors such as population distribution and effects of urbanization can be investigated concurrently. This could be addressed within the framework of randomly selected cluster sample surveys used for monitoring demographic and health trends in Africa (http://www.measuredhs.com).
There are many issues still to be addressed in attempting to predict the spatial distribution of PR with satellite data, some of which will constitute future work of the authors. Important among these is understanding the spatial correlation structure in the PR data and how incorporating this may be used to increase the fidelity of the mapping process. Furthermore, the significance of combining spatial scales of PR data (i.e. those from points vs. those from administrative area polygons of varied size) needs to be explored. The importance of using discrete categories of the PR, rather than a continuous variable, needs investigation, as does incorporating methods to weight for the influences of sample size and age structure in the PR data.
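One common way to begin examining spatial correlation structure in point data is an empirical semivariogram, which summarises how dissimilar values become as the distance between survey locations increases. The sketch below is offered only as an illustration of that idea and is not a method used in this study. It assumes PR values at point locations with planar (projected) coordinates in consistent units, uses the classical Matheron estimator, and the function name `empirical_semivariogram`, the bin width and the coordinates are all hypothetical.

```python
import math
from collections import defaultdict
from typing import List, Sequence, Tuple


def empirical_semivariogram(
    points: Sequence[Tuple[float, float, float]],
    bin_width: float,
    max_distance: float,
) -> List[Tuple[float, float, int]]:
    """Classical (Matheron) empirical semivariogram.

    points: (x, y, value) tuples in planar coordinates (assumed projected).
    Returns (bin_centre, semivariance, n_pairs) for each non-empty distance bin.
    """
    if bin_width <= 0 or max_distance <= 0:
        raise ValueError("bin_width and max_distance must be positive")

    squared_diff_sums = defaultdict(float)
    pair_counts = defaultdict(int)

    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            distance = math.hypot(xi - xj, yi - yj)
            if distance >= max_distance:
                continue  # ignore pairs beyond the range of interest
            bin_index = int(distance // bin_width)
            squared_diff_sums[bin_index] += (zi - zj) ** 2
            pair_counts[bin_index] += 1

    # gamma(h) = sum of squared differences / (2 * number of pairs) per bin
    return [
        ((b + 0.5) * bin_width, squared_diff_sums[b] / (2 * pair_counts[b]), pair_counts[b])
        for b in sorted(pair_counts)
    ]


# Invented example: three survey points along a line, PR expressed as a proportion.
survey_points = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.3), (2.0, 0.0, 0.5)]
variogram = empirical_semivariogram(survey_points, bin_width=1.0, max_distance=3.0)
```

A semivariance that rises with distance and then levels off would suggest spatially structured PR values. The sketch ignores sample size, age structure and the mixing of point and polygon data, each of which would need separate treatment.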
Mapping malaria risks continues to be a priority. Without at least a basic knowledge of risk, efforts to control malaria will lack the ability to target limited resources to maximize coverage appropriately among those most at risk at national (Snow et al. 1996) and international scales (Snow et al. 2005). It is our view that even with the best available information on parasite prevalence and better remotely sensed and digital predictor data, high-resolution models remain imprecise. This could be addressed with more empirical, randomly sampled, standardized parasite prevalence data in sub-regions and further investigation of some of the areas outlined above. Meanwhile, lower resolution maps of malaria risk might be the only immediate means by which to define populations exposed to broader definitions of risk, allowing international priority setting rather than the expectation of high spatial resolution sub-national planning tools.