Malaria risk maps have re-emerged as an important tool for appropriately targeting the limited resources available for malaria control. In Sub-Saharan Africa empirically derived maps using standardized criteria are few and this paper considers the development of a model of malaria risk for East Africa.
Statistical techniques were applied to high spatial resolution remotely sensed, human settlement and land-use data to predict the intensity of malaria transmission as defined according to the childhood parasite ratio (PR) in East Africa. Discriminant analysis was used to train environmental and human settlement predictor variables to distinguish between four classes of PR risk shown to relate to disease outcomes in the region.
Independent empirical estimates of the PR were identified from Kenya, Tanzania and Uganda (n = 330). Surrogate markers of climate recorded on-board earth orbiting satellites, population settlement, elevation and water bodies all contributed significantly to the predictive models of malaria transmission intensity in the sub-region. The accuracy of the model was increased by stratifying East Africa into two ecological zones. In addition, the inclusion of urbanization as a predictor of malaria prevalence, whilst reducing formal accuracy statistics, nevertheless improved the consistency of the predictive map with expert opinion malaria maps. The overall accuracy achieved with ecological zone and urban stratification was 62% with surrogates of precipitation and temperature being among the most discriminating predictors of the PR.
It is possible to achieve a high degree of predictive accuracy for Plasmodium falciparum parasite prevalence in East Africa using high-spatial resolution environmental data. However, discrepancies were evident in the mapped outputs from the models, largely due to poor coverage of malaria training data and the comparable spatial resolution of predictor data. These deficiencies will only be addressed by more random, intensive small-area studies of empirical estimates of PR.
The risks of morbidity and mortality caused by Plasmodium falciparum vary spatially and temporally across the African continent (Trape & Rogier 1996; Snow et al. 1997; Snow & Marsh 2002). There are a number of factors that determine the age-specific patterns of disease risk following infection but the most significant is the role played by acquired functional immunity which is dependent in turn on the frequency and duration of infection from birth (Snow & Marsh 1998). A number of indices are used to describe the frequency of new malaria infections; these include the entomological inoculation rate (EIR), the vectorial capacity, infant parasite conversion rates and the parasite ratio (PR). The relationship between these markers of infection risk is complex (Macdonald 1953; Smith & McKenzie 2004) and most of them, with the exception of the PR, are rarely recorded in Africa (Hay et al. 2000b, 2005a,b). The PR has been used since the 1950s to characterize the intensity of malaria transmission, determined through cross-sectional surveys of children and expressed as the percentage found to be infected with P. falciparum (Metselaar & Van Theil 1959). This measure has been shown to correspond categorically to the log of the annual EIR (Beier et al. 1999; Hay et al. 2005a) and, more recently, has been used to discriminate between malaria morbidity and mortality rates across Africa (Snow et al. 1999, 2004; Snow & Marsh 2002).
The need for maps of malaria distribution has been recognized by malaria epidemiologists for as long as the disease has been studied (Gill 1920; Boyd 1930; Lysenko & Beljaev 1969). More recently, advances in geographic information systems (GIS) and remote sensing (RS) have fuelled a renaissance in malaria risk mapping in Africa (Snow et al. 1996; Lindsay et al. 1998; Craig et al. 1999; Thomson et al. 1999; Hay et al. 2000a; Rogers et al. 2002; Tanser et al. 2003; Hay et al. 2004). Relatively few of the available models of the spatial determinants of malaria risk have used empirical data (Snow et al. 1998; Kleinschmidt et al. 2000, 2001). Moreover those available for Kenya and Mali have not fully exploited the wealth of newer high-spatial resolution RS imagery available and have excluded the influences of human settlement and water bodies. In this paper we re-examine the distribution and intensity of malaria transmission across East Africa through the use of high-spatial resolution satellite sensor imagery, rigorously selected PR data (Omumbo & Snow 2004), ecozonation (Omumbo et al. 2002), as well as urbanization (Omumbo et al. 2005), water body and land-use parameters.
A comprehensive search of published and unpublished literature for malariometric data in the East Africa region has been maintained since 1996 as part of the Mapping Malaria Risk in Africa (MARA) collaboration (Snow et al. 1996; http://www.mara.org.za). The methods used to identify the PR data in the East Africa region and the criteria used for their selection are presented elsewhere (Omumbo et al. 1998; Omumbo & Snow 2004). In brief, electronic database searches, contacts with local malariologists, extraction of data from Ministry of Health archives, manual searches of local journals, conference proceedings and postgraduate theses all provided the basis of the search strategy. Kenyan, Tanzanian and Ugandan survey data were selected for the present analysis according to the following inclusion criteria: contemporary assessments of infection risk (>1979); community-based surveys; age ranges within 0–15 years (surveys where only infants were sampled were excluded); surveys that sampled a minimum of 50 children; and those that provided adequate details on survey location and denominators. Finally, repeat cross-sectional surveys on the same populations by the same investigators within a 24-month period were pooled to a single estimate while surveys undertaken by varied investigators at different times in the same location were reduced to one estimate by excluding either the earlier survey or smaller sample size.
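The inclusion criteria above amount to a simple filter over survey records. The following is a minimal sketch only: the `Survey` fields and the one-year age limit used to recognise infant-only surveys are illustrative assumptions, not the structure of the database actually used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Survey:
    # Hypothetical record layout, for illustration only.
    year: int
    community_based: bool
    min_age_years: float
    max_age_years: float
    n_examined: Optional[int]  # denominator; None when not reported
    location_known: bool


def meets_inclusion_criteria(s: Survey, infant_age_limit_years: float = 1.0) -> bool:
    """Apply the stated inclusion criteria to one survey record."""
    return (
        s.year > 1979                                   # contemporary surveys only
        and s.community_based
        and 0 <= s.min_age_years <= s.max_age_years <= 15
        and s.max_age_years > infant_age_limit_years    # infant-only surveys excluded (assumed limit)
        and s.n_examined is not None                    # denominator must be reported
        and s.n_examined >= 50
        and s.location_known
    )
```

The pooling of repeat surveys and the removal of duplicate locations would be applied as separate steps after this filter.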
Longitude and latitude co-ordinates (in decimal degrees) were determined for each parasitological survey using a variety of sources: 1:50 000 scale topographic maps (Directorate of Overseas Surveys 1971), digital administrative unit maps in Kenya (UNEP-Global Resources Information Database 1992) or public domain digital gazetteers (GDE Systems Inc. 1995; World Resources Institute 1995). The centroid co-ordinates served as a unique identifier for each survey and, combined with a description of the survey, were used to describe the spatial extent of each sample. Surveys were classified as representing one of five spatial dimensions. First, for surveys representing a single village, the central longitude and latitude were used to define, in ArcView 3.2 (ESRI, Redlands, CA, USA), an area of 1 km radius encompassing the community. The second spatial classification reflected surveys that sampled from several villages but presented the data as a single PR (five and seven surveys in Kenya and Tanzania respectively). In this case, a polygon was created to enclose the villages. The third, fourth and fifth spatial criteria corresponded to surveys undertaken at the fifth (sub-location: average area covered 9.4 km²), fourth (location: average area covered 15.4 km²) or third (division: average area covered 34.1 km²) administrative levels in Kenya (Hay et al. 2005b).
Satellite sensor-derived data at 1 × 1 km spatial resolution were obtained for East Africa (19.995° E–52.005° E and 23.755° N–13.005° S) from the United States’ Geological Survey, Distributed Active Archive Centre (URL: http://edcdaac.usgs.gov/1KM/comp10d.asp). These data were archived from the Advanced Very High Resolution Radiometer (AVHRR) on-board the National Oceanic and Atmospheric Administration’s (NOAA) series of afternoon ascending, polar-orbiting meteorological satellites (Eidenshink & Faundeen 1994). All 10 bands of raw channel and quality control data were downloaded for the 93 ten-day composites (dekads) from 31 months (April–December 1992, January–September 1993, February–December 1995 and January–May 1996) and re-sampled to a latitude and longitude co-ordinate reference system. Data quality control flags, solar and zenith scan angle correction and maximum value compositing procedures were implemented with ERDAS Imagine 8.5 (Leica Geosystems GIS & Mapping, Atlanta, GA, USA). For more comprehensive details on these procedures see Hay (2000) and Green and Hay (2002).
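Maximum value compositing keeps, for each pixel, the highest value observed across the images in a compositing period, which tends to suppress cloud-contaminated observations. The following is a minimal sketch of the idea only, not the ERDAS Imagine procedure used here: it assumes a single index per pixel (such as NDVI), grids stored as lists of rows, and pixels rejected by quality control marked as `None`. The function name is hypothetical.

```python
def max_value_composite(images):
    """Per-pixel maximum over a list of equally sized 2-D grids.

    Pixels flagged as unusable are given as None and ignored; a pixel that
    is flagged in every image stays None in the composite.
    """
    n_rows, n_cols = len(images[0]), len(images[0][0])
    composite = []
    for r in range(n_rows):
        out_row = []
        for c in range(n_cols):
            valid = [img[r][c] for img in images if img[r][c] is not None]
            out_row.append(max(valid) if valid else None)
        composite.append(out_row)
    return composite
```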
Monthly time-series of three primary predictor variables were derived from these images for analysis: (i) the normalized difference vegetation index (NDVI), an indicator of photosynthetic activity and a surrogate for moisture availability (Hay et al. 1998, 2000a); (ii) the land surface temperature (LST), whose accuracy is similar to that of spatially interpolated temperature data obtained from ground meteorological stations in Africa (Hay & Lennon 1999; Hay et al. 2000a); and (iii) the middle infrared reflectance (MIR), a satellite ‘temperature’ band that is sensitive to both reflected and emitted radiation (Boyd & Curran 1998) and was included because it suffers less from atmospheric attenuation than LST (Cracknell 1997).
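For reference, the NDVI is conventionally calculated as the difference between near-infrared and red reflectance divided by their sum (AVHRR channels 2 and 1 respectively). A minimal sketch with a hypothetical function name, operating on a single pixel:

```python
def ndvi(red, nir):
    """Normalized difference vegetation index for one pixel.

    red and nir are reflectances; returns None when both are zero,
    because the ratio is then undefined.
    """
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total
```

Values lie between -1 and 1, with dense green vegetation giving values towards the upper end of the range.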
Cold cloud duration (CCD) data were also derived from the high resolution radiometer (HRR) onboard the European Meteorological Satellite programme’s (EUMETSAT) Meteosat satellite series and used as a surrogate measure of rainfall. CCD image pixels represent the number of hours during which cold cloud top temperatures below a geographically variable threshold were experienced during a 10-day compositing period. CCD threshold temperatures have been derived empirically for areas of Africa between 0 and 27° N of latitude (Snijders 1991). The CCD has been found to have a root mean square error of ±38 mm when compared with meteorological station recordings across continental Africa (Hay & Lennon 1999). The 8 × 8 km spatial resolution CCD data used for this study were re-sampled to 1 × 1 km spatial resolution.
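The re-sampling method is not specified above. As an illustration only, the simplest option is pixel replication (nearest neighbour), in which every 8 × 8 km value is copied into the 64 cells of 1 × 1 km that it covers; this is an assumption, and the function name is hypothetical.

```python
def replicate_pixels(grid, factor=8):
    """Resample a coarse grid to a finer one by copying each value
    into a factor x factor block of output cells."""
    fine = []
    for row in grid:
        fine_row = [value for value in row for _ in range(factor)]
        # Each coarse row becomes `factor` identical fine rows.
        fine.extend(list(fine_row) for _ in range(factor))
    return fine
```

Replication adds no information: the re-sampled CCD layer matches the grid of the AVHRR data but retains the spatial detail of the original 8 × 8 km product.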
Landcover data at full spatial resolution (1:100 000) were requested and downloaded from the Africover project’s website (http://www.africover.org). Africover urban and water body themes were produced from visual interpretation of digitally enhanced Landsat Thematic Mapper (TM) images (bands 4, 3, 2) acquired, mainly in 1999, across Kenya. Previous work has shown these variables to be important local modifiers of the PR (Omumbo et al. 2005). The urban area and water body polygons were overlaid on a 1 × 1 km grid of the same dimensions as the satellite imagery, and the percentage area of each pixel occupied by each land-cover class was calculated. Altitude data were obtained from a global digital elevation model (Hastings & Dunbar 1998) and resampled to 1 × 1 km spatial resolution using ERDAS Imagine 8.5 (Leica Geosystems GIS & Mapping).
Previous statistical modelling efforts for vector-borne diseases in eastern Africa have shown that predictions are improved markedly by clustering data according to areas of ecological similarity (Brooker et al. 2002), as the factors that influence the distribution and/or abundance of vectors vary between ecological zones (Rogers & Robinson 2004).
In the present analysis we have defined two broad ecological zones across East Africa (Omumbo et al. 2002; Omumbo & Snow 2004). Ecozone 1 (Figure 1) represents areas at the edge of the distribution of malaria vectors where climatic conditions do not favour the propagation of vectors for most parts of the year. Within these areas it is possible to define vector distributions or abundance using a single climatic variable. Thus ecozone 1 is composed of arid areas where transmission is limited by low rainfall and vector breeding is restricted to rainy seasons or to areas near water bodies (Figure 1, area A), and of highland areas where low temperatures limit vector and parasite development (Figure 1, area B). The ecology of ecozone 2 is diverse and, in general, climatic conditions favour the proliferation of malaria vectors and parasites, allowing for longer transmission seasons. This ecozone can be described as being well within the climatic range of malaria vector distribution, and a range of environmental variables is required to describe malaria transmission. Ecozone 2 also tends to be more densely populated (particularly in Uganda) and better served by perennial rivers.
As monthly climate data are serially correlated and thus have information redundancy, they were pre-processed using temporal Fourier analysis (TFA). TFA summarizes the correlated data and, in doing so, captures epidemiologically important seasonal variations in statistically uncorrelated outputs. The origins, mathematical basis and arguments for the biological appropriateness of TFA are described in detail elsewhere (Rogers & Robinson 2004). Fourteen outputs were recorded from the TFA for each satellite time-series variable. These included the overall mean (a0) and variance (vr) of the observed signal; the amplitudes of the annual (a1), bi-annual (a2) and tri-annual (a3) cycles; the phases (in months) of the annual (p1), bi-annual (p2) and tri-annual (p3) cycles; the proportion of the total variance in the original time-series described by the annual (d1), bi-annual (d2), tri-annual (d3) and all cycles (da); and the maximum (mx) and minimum (mn) of the signal recomposed from the first three cycles. All these variables, together with elevation derived from the DEM and the percentage of urban area and water bodies derived from Africover, were available to the discriminant analysis (DA).
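The 14 outputs can be illustrated with a short harmonic decomposition. This is a minimal sketch under stated assumptions and not the implementation used here: the series is assumed to cover a whole number of years of monthly values, phase is reported as the timing of the first peak of each cycle in months from the start of the series, and the maximum and minimum are taken at the monthly sample points of the recomposed signal. The function name is hypothetical.

```python
import cmath
import math


def temporal_fourier_summary(series, samples_per_year=12, n_harmonics=3):
    """Return a0, vr, a1-a3, p1-p3, d1-d3, da, mx and mn for one time series."""
    n = len(series)
    if n == 0 or n % samples_per_year != 0:
        raise ValueError("series must cover a whole number of years")
    years = n // samples_per_year
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series) / n
    out = {"a0": mean, "vr": variance}
    harmonics = []
    explained = 0.0
    for h in range(1, n_harmonics + 1):
        k = h * years  # discrete Fourier bin with h cycles per year
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(series))
        amplitude = 2.0 * abs(coeff) / n
        phase = (-cmath.phase(coeff)) % (2 * math.pi)
        # A sinusoid of amplitude A has variance A^2 / 2.
        share = (amplitude ** 2 / 2.0) / variance if variance > 0 else 0.0
        out[f"a{h}"] = amplitude
        out[f"p{h}"] = phase / (2 * math.pi) * (samples_per_year / h)
        out[f"d{h}"] = share
        explained += share
        harmonics.append((h, amplitude, phase))
    out["da"] = explained
    # Signal recomposed from the mean and the first harmonics, one year long.
    recomposed = [
        mean + sum(a * math.cos(2 * math.pi * h * t / samples_per_year - ph)
                   for h, a, ph in harmonics)
        for t in range(samples_per_year)
    ]
    out["mx"] = max(recomposed)
    out["mn"] = min(recomposed)
    return out
```

Because the harmonics are mutually orthogonal over whole years, the variance shares d1 to d3 add up to da without double counting.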
Discriminant analysis was used to identify the environmental and population variables that discriminated best between four categories of prevalence, namely 0–<5%, 5–<25%, 25–<75% and ≥75%. These classes correspond closely to categorical descriptions of malaria morbidity and mortality burden that have been described previously for Africa (Snow et al. 2003). Very few of the PR surveys identified in this study recorded a true zero prevalence (n = 10). All 10 studies were from areas described according to expert opinion as experiencing seasonal transmission and, given the sampling error in defining true zero prevalence, this group has been included in the lowest infection risk class (PR = 0–<5%). Predictor variables were selected iteratively on the basis of the generation of the maximum kappa (κ) statistic (Cohen 1960) compared with the other variables in each round of selection. The procedure was repeated until either the maximum κ value was generated or the 10 most discriminating variables had been selected. It was assumed that the probabilities of membership in any of the four categories of prevalence were equal (i.e. each prior probability was 1 divided by the number of prevalence categories). Predictions were completed for all data and the effect of forcing urbanization was tested separately within each ecological zone.
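The class boundaries and the equal prior probabilities can be written down directly. A minimal sketch, in which the names are hypothetical and classes are numbered 0 to 3 from lowest to highest prevalence:

```python
from bisect import bisect_right

# Lower bounds (in %) of the four prevalence classes:
# 0-<5, 5-<25, 25-<75 and >=75.
PR_CLASS_LOWER_BOUNDS = (0.0, 5.0, 25.0, 75.0)

# Equal prior probability of membership for each class.
EQUAL_PRIOR = 1.0 / len(PR_CLASS_LOWER_BOUNDS)


def pr_class(pr_percent):
    """Return the class index (0-3) for a parasite ratio given in percent."""
    if not 0.0 <= pr_percent <= 100.0:
        raise ValueError("parasite ratio must lie between 0 and 100")
    return bisect_right(PR_CLASS_LOWER_BOUNDS, pr_percent) - 1
```

A survey with a recorded PR of zero falls in class 0, in line with the treatment of true zero prevalence described above.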
For each training data point, a predictor variable value was extracted using IDRISI Version 2 (IDRISI project; Clark Labs, Worcester, MA, USA) and output in a database for use in the DA. A customized mapping program (written in Microsoft QuickBASIC Version 4.0 by DJR) was used to derive a discriminant function based on comparing the training data with the predictor variables. Each image pixel was assigned to one of the four PR prevalence categories on the basis of the discriminant function and a predictive image derived. Predictions were not made for pixels where environmental conditions were more than three times the (training set) maximum multi-variate distance (measured as the Mahalanobis distance) from the centroid of the group to which they were assigned.
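The assignment and exclusion rules can be sketched as follows. This is an illustration under explicit assumptions and not the QuickBASIC program used here: a single covariance matrix pooled across classes is assumed, so that with equal priors each pixel is assigned to the class with the nearest centroid in Mahalanobis distance; whether the original program pooled covariances is not stated. All function names are hypothetical.

```python
import math


def centroid(rows):
    """Mean vector of a list of equal-length observations."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def pooled_covariance(groups):
    """Within-class covariance pooled over all classes (assumed common)."""
    p = len(next(iter(groups.values()))[0])
    cov = [[0.0] * p for _ in range(p)]
    dof = 0
    for rows in groups.values():
        c = centroid(rows)
        dof += len(rows) - 1
        for row in rows:
            d = [v - m for v, m in zip(row, c)]
            for i in range(p):
                for j in range(p):
                    cov[i][j] += d[i] * d[j]
    return [[v / dof for v in r] for r in cov]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


def mahalanobis(x, c, cov):
    """Mahalanobis distance between observation x and centroid c."""
    d = [v - m for v, m in zip(x, c)]
    return math.sqrt(max(0.0, sum(di * yi for di, yi in zip(d, solve(cov, d)))))


def fit(groups):
    """groups maps class label -> list of training observations."""
    cov = pooled_covariance(groups)
    cents = {k: centroid(rows) for k, rows in groups.items()}
    # Exclusion limit: three times the largest training-set distance per class.
    limits = {k: 3.0 * max(mahalanobis(r, cents[k], cov) for r in rows)
              for k, rows in groups.items()}
    return cents, cov, limits


def predict(x, model):
    """Assign x to the nearest class, or None if it lies beyond the limit."""
    cents, cov, limits = model
    dists = {k: mahalanobis(x, c, cov) for k, c in cents.items()}
    label = min(dists, key=dists.get)
    return label if dists[label] <= limits[label] else None
```

Returning no prediction for distant pixels avoids extrapolating the model to environmental conditions that are not represented in the training data.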
The accuracy of a predictive map is influenced by several factors, notably the sample sizes of data available in each class (in some cases these may not be large enough to provide a high confidence level in the prediction). Accuracy and agreement were assessed by comparison of observed values with predicted values by means of a classification or confusion matrix (Jensen 1996). Agreement was estimated using the κ and tau (τ) statistics, which are related indices of agreement that compensate for the agreement that would be expected because of chance. Values of κ < 40% indicate poor agreement, values between 40% and 75% suggest good agreement and values above 75% excellent agreement (Landis & Koch 1977). Three additional measures of accuracy were determined: the overall accuracy (OA), calculated as the number of pixels on the main diagonal of the matrix divided by the total number of training data pixels, the producer’s accuracy (PA) and the consumer’s accuracy (CA). PA and CA measure accuracy for individual prediction classes. PA (which reflects omission error) refers to the probability of a training data pixel being correctly classified and is the proportion of training data (observed) pixels in a category of prevalence that are classified correctly. CA (which reflects commission error) is a measure of the probability that predicted pixels represent the true classification on the ground and is the proportion of predicted pixels that are classified correctly (Story & Congalton 1986).
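These measures can all be computed from the confusion matrix. A minimal sketch with a hypothetical function name, assuming that rows hold observed classes and columns hold predicted classes, and that τ is calculated with equal prior probabilities (consistent with the equal priors assumed in the DA):

```python
def accuracy_report(matrix):
    """Accuracy measures from a square confusion matrix.

    matrix[i][j] is the number of training pixels observed in class i
    and predicted as class j.
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]        # observed per class
    col_totals = [sum(col) for col in zip(*matrix)]  # predicted per class
    oa = sum(matrix[i][i] for i in range(n_classes)) / total
    # Kappa: chance agreement estimated from the matrix marginals.
    chance = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (oa - chance) / (1 - chance)
    # Tau: chance agreement fixed by equal prior probabilities (assumption).
    prior = 1.0 / n_classes
    tau = (oa - prior) / (1 - prior)
    pa = [matrix[i][i] / row_totals[i] if row_totals[i] else None
          for i in range(n_classes)]
    ca = [matrix[i][i] / col_totals[i] if col_totals[i] else None
          for i in range(n_classes)]
    return {"OA": oa, "kappa": kappa, "tau": tau, "PA": pa, "CA": ca}
```

For a two-class matrix `[[8, 2], [1, 9]]`, 17 of the 20 pixels lie on the diagonal, giving OA = 0.85; the marginals give a chance agreement of 0.5, so κ = 0.7.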
The data search identified 330 parasite survey data points that fulfilled the inclusion criteria. The spatial distribution of these data is shown in Figure 1. Of these surveys, 217 studies were from Kenya, 86 from Tanzania and 27 from Uganda. The mean sample size was 375 children with a median (interquartile range) of 204 (120; 427). Thirty-five surveys described a PR 0–<5% (n = 10 for PR = 0), 80 surveys recorded a prevalence between 5% and <25%, 177 surveys recorded a prevalence between 25% and <75% and 38 surveys reported prevalence in the childhood populations surveyed ≥75%.
Discriminant analysis was performed initially without controlling for ecological zone or urbanization and the accuracy of the prediction tested. OA was 72.4% (κ = 0.502, τ = 0.494). On visual comparison with historical (Government of Tanganyika 1956) and contemporary (Craig et al. 1999; Omumbo et al. 2002) modelled malaria risk maps, the resulting predictive map was found to be significantly anomalous in southern Tanzania. The results were improved by stratifying the analysis according to two ecozone classes and by forcing the inclusion of urbanization as a predictor (Figure 2, area A). These modifications marginally reduced OA in both ecozone 1 (OA = 64.0%; κ = 0.483; τ = 0.478; Table 1a) and ecozone 2 (OA = 61.4%; κ = 0.45; τ = 0.308; Table 1b) but provided an output with fewer large-area anomalies when compared with historical (Figure 3a) and more recent climate-driven maps (Figure 3b,c). The OA for the combined ecozone/urban adjusted map shown in Figure 2 was 62.1% (κ = 0.477, τ = 0.495).
Producer’s accuracy and CA values by ecological zone class are also provided in Table 1a,b. In both ecozones 1 and 2, training data predictions were poorest for the PR category 25–<75% (PA = 53.6% and 45.6% respectively). CA was poorest for the PR category ≥75% also for both ecozones [CA = 42.9% (ecozone 1) and 32.7% (ecozone 2)].
The top 10 most discriminating predictor variables selected during the DA and subsequently used in the final prediction were: NDVI a1, LST a2, LST p2, CCD min, Africover water body %, DEM, MIR p1, CCD mean, MIR p2, Africover urban %. Vegetation, rainfall and ‘water body area’ were chosen as significant predictors in the ‘dry’ ecozone 1. In ecozone 2 where we assume water was not generally limiting, temperature variables were most abundant among the predictor variables selected: LST mean, LST p1, MIR min, LST p2, MIR p1, CCD max, NDVI p1, DEM, LST a2, Africover urban %.
We have developed a predictive model of four categories of malaria prevalence in East Africa driven by empirical data and exploiting the potential of currently available high-resolution satellite-derived climate surrogate and other digital environmental data. The significance of non-climatic determinants, such as urbanization, in determining the intensity of malaria transmission was demonstrated and is consistent with evidence that urbanization reduces both P. falciparum EIRs and infection prevalence in Africa (Hay et al. 2000a, 2005a; Robert et al. 2003; Omumbo & Snow 2004). The application of ecological zone stratification increased OA by 6.1% and increased κ values from 0.394 to 0.477. Similar improvements have been seen with the use of ecozone stratification for other models of malaria in West Africa (Kleinschmidt et al. 2001), schistosomiasis (Brooker et al. 2001) and trypanosomiasis (Rogers & Robinson 2004).
We were interested in how our modelled output compared with historical maps of malaria in the sub-region (Figure 3b), which were largely based on expert opinion of climatic patterns as they affect the duration of transmission seasons. The map (Figure 3a) compared favourably with the historical map (Figure 3b) for areas of low malaria risk (Figure 3a; area A); however, there was a marked anomaly in Tanzania south of Lake Victoria, where the model appears to have under-predicted high transmission conditions (Figure 3a; area B). The reasons for this are not clear but may be due to the lack of training data from the region (Figure 1; area C). Furthermore, climate-based models (Craig et al. 1999) suggest conditions in this area (Figure 3c; area A) are highly suitable for vector and parasite development.
Despite efforts to maximize the accuracy of the predictive map developed in this study, the measures of statistical accuracy derived suggest that there are still many transmission-modifying factors that are unaccounted for. A κ value of 0.477, though reasonable, is far from excellent. The highest prevalence categories were over-predicted (PA > 75% compared with CA < 43%) and the limitation in defining this prevalence class was principally the small number of training data pixels (n = 38). It is clear that clustering the training data according to appropriate ecological strata markedly improves predictions. The use of only two ecozones was driven by the paucity of training data; further stratification over such a wide, spatially diverse area of East Africa would have been more appropriate.
As with all modelling exercises using opportunistic data, one major limitation is the amount, quality and distribution of the training data. We have been rigorous in ensuring quality but had no control over where and how often training data were sampled. What is clear from Figure 1 is the paucity of empirical infection prevalence data across wide swathes of East Africa. On the one hand this represents the motivation for the spatial modelling of risk; on the other, it is a rate-limiting step in the ultimate predictive accuracy of model outputs across large areas without randomly selected spatial empirical data.
In using high-spatial resolution environmental data we have also had to assume that the resolution of the PR data is comparable. None of the studies described the exact area from which samples were drawn. Where administrative units were used as a sampling frame, the distribution of populations within these samples was not considered. The result is that possibly very low-spatial resolution malaria data have been used to describe transmission at high-spatial resolution resulting in errors which are most evident in areas where environmental conditions are most variable. We conclude that perhaps it is not possible to address this question at this spatial resolution (particularly due to the poor resolution of the PR data) and it may be better to develop maps on a smaller scale with sampling according to well-defined ecological zone strata. In this way local factors such as population distribution and effects of urbanization can be investigated concurrently. This could be addressed within the framework of randomly selected cluster sample surveys used for monitoring demographic and health trends in Africa (http://www.measuredhs.com; http://childinfor.org/).
There are many issues still to be addressed in attempting to predict the spatial distribution of PR with satellite data, some of which will constitute future work of the authors. Important among these is understanding the spatial correlation structure in the PR data and how incorporating this may be used to increase the fidelity of the mapping process. Furthermore, the significance of combining spatial scales of PR data (i.e. those from points vs. those from administrative area polygons of varied size) needs to be explored. The importance of using discrete categories of the PR, rather than a continuous variable, needs investigation, as does incorporating methods to weight for the influences of sample size and age structure in the PR data.
Mapping malaria risks continues to be a priority. Without at least a basic knowledge of risk, efforts to control malaria will lack the ability to target limited resources to maximize coverage appropriately among those most at risk at national (Snow et al. 1996) and international scales (Snow et al. 2005). It is our view that even with the best available information on parasite prevalence and better remotely sensed and digital predictor data, high-resolution models remain imprecise. This could be addressed with more empirical, randomly sampled, standardized parasite prevalence data in sub-regions and further investigation of some of the areas outlined above. Meanwhile, lower resolution maps of malaria risk might be the only immediate means by which to define populations exposed to broader definitions of risk allowing international priority setting rather than the expectation of high spatial resolution sub-national planning tools.
This study received financial support from The Wellcome Trust, UK; International Development Research Centre, Canada; the South African Medical Research Council, the MARA/ARMA collaboration and the Kenya Medical Research Institute. The Wellcome Trust supports JAO, SIH and RWS as part of their Prize Studentship (no. 060063), Research Career Development (no. 069045) and Senior Research Fellow (no. 058992) programmes respectively. The authors acknowledge the contribution of malaria control personnel and scientists in East Africa who shared unpublished data. We are grateful to the Africover team for their efficiency and advice. Carlos Guerra is thanked for help with manipulating extractions and Africover data. We are grateful to two anonymous referees for providing insightful critiques. This paper is published with the permission of the director of KEMRI.