Background: South Africa contains more than one in seven of the world's HIV-positive population. Knowledge of local variation in levels of HIV infection is important for prioritization of areas for intervention. We apply two spatial analytical techniques to investigate the micro-geographical patterns and clustering of HIV infections in a high prevalence, rural population in KwaZulu-Natal, South Africa.
Methods: All 12 221 participants who consented to an HIV test in a population under continuous demographical surveillance were linked to their homesteads and geo-located in a geographical information system (accuracy of <2 m). We then used a two-dimensional Gaussian kernel of radius 3 km to produce robust estimates of HIV prevalence that vary across continuous geographical space. We also applied a Kulldorff spatial scan statistic (Bernoulli model) to formally identify clusters of infections (P < 0.05).
Results: The results reveal considerable geographical variation in local HIV prevalence (range = 6–36%) within this relatively homogenous population and provide clear empirical evidence for the localized clustering of HIV infections. Three high-risk, overlapping spatial clusters [Relative Risk (RR) = 1.34–1.62] were identified by the Kulldorff statistic along the National Road (P ≤ 0.01), whereas three low risk clusters (RR = 0.2–0.38) were identified elsewhere in the study area (P ≤ 0.017).
Conclusions: The findings show the existence of several localized HIV epidemics of varying intensity that are partly contained within geographically defined communities. Despite the overall high prevalence of HIV in many rural South African settings, the results support the need for interventions that target socio-geographic spaces (communities) at greatest risk to supplement measures aimed at the general population.
South Africa has experienced one of the fastest growing HIV epidemics in the world and is currently home to more than one in seven of the world's HIV-positive population.1 HIV prevalence varies greatly at a provincial level2 and is highest in the province of KwaZulu-Natal where 39% of antenatal attendees are HIV positive.3 Efforts to curtail the devastating impacts of the epidemic have been disappointing and HIV incidence remains high.4 Consequently, new strategies are required that make better use of the limited resources available.
The spatial structure of sexually transmitted infection (STI) epidemics can have a profound impact on the epidemic dynamics, future spread, persistence and the nature and success of interventions.5 Most risk and health-promoting behaviours are not distributed uniformly across populations, but tend to cluster in specific communities.6 Behaviour is patterned by the social structure in which individuals are positioned, placing individuals in some communities ‘at risk of risks’.7 Research has demonstrated that many STIs are not equally distributed across geographic areas or communities.8–11 As such, the local community is increasingly being seen as critical to understanding the spread of HIV and key to prevention efforts.12–16 Given this background and the problems associated with the identification of the most active individuals and groups, geographic approaches to HIV prevention have received increasing attention.17,18 However, it has been unclear whether localized HIV interventions targeting specific communities could be effective in a rural South African context where prevalence levels are amongst the highest in the world. Consequently, no such interventions have been attempted in such a setting.
A recent editorial commented that focusing scale-up of services where they are needed requires ‘knowing your epidemic’, globally and locally.19 However, because of the difficulty of obtaining spatially referenced HIV data at the local level, aggregated HIV outcomes are frequently available only by large geographic units. These units are often too coarse to capture the true scale of the processes under study. This type of analytical approach has several potential disadvantages: it makes assumptions about the size and characteristics of the ‘community’ and the location of participants, can mask important sub-area variation and can cause misinterpretation of true underlying spatial patterns.20,21 In addition, it is important to distinguish ‘real’ from ‘apparent’ excess numbers of cases (particularly in large, sparsely populated areas) and to deal appropriately with elements of chance including the problem of multiple significance testing.22 Here we present results from a population-based HIV sero-survey in rural KwaZulu-Natal, South Africa in which we geo-locate more than 12 000 study participants to an accuracy of <2 m. We apply two innovative spatial techniques to investigate the micro-geographical variation in HIV prevalence and clustering of HIV infections at a truly local level.
The study area (Figure 1) is located in Hlabisa sub-district (one of five sub-districts in the rural district of Umkhanyakude in northern KwaZulu-Natal), ~250 km north of the city of Durban. It covers 438 km² (approximately one-third of the sub-district) and contains a total population of about 87 000 Zulu-speaking people. The majority of the population live in scattered homesteads that are not concentrated into villages or compounds. Running along the eastern boundary of the study area is the National Road, which links the city of Durban with Mozambique, Swaziland and the province of Mpumalanga. The study area is typical of many rural South African settings in that while it is largely rural it also contains a formal urban township and a series of high-density settlements located predominantly along the major transport routes.
The study forms part of the population-based household and HIV surveillance that is linked to the Africa Centre Demographic Information System.23 Every 6 months the surveillance system collects demographic, socio-economic and behavioural data on individuals resident at one of the 11 000 homesteads in the study area. The annual HIV surveillance among residents included all women aged 15–49 years and men 15–54 years and was conducted from 2003 to 2004. The demographic and socio-economic characteristics of the cohort have been described in detail previously.24 Dried blood spot samples were taken during home visits and HIV serology performed using an ELISA methodology. Informed consent was sought and ethical approval received from the University of KwaZulu Natal (E029/2003). The consent rate among those contacted was 58%.24 The age-adjusted HIV prevalence was 27% for women and 14% for men. Prevalence was highest among females aged 25–29 years (51%) and in men aged 30–34 years (44%).24 The average incidence of HIV infection from June 2003 to December 2005 was 3.2 per 100 person-years (95% CI: 2.8–3.8), peaking in the age group of 25–29 years for both men and women.4 All homesteads in the surveillance area have been positioned to an accuracy of <2 m using differential global positioning systems as part of the surveillance programme.25
All 12 221 adult residents (women aged 15–49 years and men 15–54 years) who consented to an HIV test were geo-located to their homestead of residence (Figure 2). To protect participant confidentiality, a small random error has been incorporated into the geographical position of each participant. This also gives a more accurate visual representation of the distribution of all participants in the survey, because multiple participants are sometimes resident in a single homestead.
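The random displacement described above can be illustrated with a minimal sketch. The size and distribution of the error used in the study are not stated, so the 50 m maximum offset, the uniform-in-a-disc distribution and the function name `jitter_positions` are all assumptions made purely for illustration.

```python
import numpy as np


def jitter_positions(xy, max_offset_m=50.0, seed=0):
    """Displace each point by a random offset drawn uniformly from a disc.

    xy           : (n, 2) array-like of projected coordinates in metres.
    max_offset_m : hypothetical maximum displacement (not given in the paper).
    seed         : seed for reproducibility.
    """
    rng = np.random.default_rng(seed)
    xy = np.asarray(xy, dtype=float)
    n = len(xy)
    # A uniform angle plus a sqrt-transformed radius gives a uniform density
    # over the disc rather than a pile-up near the centre.
    angle = rng.uniform(0.0, 2.0 * np.pi, size=n)
    radius = max_offset_m * np.sqrt(rng.uniform(0.0, 1.0, size=n))
    offsets = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])
    return xy + offsets
```

Two participants sharing one homestead start at identical coordinates and end up at slightly different positions, which is what makes them separately visible on a point map.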
To produce robust estimates of HIV prevalence that vary across continuous geographical space we used a Gaussian kernel methodology.26 The Gaussian kernel does not impose any static geographical boundaries on the data and produces prevalence estimates that are sensitive to local variation whilst at the same time being relatively robust to the effects of random noise.
The HIV status of each participant was superimposed onto a 30 × 30 m grid (representing the study area) at the location of each individual's place of residence in Idrisi Andes (Clark Labs, Clark University, MA, USA). We then passed a two-dimensional standard Gaussian kernel of radius 3 km over the grid. The kernel moves systematically across the grid and calculates a Gaussian-weighted prevalence estimate for every cell's neighbourhood. The resulting prevalence estimate is placed onto a new map at the same location as the central cell. The kernel is then moved one cell to the right (and then down one row at the end of the row) and the process repeated. The size of the kernel was determined on the basis of a spatial variogram27 (constructed using HIV prevalence estimates aggregated by administrative ward), which showed there to be clear spatial dependence in the resulting HIV prevalence estimates within a distance of 3 km. A median of 1134 participants of known HIV sero-status was evaluated for each cell's unique neighbourhood. The resulting continuous HIV prevalence estimates were then converted into a contour map showing lines of equal prevalence at 5% intervals.
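The moving-kernel calculation can be sketched as follows. This is an illustration rather than the Idrisi implementation used in the study: it works directly on participant coordinates instead of a 30 m raster, and the Gaussian standard deviation (`sigma_m`) is a hypothetical value because the paper reports only the 3 km kernel radius. The function name `kernel_prevalence` is likewise an assumption.

```python
import numpy as np


def kernel_prevalence(xy, hiv_pos, grid_xy, radius_m=3000.0, sigma_m=1000.0):
    """Gaussian-weighted HIV prevalence at each grid-cell centre.

    xy       : (n, 2) participant coordinates in metres.
    hiv_pos  : (n,) array of 0/1 HIV status.
    grid_xy  : (m, 2) coordinates of the cell centres to evaluate.
    radius_m : kernel radius; participants farther away are ignored (3 km in the paper).
    sigma_m  : Gaussian standard deviation (hypothetical; not reported).
    """
    xy = np.asarray(xy, dtype=float)
    hiv_pos = np.asarray(hiv_pos, dtype=float)
    grid_xy = np.asarray(grid_xy, dtype=float)
    estimates = np.full(len(grid_xy), np.nan)  # NaN where no participant is in range
    for i, cell in enumerate(grid_xy):
        d2 = np.sum((xy - cell) ** 2, axis=1)   # squared distance to each participant
        inside = d2 <= radius_m ** 2
        if not inside.any():
            continue
        w = np.exp(-d2[inside] / (2.0 * sigma_m ** 2))  # nearer participants weigh more
        estimates[i] = np.sum(w * hiv_pos[inside]) / np.sum(w)
    return estimates
```

Because each estimate is a weighted proportion over the participants inside the kernel, it is bounded between 0 and 1 and adapts to the local population density without imposing administrative boundaries.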
To formally identify clusters of infection (either high or low numbers of infections) we applied a Kulldorff spatial scan statistic (implemented within the SaTScan spatial cluster detection programme28). A spatial scan statistic is a cluster detection test that is able to both detect the location of clusters and evaluate their statistical significance without the problems associated with multiple testing. This is done by gradually scanning a window across space. The general statistical theory behind the spatial scan statistics is described in detail elsewhere.29 The scan statistic adjusts for the uneven geographical density of a background population and the analyses are conditioned on the total number of cases observed.
The spatial scan statistic imposes a circular window on a map and it allows the centre of the circle to move across the study region. For any given position of the centre, the radius of the circle changes continuously so that it can take any value from zero up to a specified maximum value. For each potential cluster, a likelihood ratio test statistic was used to determine if the number of HIV cases within the potential cluster was higher than expected. Expected numbers of cases were calculated on the basis of the null hypothesis of complete spatial randomness by assuming that the number of HIV-infected individuals in each circle is an independent Bernoulli random variable with a constant prevalence. The circle with the maximum likelihood is defined as the most likely cluster, implying that it is least likely to have occurred by chance. The maximum observed value of the test statistic for each possible cluster is then compared with the overall distribution of maximum values. The P-value of the statistic is obtained through Monte Carlo hypothesis testing (9999 iterations), where the null hypothesis of no clustering is rejected if the simulated P-value is <0.05. We allowed the clusters to overlap by <50% and set the maximum search radius of the circle to be 3 km to facilitate comparison with the output generated from the Gaussian kernel smoothing approach. The resulting relative risk estimates were adjusted by sex and 5-year age band.
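The procedure above can be illustrated with a simplified sketch of a Bernoulli-model circular scan with Monte Carlo inference. It is not the SaTScan implementation: the function names are hypothetical, windows are centred on participant locations only, participants at tied distances are added one at a time, the 50% cap on window population is an assumed setting, and no adjustment for age or sex is made. Permuting case labels keeps the total number of cases fixed, in line with the conditioning described in the text.

```python
import numpy as np


def _xlogx(a, b):
    """a * ln(a / b) elementwise, with the convention 0 * ln(0) = 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    safe = a > 0
    ratio = np.where(safe, a, 1.0) / np.where(safe, b, 1.0)
    return np.where(safe, a * np.log(ratio), 0.0)


def bernoulli_llr(c, n, C, N):
    """Log-likelihood ratio for windows holding c cases among n people,
    given C cases among N people overall (flags both high and low rates)."""
    inside = _xlogx(c, n) + _xlogx(n - c, n)
    outside = _xlogx(C - c, N - n) + _xlogx((N - n) - (C - c), N - n)
    null = _xlogx(C, N) + _xlogx(N - C, N)
    return inside + outside - null


def build_windows(xy, max_radius_m=3000.0, max_pop_frac=0.5):
    """For each centre, indices of participants ordered by distance, truncated
    at the maximum radius and at an assumed cap on window population."""
    xy = np.asarray(xy, dtype=float)
    cap = int(max_pop_frac * len(xy))
    windows = []
    for centre in xy:
        d = np.hypot(*(xy - centre).T)
        order = np.argsort(d, kind="stable")
        k = min(int(np.sum(d <= max_radius_m)), cap)
        windows.append(order[:k])
    return windows


def max_llr(windows, cases):
    """Largest LLR over all windows: returns (llr, centre index, window size)."""
    C, N = int(cases.sum()), len(cases)
    best = (0.0, -1, 0)
    for i, order in enumerate(windows):
        if len(order) == 0:
            continue
        c = np.cumsum(cases[order])          # cases inside each growing circle
        n = np.arange(1, len(order) + 1)     # people inside each growing circle
        llr = bernoulli_llr(c, n, C, N)
        j = int(np.argmax(llr))
        if llr[j] > best[0]:
            best = (float(llr[j]), i, int(n[j]))
    return best


def scan_p_value(xy, cases, n_sim=999, seed=0, **window_kwargs):
    """Most likely cluster and its Monte Carlo P-value (case labels permuted)."""
    rng = np.random.default_rng(seed)
    cases = np.asarray(cases, dtype=int)
    windows = build_windows(xy, **window_kwargs)
    observed = max_llr(windows, cases)
    exceed = sum(max_llr(windows, rng.permutation(cases))[0] >= observed[0]
                 for _ in range(n_sim))
    return observed, (1 + exceed) / (1 + n_sim)
```

Comparing the observed maximum with the distribution of simulated maxima, rather than testing each circle separately, is what lets the method avoid the multiple-testing problem.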
We used data from the Africa Centre's household socio-economic survey (comprising a total of 29 217 individuals >18 years of age and 10 856 households) undertaken during the same period as the HIV survey and described previously23 to briefly characterize any high- and low-risk clusters identified by the spatial scan statistic in terms of levels of education, household wealth (as measured by numbers of assets owned by the household), marriage, numbers of non-resident household members (‘migrants’) and levels of employment.
The results of the Gaussian kernel analysis reveal considerable spatial heterogeneity in HIV prevalence (range = 6–36%) across the study area (Figure 3). In general, high-density settlements in the south-east of the study area near the National Road have the highest HIV prevalence (>35%) whilst the more inaccessible rural areas near the western boundary have the lowest HIV prevalence (<10%). HIV cases are heavily concentrated in the areas of the urban township and high-density settlements near the National Road where the highest population density coincides with highest HIV prevalence. The results show that 40% of infected individuals live within 1 km of the National Road (Figure 4) and there is a steep fall-off in numbers of infections with increasing distance from the road. The estimated density of HIV-infected individuals (total HIV cases per square kilometre) living within 1 km of the road is 15.7 times higher than the mean density of infected individuals across the remainder of the study area. The exclusion of the township from the analysis had little impact on these results (Figure 4).
The location of all clusters identified by the Kulldorff spatial scan statistic (P < 0.05) corresponded well with the spatial distribution in HIV prevalence generated by the Gaussian kernel smoothing approach (Figure 3). The scan statistic identified three clusters with large excess numbers of HIV infections (RR = 1.34–1.62) in communities along the National Road. Three other communities were identified containing substantially smaller numbers of infections (RR = 0.2–0.38) relative to expectation (Table 1). These variations are unlikely to have arisen by chance alone (P-value range = 0.001–0.017). Two of the low-risk clusters identified (Clusters 5, 6) are rural but relatively close to high-density settlements whereas Cluster 4 is located in a ‘deep-rural’ area. One of the low-prevalence clusters (Cluster 5) is juxtaposed against a high-prevalence cluster (equating to a four-fold difference in risk of infection between the two adjacent clusters) suggesting considerable constraining of sexual mixing patterns within the low-prevalence community. Some high- and low-prevalence areas identified by the Gaussian kernel approach were not characterized as clusters by the scan statistic because the role of chance in creating the variations could not be entirely discounted (at the P < 0.05 significance level).
The high-prevalence communities identified by the spatial scan statistic were characterized by higher aggregate levels of education, household wealth and nearly double the rate of employment, but had lower levels of marriage and proportion of non-resident household members (‘migrants’) in comparison to their low-prevalence counterparts (Table 2).
We apply for the first time two complementary spatial analytical techniques (kernel smoothing and spatial cluster detection using the Kulldorff spatial scan statistic) to investigate the geographical patterns and clustering of HIV infections at a truly local level. Despite the stage and severity of the epidemic there is remarkable geographical variation in HIV prevalence within this relatively homogenous population that has been exposed to HIV for more than two decades. We find strong evidence for the clustering of HIV infections in communities along one of South Africa's National Roads. This demonstrates that risks for HIV are associated with specific socio-geographic spaces (communities), and provides an opportunity for targeted interventions to supplement existing measures aimed at the general population. The results challenge the prevailing paradigm of a ubiquitous ‘generalized’ rural epidemic. Rather, our findings reveal the existence of several localized epidemics of varying intensity that are partly contained within geographically defined communities. Consequently, a one-size-fits-all intervention strategy may not be effective in such a setting where such marked variations in epidemiological and socio-geographical context exist.30
The large (and sometimes abrupt) spatial variations in HIV prevalence further highlight the importance of using a spatial analytical approach that does not rely on aggregation of infections by coarse administrative unit. Unless the administrative unit's boundaries happen to coincide with the ‘boundary’ of the cluster, aggregation will mask important sub-area variation and may result in misinterpretation of true underlying spatial patterns. However, the spatial methods used do have some limitations. Most importantly, the choice of the size of the radius used in the Gaussian kernel can impact significantly on the final result. The appropriate choice of the size of the kernel is described elsewhere31 and always depends on the purpose intended for the smoothed estimate. The smaller the radius used, the greater the range in prevalence estimates obtained and the greater the sensitivity to local variation. The use of a larger kernel will result in smoothing towards the mean and important variation in prevalence may be lost. The choice of the 3-km radius of the kernel used in this study was informed largely by empirical data from spatial dependence in ward-level HIV prevalence. By using a relatively large kernel, we have produced stable estimates that reliably demonstrate the large variations in prevalence in the study area. Although the precise mathematical form of the kernel can also impact on the final result, this effect is relatively modest in comparison to the size of the kernel.26 A limitation of the Kulldorff spatial scan statistic is that clusters are arbitrarily defined as circles. Given the highly heterogeneous spatial distribution of the population and HIV prevalence distribution, an elliptical scan window (with long axis orientated parallel to the National Road) might be appropriate for the detection of excess infections in this setting.
Although widely applied to the detection of clustering of non-communicable diseases, especially cancer,32–34 the application of spatial scan statistics to sexually transmitted diseases has been limited and occurred only in the developed world. The methods have been applied to the clustering of gonorrhoea in the USA8 and Chlamydia trachomatis in Canada35 and the clustering of three bacterial STIs (gonorrhoea, chlamydia and syphilis) in Australia.36 In the developing world, the methods have been used to investigate spatial patterns of childhood mortality in Burkina Faso.37 Common to all these studies was the use of disease data aggregated by area such as census tract or administrative unit. We could not locate any studies that had applied kernel density smoothing methods to analysing the spatial distribution of a sexually transmitted disease. However, examples of its use in other diseases include cancer,38,39 severe acute respiratory syndrome,40 dengue41 and tick-borne encephalitis and Lyme borreliosis.42
Whereas ideally the rate of new HIV infections (incidence) should be used to inform prevention activities, we analyse the distribution of existing HIV infections (prevalence) and as such the results may have some limitations. However, one would expect an individual's risk of acquisition of HIV infection to be strongly related to HIV prevalence in the surrounding community. Indeed, through subsequent rounds of HIV surveillance data we have started to analyse spatial patterns of HIV incidence in the study area and find that they are remarkably similar to prevalence in spatial distribution.
Our findings are consistent with previous work in this population. For example, we found at the level of the individual, that after controlling for age, sex, setting (urban/rural) and educational attainment there is a strong negative association in HIV infection risk with increasing distance to the National Road (F Tanser et al., in preparation). Those living within 1 km of the National Road are twice as likely to be HIV-infected (adjusted odds ratio = 1.96, 95% CI: 1.5–2.5) in comparison to those living >15 km away. Likewise, for incidence, we found in multiple regression analysis that the hazard of HIV acquisition decreased with the logarithmically transformed distance to the National Road (adjusted hazard ratio = 0.856, P = 0.002).4 These findings support ecological observations made in this area43 and other parts of Africa44–46 and suggest that individuals living in communities with better access to transport and transport routes are at higher risk of infection. A recent study in Uganda showed an association between distance to main roads and increase in both HIV incidence and genetic complexity of the virus.46 Individuals from communities located near a main road were twice as likely to be infected with an HIV-1 recombinant strain in comparison with those living near a secondary road.
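To show how a hazard ratio on a log-transformed distance translates into a comparison between two distances, a small sketch follows. It assumes the transformation was a natural logarithm and that distance was measured in kilometres; neither detail is stated here, and the function name `hazard_ratio_between` is hypothetical.

```python
import math


def hazard_ratio_between(d_near_km, d_far_km, hr_per_log_unit=0.856):
    """Hazard at d_far relative to d_near when the hazard is multiplied by
    hr_per_log_unit for every one-unit increase in ln(distance).

    Assumes a natural-log transform of distance in km (not stated in the text).
    """
    return hr_per_log_unit ** (math.log(d_far_km) - math.log(d_near_km))
```

Under these assumptions the ratio depends only on the relative change in distance, so moving from 1 km to 2 km implies the same proportional reduction in hazard as moving from 5 km to 10 km.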
It is widely accepted that spatial heterogeneity and spatial processes (such as circular migration patterns) during the early phase of the epidemic led to the explosive spread of HIV in South Africa.47 However, given the advanced stage of the epidemic, the apparent widespread high HIV prevalence and long duration of infection, it might be expected that a more even spatial distribution would result at this stage of the epidemic, at least within a small, circumscribed rural population. The fact that such marked spatial variation persists suggests that spatial processes, complex sexual mixing patterns and community effects continue to impact on transmission dynamics at this advanced stage of the epidemic.
Additional work is required to fully understand the reasons for these large spatial variations in HIV prevalence in this population and important insights are likely to be gained by further in-depth study of the communities identified in this research. A combination of individual, household, community, structural and geographical risk factors is likely responsible for the substantial heterogeneity observed. These include the possibility of higher rates of sex work along the National Road,45 hierarchical diffusion processes of the virus along the transport routes during the early stages of the epidemic,48 higher population mobility and migration rates in populations living near the roads,49 geographical clustering of the most sexually active groups in high-density communities, more disassortative sexual mixing patterns within the urban township and high-density settlement populations,50 possibility of higher prevalence of other STIs (many of which are linked more directly to the behaviour of high-risk groups51) in some areas52 and systematic geographic variations in other individual, household and community level risk factors.53
Research and theory point to the need for HIV prevention interventions that focus both on persons at greatest risk as well as the general public.54 Targeting efforts at settings where HIV transmission is most intense is crucial. A decrease in HIV incidence in these communities, more extensive coverage of antiretroviral treatment (ART), increased knowledge of infection status, an increase in condom usage and introduction of venue-based interventions could therefore have a disproportionately large impact on the general population as a whole. Despite the overall high-prevalence levels in many rural South African settings, our results call for a ‘place-based’17 approach centred on the identification of high-risk communities to supplement prevention efforts aimed at the general population.
This work was supported by grant 1R01-HD058482-01 from the National Institute of Child Health and Human Development. Core funding for the Africa Centre's Demographic Surveillance Information System (GR065377/Z/01/H) and Population-based HIV Survey (GR065377/Z/01/B) was received from the Wellcome Trust, UK. The funding organizations had no role in the design and conduct of the study, in the collection, analysis, and interpretation of the data, or in the preparation, review or approval of the manuscript.
This article uses data generated by the Africa Centre Demographic Information System and population-based HIV survey. We thank the field staff at the Africa Centre for Health and Population Studies for their work in collecting the data used in this study and the communities in the Africa Centre demographic surveillance area for their support and participation in this study.
Conflict of interest: None declared.