Mapping and modelling methods used to study the spatial distribution and spread of vector-borne and directly transmitted infectious diseases are becoming increasingly widespread and sophisticated as the field of spatial epidemiology grows. Spatial epidemiology is defined as "the study of spatial variation in disease risk or incidence" [1], and its aims are both to describe and to understand these variations [2], with the ultimate objective being to assist public health decision making. Interactions between pathogens, vectors and hosts, and between these agents and their environment, determine spatial variations in disease risk and make the transmission of vector-borne and other infectious diseases an intrinsically spatial process [1].
Most studies on infectious disease dynamics are not spatially-explicit, i.e. elements are not explicitly localized in space. Models are typically based on the metapopulation concept, which considers isolated subpopulations subject to colonization and extinction dynamics [4]. If the species of interest is a parasite, colonization means infection and a local extinction occurs when the host dies or recovers [5]. This approach is spatially-implicit, as it avoids the use of geographical maps to locate elements. In the majority of non-spatial mathematical models of infectious diseases, the total population is assumed to be constant [7], but population data have been included, for instance, in non-spatial models of HIV [8], pertussis [9] and malaria [7], and in global burden of disease calculations [10]. However, the spatial nature of infectious diseases, and particularly spatial heterogeneities in transmission and spread, makes risk maps and spatially-explicit models of disease incidence valuable tools for understanding disease dynamics and planning public health interventions [1]. Defining the extent of infectious diseases as a public health burden, and their distribution and dynamics in time and space, is critical for scoping financial requirements, for setting a control agenda and for monitoring.
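To make the colonization and extinction dynamics of the metapopulation concept concrete, the following minimal sketch assumes the classic Levins formulation, dp/dt = c·p·(1 − p) − e·p, where p is the fraction of occupied patches (or, for a parasite, infected hosts), c is the colonization (infection) rate and e is the extinction (death or recovery) rate. The choice of the Levins model, the function names and the parameter values are illustrative assumptions and are not taken from the studies cited above.

```python
def levins_occupancy(p0, c, e, dt=0.01, steps=10000):
    """Integrate dp/dt = c*p*(1 - p) - e*p with a forward Euler scheme.

    p0    : initial fraction of occupied patches (infected hosts)
    c, e  : colonization (infection) and extinction (recovery/death) rates
    Returns the occupied fraction after `steps` time steps of size `dt`.
    """
    p = p0
    for _ in range(steps):
        p += dt * (c * p * (1.0 - p) - e * p)
    return p


def levins_equilibrium(c, e):
    """Non-trivial equilibrium p* = 1 - e/c (zero when e >= c)."""
    return max(0.0, 1.0 - e / c)


# Hypothetical rates, chosen only for illustration.
p_final = levins_occupancy(p0=0.05, c=0.5, e=0.2)
p_star = levins_equilibrium(c=0.5, e=0.2)
print(p_final, p_star)
```

Note that no coordinates or maps appear anywhere in this sketch: patches are interchangeable, which is what makes the approach spatially-implicit.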
The emergence of spatially-explicit studies in infectious disease research has been supported by improvements in spatial data and tools such as remote sensing and geographical information systems (GIS) [18], as well as advances in spatially-explicit modelling methods [17]. GIS are commonly used to combine spatial data from different sources, to map disease, and to perform spatial analyses, such as cluster detection or landscape fragmentation analyses, that help identify the causal factors of observed spatial patterns [20]. In addition, the growth in computing and data collection, and the centralization of epidemiological data, have led to an increase in the sophistication and complexity of the mapping and modelling of infectious disease risks.
Among the agents involved in the disease transmission process, human hosts play a crucial role, as their density [26], spatial location, demographic characteristics (e.g. age-risk profiles [27]) and behaviour [31] determine their exposure to infection. Any approach that uses modelled disease rates or dynamics requires reasonable information on the resident population for the time period over which risk is to be estimated. Where risks and spread of diseases are heterogeneous in space, population distributions and counts should ideally be resolved to finer levels of spatial detail than large regional estimates. Accurate and detailed information on population size and distribution is therefore of significant importance for deriving populations at risk and infection movement estimates in spatial epidemiological studies [34]. For many low-income countries of the world, where disease burden is greatest, however, spatially detailed, contemporary census data do not exist. This is especially true for much of Africa, where currently available census data are often over a decade old and reported at administrative boundary levels just below national level [35].
Modelling techniques for the spatial reallocation of populations within census units have been developed in an attempt to overcome the difficulties caused by input census data of varying resolutions. National census population data can be represented as continuous gridded population distribution (or count) datasets through the use of spatial interpolation algorithms. Here, we first review and compare the methods used in the construction of existing large-scale population datasets, and then review applications of these datasets in past studies of disease risk and dynamics.
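As a rough illustration of what reallocating a census count to grid cells involves, the sketch below distributes one census unit's population across its cells in proportion to a set of weights. With equal weights this corresponds to simple areal weighting; with weights drawn from an ancillary layer (for example land cover) it corresponds to a dasymetric-style approach. The function name, the four-cell unit and all numbers are hypothetical, and the sketch is not meant to represent the algorithm of any particular dataset.

```python
def reallocate(census_total, weights):
    """Split a census unit's population across its grid cells.

    Each cell receives a share proportional to its weight, so the cell
    values always sum back to the original census total.
    """
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("at least one cell must have a positive weight")
    return [census_total * w / total_weight for w in weights]


# Hypothetical census unit covering four grid cells.
unit_population = 12000

# Areal weighting: every cell has the same area, hence the same share.
cell_areas = [1.0, 1.0, 1.0, 1.0]
uniform = reallocate(unit_population, cell_areas)

# Dasymetric-style weighting: assumed weights from an ancillary layer,
# e.g. 0 for water, larger values for more densely built-up land.
land_cover_weights = [0.0, 1.0, 3.0, 6.0]
dasymetric = reallocate(unit_population, land_cover_weights)

print(uniform)
print(dasymetric)
```

Both variants preserve the census total within the unit; they differ only in how the within-unit heterogeneity is represented.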