Clin Infect Dis. 2013 February 15; 56(4): 517–524.
Published online 2012 November 1. doi:  10.1093/cid/cis932
PMCID: PMC3552528

Forecasting High-Priority Infectious Disease Surveillance Regions: A Socioeconomic Model


Background. Few researchers have assessed the relationships between socioeconomic inequality and infectious disease outbreaks at the population level globally. We use a socioeconomic model to forecast national annual rates of infectious disease outbreaks.

Methods. We constructed a multivariate mixed-effects Poisson model of the number of times a given country was the origin of an outbreak in a given year. The dataset included 389 outbreaks of international concern reported in the World Health Organization's Disease Outbreak News from 1996 to 2008. The initial full model included 9 socioeconomic variables related to education, poverty, population health, urbanization, health infrastructure, gender equality, communication, transportation, and democracy, and 1 composite index. Population, latitude, and elevation were included as potential confounders. The initial model was pared down to a final model by a backwards elimination procedure. The independent variables were lagged by 2 years relative to the dependent variable to allow for forecasting future rates.

Results. Among the socioeconomic variables tested, the final model included child measles immunization rate and telephone line density. The Democratic Republic of Congo, China, and Brazil were predicted to be at the highest risk for outbreaks in 2010, and Colombia and Indonesia were predicted to have the highest percentage of increase in their risk compared to their average over 1996–2008.

Conclusions. Understanding socioeconomic factors could help improve the understanding of outbreak risk. The inclusion of the measles immunization variable suggests that there is a fundamental basis in ensuring adequate public health capacity. Increased vigilance and expanding public health capacity should be prioritized in the projected high-risk regions.

Keywords: communicable diseases, disease outbreaks, socioeconomic factors, public health, epidemiology

(See the Editorial Commentary by Polgreen and Polgreen on pages 525–6.)

The dynamics of infectious disease emergence and spread are complex, but they are recognized to be at least partly propelled by changes in socioeconomic, environmental, and biological factors [1, 2]. In this study, we focused specifically on the socioeconomic perspective, aiming to better understand infectious disease risk for a population.

Socioeconomic factors span a hierarchical continuum from distal to proximate (Figure 1) [3]. At the distal end of the spectrum are population-level factors such as inequality, education, gross domestic product, and public health spending, whereas individual-level factors such as personal wealth, nutritional status, and access to healthcare lie at the proximate end. Studies assessing individual-level risk factors have been plentiful, but calls for more research from a population-level perspective in the 1990s highlighted the scarcity of work at the distal range of the spectrum [4, 5]. Since then, a rich literature has built up connecting macro-level socioeconomic factors to population health outcomes such as infant mortality [3] and life expectancy [6]. Broadly speaking, these outcomes have been shown to improve as a country becomes increasingly “developed,” with variation being partly attributed to externalities related to inequality [7] or social cohesion [8]. It would be expected that these trends extend to infectious disease outcomes as well, although in fact the role of population-level socioeconomic factors in infectious disease outcomes has received relatively little attention [9].

Figure 1.
A socioeconomic model of disease outbreaks. The present study is concerned with the distal factors contributing to outbreak emergence. Adapted with permission from authors Schell, Reilly, Rosling, Peterson, and Ekström [3].

In this study, we examined the association between national-level socioeconomic variables and national-level risk for infectious disease outbreaks, as approximated by the number of outbreaks reported in the World Health Organization's (WHO) Disease Outbreak News reports for which a country was the first to have cases. Efforts toward a better understanding of how various factors drive emergence and spread are critical for formulating policies that optimize strategies for the prevention, detection, and management of outbreaks. Understanding geographic risk is also important to determine whether global resources are being properly allocated, especially as efforts for infectious disease surveillance and research are skewed toward a limited set of wealthier countries [10].



Outbreak Data

In a previous study [11], a historical global database of infectious disease outbreaks reported in the WHO's Disease Outbreak News reports was assembled. These reports describe confirmed public health events deemed of international concern. Certain types of outbreaks, such as those of endemic diseases, were excluded from the database (see Appendix for details). The country listed for each outbreak is generally the first country to have cases, or if unclear, the country with the most cases.

We counted the number of times each country (of 213 study countries or territories) appeared in the outbreak database each year (0 if none) from 1996 to 2008 (we excluded 2009 because much of the 2009 socioeconomic data were not yet available at the time of the study).
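The counting step described above can be sketched as follows. This is a minimal illustration in Python rather than the authors' code; the `(country, year)` record layout, the function name, and the sample records are hypothetical.

```python
from collections import Counter

def count_outbreaks(records, countries, years):
    """Return {(country, year): count}, with an explicit 0 for
    country-years that have no outbreak in the database."""
    tally = Counter(records)
    return {(c, y): tally.get((c, y), 0) for c in countries for y in years}

# Hypothetical records: one (origin country, year) pair per outbreak.
records = [("A", 1996), ("A", 1996), ("B", 2008)]
countries = ["A", "B", "C"]
years = range(1996, 2009)  # 1996 through 2008 inclusive

panel = count_outbreaks(records, countries, years)
```

Keeping the zero rows matters because the regression outcome is a count for every country-year, not only for those with a reported outbreak.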

Socioeconomic Data

Focusing on the distal level of a hierarchical socioeconomic model of disease outbreaks (Figure 1), we selected a single variable to represent each of 9 categories: education, financial welfare, population health, urban development, health infrastructure, gender equality, communication infrastructure, transportation infrastructure, and political health (Table 1). Our choice of variables is based on literature on social and environmental determinants of health and infectious disease, as well as on our impression of data quality and availability. For example, numerous studies have linked education, wealth, and gender equality to better health outcomes [6, 12, 13]. The more educated are more likely to be familiar with healthy practices such as immunization and personal hygiene [14], or more fundamentally, basic literacy and numeracy skills allow one to continue enriching their health knowledge throughout life [15]. Higher incomes also enable access to better nutrition, safe water, sanitation, and medical care [8], while gender equality has positive outcomes for both women's health [13] and the health of their children [15]. In addition, communication and transportation infrastructure could represent accessibility (for disease spread as well as flow of information and medical care). On the other hand, infrastructure development is often an outcome of urbanization, and as hubs for travel and high-density populations, cities facilitate rapid person-to-person spread of disease [14]. However, in developed countries, cities also tend to have wealthier and more-educated residents with access to better public health infrastructure. Studies have also found positive effects of democracy on population health, while political corruption could lead to mismanagement of government spending [16].

We used The Economist's Democracy Index [17] to represent political health, while all of our other socioeconomic variables came from the World Development Indicators (WDI) databank, December 2010 edition, which covered the 213 study countries. Finally, we decided to also test the United Nations Development Programme's Human Development Index, which is defined as a “composite index measuring average achievement in three basic dimensions of human development—a long and healthy life, access to knowledge and a decent standard of living” [18].

Table 1.
Initial Full Model Included 3 Potential Confounder Variables, 1 Variable for Each of 9 Socioeconomic Categories, and 1 Socioeconomic Composite Index

Confounders (Demographic and Geographic Data)

Previous modeling work has suggested that population and host density [19, 20] and climate (with absolute latitude often used as a proxy measure) [21] are related to both pathogen and disease prevalence and spread. In addition, we believed there would be an elevation gradient for the density of human and disease vector populations. Therefore, we included total population, absolute latitude of a country's population centroid, and the proportion of a country's surface area below an elevation of 500 meters above mean sea level in our model to control for demographic and geographic factors that potentially elevate the baseline risk for outbreaks.

Missing Values

There was a large proportion of missing values in our socioeconomic data, as expected, as not all countries collect and/or report these data or do so every year. Therefore we employed a multiple imputation method using the Amelia II version 1.5 package in R [22]. The method involves filling in n estimated values for each missing value, resulting in n “complete” datasets. Analyses are then carried out on each dataset separately, and the results are combined at the end. Certain transformations (eg, logarithmic) were applied to the data before the missing values were imputed (see Appendix for details).

Statistical Analysis

A multivariate mixed-effects Poisson regression model was constructed to predict the number of outbreaks in a given country in a given year from national-level socioeconomic indicators. There was no overdispersion in the outcome variable. The model included fixed-effects terms for the various socioeconomic indicators, and for population, latitude, and elevation as potential confounders, and a random-effects term (intercept only) to account for country-level clustering. Autocorrelation function plots showed no significant temporal autocorrelation. Because the release of the socioeconomic data generally seemed to be delayed by at least 2 years, we lagged the independent variables by 2 years relative to the dependent variable.
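One way to write out the model described in this paragraph is given below. The notation is introduced here for illustration and does not appear in the original: $Y_{it}$ is the number of outbreaks originating in country $i$ in year $t$, $x_{k,i,t-2}$ is the value of fixed-effects regressor $k$ (socioeconomic indicator or confounder) 2 years earlier, and $u_i$ is the country-level random intercept. The log link and the normal distribution for $u_i$ are the usual defaults for this kind of model and are assumptions, not details stated in the text.

```latex
\begin{aligned}
Y_{it} \mid u_i &\sim \mathrm{Poisson}(\mu_{it}), \\
\log \mu_{it} &= \beta_0 + \sum_{k} \beta_k \, x_{k,i,t-2} + u_i, \\
u_i &\sim \mathcal{N}(0, \sigma_u^2).
\end{aligned}
```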

Fixed-effects regressors were chosen for the multivariate model via a backward elimination procedure using the Akaike information criterion. We performed this backward elimination procedure separately on each of the 5 imputed datasets, then identified the variables that remained at the end of at least 3 of the 5 runs. Models were reestimated for each of the 5 imputed datasets including just the identified variables, then a final model was obtained by combining the results of the 5 sets using the procedure described by King and Scheve [23]. Models were constructed using the Zelig package in R [24].
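The selection and pooling steps above can be sketched as follows. This is a minimal sketch, not the authors' code: the function names and all numbers are hypothetical, and the pooling function uses the standard Rubin-style combining rules for multiply imputed data, which we assume is close to the procedure cited in [23].

```python
from collections import Counter
from statistics import mean, variance

def variables_retained(selected_per_run, min_runs=3):
    """Keep variables that survive backward elimination in at least min_runs runs."""
    votes = Counter(v for run in selected_per_run for v in set(run))
    return sorted(v for v, n in votes.items() if n >= min_runs)

def pool_estimates(estimates, std_errors):
    """Combine one coefficient across m imputed datasets (Rubin-style rules)."""
    m = len(estimates)
    q_bar = mean(estimates)                       # pooled point estimate
    within = mean(se ** 2 for se in std_errors)   # average within-imputation variance
    between = variance(estimates)                 # sample variance across imputations
    total = within + (1 + 1 / m) * between        # total variance of the pooled estimate
    return q_bar, total ** 0.5

# Hypothetical output of 5 backward-elimination runs (one per imputed dataset).
runs = [
    {"measles", "phones", "population"},
    {"measles", "phones"},
    {"measles", "roads"},
    {"measles", "phones", "latitude"},
    {"phones", "population"},
]
kept = variables_retained(runs)

# Hypothetical coefficient estimates and standard errors from the 5 refitted models.
pooled, pooled_se = pool_estimates([-0.6, -0.5, -0.7, -0.6, -0.6], [0.2] * 5)
```

The between-imputation term widens the pooled standard error to reflect the uncertainty introduced by the missing values.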

To assess model fit, a Monte Carlo–based version of the Pearson χ² goodness of fit test [25] with 2000 replicates was conducted. This simulation-based approach was chosen to avoid having to pool categories, as would be required for the standard Pearson χ² test owing to the many cells with an expected count of <5. To assess prediction efficiency and thereby perform model validation, leave-one-out cross-validation was conducted, and average prediction errors between actual and predicted values were computed.
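The idea behind a simulation-based goodness of fit test can be illustrated with the sketch below. It is one simple variant (a parametric bootstrap of a per-observation Pearson statistic) and is not necessarily the procedure in [25]. It treats the fitted means as fixed and ignores the random effects, and the function names are hypothetical.

```python
import math
import random

def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate (Knuth's method; adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def pearson_stat(observed, expected):
    """Pearson statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def monte_carlo_gof(observed, expected, replicates=2000, seed=0):
    """Simulated P value: share of replicate statistics at least as
    large as the observed one (with the usual +1 correction)."""
    rng = random.Random(seed)
    stat_obs = pearson_stat(observed, expected)
    exceed = 0
    for _ in range(replicates):
        simulated = [poisson_draw(e, rng) for e in expected]
        if pearson_stat(simulated, expected) >= stat_obs:
            exceed += 1
    return (exceed + 1) / (replicates + 1)
```

Because the reference distribution is simulated rather than taken from the χ² approximation, small expected counts do not need to be pooled. A large P value indicates no evidence of lack of fit.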

The final model was then used to forecast the expected number of outbreaks in 2010 for each country using socioeconomic data for 2008 (2-year lag). Analyses were performed in R version 2.12.1 (R Project for Statistical Computing, Vienna, Austria).


Our database included 389 outbreaks during 1996–2008. The highest outbreak rate occurred in the Democratic Republic of Congo (DR Congo; average of 1.77 outbreaks per year), followed by Sudan (0.92) and China, Brazil, India, and Guinea (0.77 each). Of the 213 study countries, 108 had no outbreaks during the entire study period (see Supplementary Table A2 for list), and the majority of the other countries had no reported outbreaks in most years. A summary of the socioeconomic data (minimum and maximum values for countries with no outbreaks as compared with countries with at least 1 outbreak during the study period) is given in Table 1.

In preliminary models (data not shown), all socioeconomic variables except for health expenditure in the public sector were on their own significantly associated with outbreak rate. The final multivariate Poisson mixed-effects regression model predicting the number of outbreaks in each country in each year included 2 socioeconomic variables: the log of the number of telephone lines (per 100 people; coefficient = −0.59, P value = .02) and the logit of measles immunization (percentage of children aged 12–23 months; coefficient = −0.20, P value <.001); see Table 2. In addition, the population and latitude confounder variables remained. There was a high correlation between many of the socioeconomic variables themselves as well as with the confounder variables (Supplementary Table A2). Model diagnostic tests indicated an adequate model; the Monte Carlo–based χ² goodness of fit test did not demonstrate any significant lack of fit (P value = .72), while leave-one-out cross-validation indicated that predicted and observed values differed by 0.19 outbreaks on average.
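To give a sense of scale for the two reported coefficients, the sketch below converts them into rate ratios. It is an arithmetic illustration only: it assumes the usual log link, natural logarithms, and a logit taken on the proportion scale, and the scenario values (5 to 10 lines per 100 people, 80% to 90% coverage) are hypothetical. Because the socioeconomic variables are highly correlated, these ratios describe the fitted model and should not be read as causal effects.

```python
import math

def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))

def rate_ratio(coef, x_old, x_new):
    """Multiplicative change in the expected count when a (transformed)
    covariate moves from x_old to x_new, all else held fixed."""
    return math.exp(coef * (x_new - x_old))

# Coefficients as reported in the text.
COEF_LOG_PHONES = -0.59
COEF_LOGIT_MEASLES = -0.20

# Hypothetical scenarios: doubling telephone line density from 5 to 10 per
# 100 people, and raising measles coverage from 80% to 90%.
rr_phones = rate_ratio(COEF_LOG_PHONES, math.log(5), math.log(10))
rr_measles = rate_ratio(COEF_LOGIT_MEASLES, logit(0.80), logit(0.90))
```

Under these assumptions, the model associates the doubling of telephone line density with a rate ratio of about 0.66, and the rise in measles coverage with a rate ratio of about 0.85.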

Table 2.
Final Multivariate Mixed-Effects Poisson Regression Model for the Number of Outbreaks in a Given Country in a Given Year

Using the model to forecast the expected number of outbreaks for 2010, the top 5 high-risk geographic areas were predicted to be DR Congo (0.77 outbreaks), China (0.59), Brazil (0.54), India (0.48), and Sudan (0.43); see Figure 2A and Table 3. Figure 2B depicts for each country the difference between the forecasted rate for 2010 and the average annual rate during 1996–2008. For most countries, the model predicted fewer outbreaks in 2010 than their historical rates, consistent with the general decline in the number of outbreaks during the period 1996–2008. Dividing countries into risk-level categories based on their historical average annual outbreak rate during 1996–2008, Table 4 lists for each category the countries with the greatest percentage of decrease between their historical average annual outbreak rate and their model-forecasted number of outbreaks for 2010. Conversely, several low-risk countries were forecasted to have moderately higher risk. Colombia's 2010 forecast was an 89% increase (from 0.08 to 0.15), and Indonesia's was a 22% increase (from 0.23 to 0.28). Togo and Haiti also had projected increases in risk of 13% and 12%, respectively, although the absolute change (0.01 each) was minimal. Countries that had zero outbreaks in our database for 1996–2008 were excluded from these percentage of change calculations, as the denominator cannot be zero. Among these countries, Myanmar, Mexico, and Papua New Guinea were forecasted to have 0.16, 0.12, and 0.11 outbreaks, respectively.
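The percentage of change calculation, including the exclusion of countries with a zero historical rate, can be sketched as follows. The function name is hypothetical, and the inputs are the rounded rates quoted in the text.

```python
def percent_change(historical, forecast):
    """Percentage change from the historical rate; None when the
    historical rate is zero, since the change is then undefined."""
    if historical == 0:
        return None
    return 100 * (forecast - historical) / historical

indonesia = percent_change(0.23, 0.28)  # rounds to 22%
colombia = percent_change(0.08, 0.15)   # about 87.5% from the rounded inputs
myanmar = percent_change(0.0, 0.16)     # None: no outbreaks in 1996-2008
```

The rounded inputs give roughly 87.5% for Colombia rather than the reported 89%. The difference presumably arises because the published percentages were computed from unrounded rates.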

Table 3.
Countries Projected to Have >0.25 Outbreaks in 2010, as Forecasted by Our Model
Table 4.
Top 5 Countries, for Each of 4 Risk-Level Categories, With the Greatest Projected Percentage of Decrease Between the Average Number of Outbreaks per Year During 1996–2008 and the Model-Forecasted Number of Outbreaks for 2010
Figure 2.
The model-forecasted number of outbreaks for 2010 (A) and its difference from the historical rate over 1996–2008 (B). There was a general trend of decline in the number of outbreaks over the course of the 1996–2008 period and correspondingly, ...


Of the various socioeconomic factors related to education, financial welfare, population health, urban development, health infrastructure, gender equality, communication infrastructure, transportation infrastructure, and political health that were included in the initial full multivariate model to explain the number of outbreaks in a country in a given year, only the proportion of children aged 12–23 months vaccinated against measles (representing population health) and the density of telephone lines (representing communication infrastructure) remained in the final model. These results nonetheless demonstrate that national-level socioeconomic indicators, in conjunction with other known correlates such as population and latitude, could be useful for model-based projections for surveillance purposes.

Because of the high correlation between the socioeconomic variables themselves as well as with the confounder variables, it is not possible to tease apart the individual contribution of each factor to a country's outbreak rate or to infer causality, but the model results as a whole do provide a basis for a link between socioeconomic well-being at the population level and national risk for infectious disease outbreaks.

Both governments and nongovernmental development assistance agencies spend billions of dollars on public health and medical care on the principle that disease control can reduce prevalence. Previous studies have shown strong negative associations between healthcare spending and pathogen prevalence [26] and the speed of infectious disease spread [27]. Although we looked at outbreak rates and not disease prevalence, the absence of the variable for the proportion of health expenditure in the public sector in our final model was still a surprising result. This finding may be related to the “displacement effect,” whereby countries receiving foreign aid for health programs use the aid in place of (and thereby reduce) domestic health spending [15, 28]. Furthermore, in most countries, government spending ends up being concentrated in cities, though city dwellers tend to be better off already [15].

Another related health variable, the proportion of children aged 12–23 months vaccinated against measles, did consistently remain in our model across most of our imputation sets. Measles vaccination coverage is often used as a proxy indicator for access to child health services [29], and in contrast to the health expenditure variable is perhaps a better measure of the public health state as it represents actual implementation and efficacy; designating funds for health expenditures is less meaningful if committed funds are not ultimately dispensed, or if dispensed funds are not managed effectively. Publicly funded immunization programs and strong public health education campaigns enable access and encourage vaccine uptake. Therefore, the inclusion of the measles immunization variable in our final model suggests a fundamental basis in adequate public health capacity and efficacy. A 1992 Institute of Medicine report also identified breakdown of public health measures as 1 of 6 categories of factors responsible for the emergence of infectious diseases [30].

Density of telephone lines, included to represent communication infrastructure, was the only other socioeconomic variable that remained in our final model. Its connection with outbreak rate is less obvious, but one theory is that it may be representative of the flow of information in and out of the community. A community that is better informed may be better equipped for public health threats. However, it may merely be acting as a proxy of development, which is associated with better healthcare and public health infrastructure.

Other studies have described the global geographic historical distribution of the origins of emerging infectious disease events [20] and of human pathogen richness [26]. We analogously identified some potential future high-risk geographic areas for propensity of outbreaks. Some of these countries, such as the DR Congo, which took the top spot, have historically been at high risk for outbreaks and were predicted to continue to sit high in the risk ranking in 2010. We also looked at countries with a high increase, proportionately, between their historic average annual outbreak rate during 1996–2008 and their projected number of outbreaks in 2010. In particular, the model results suggest that Colombia and Indonesia could potentially require increased vigilance and outbreak prevention initiatives as they were predicted to have a large percentage of increase in risk. Myanmar, Mexico, and Papua New Guinea, which had no outbreaks listed in our historical database during 1996–2008 but had the highest predicted absolute increase in outbreak rate for 2010, are perhaps additional target countries.

We decided to not compare the expected number of outbreaks for 2010 as predicted by the model to the observed number of outbreaks in 2010. Because the model essentially represents a smoothed overall trend, the comparison would be a difficult one owing to the instability of data when looking at any given year.

Limitations of this study include reporting and selection biases in the WHO's Disease Outbreak News reports, from which we drew our dataset. The problem of missing values is also an inherent issue in socioeconomic data and can result in biases when data are not “missing at random.” Another limitation is that all outbreaks, whether of high morbidity and far-reaching or ultimately relatively limited, were weighted equally. However, the problem is lessened by the fact that the WHO's Disease Outbreak News reports focus on outbreaks that are large or dangerous enough to be of international significance.

The high correlation between many of the socioeconomic variables is a main concern as it makes it difficult to distinguish their individual effects when analyzed together in a multivariate analysis. While we believe ecological studies such as this one do have value [31], one must be cautious to not imply causation in an ecological study, as some of these socioeconomic factors, as indicators of development status and financial capacity, may merely be proxy measures for other underlying causal pathways [32]. The aggressive implementation of health programs in areas in need could further complicate the picture by creating an impression that areas with health programs are associated with poorer health outcomes [33].

Our results demonstrate that socioeconomic epidemiology should play an important role in research efforts aimed at understanding the drivers of infectious disease outbreaks. Although individual-level mechanisms biologically dictate how infection proceeds, it is important to understand population-level dynamics as well, since outbreaks of infectious disease are ultimately driven at the population level. It is this gap in understanding that we aimed to help fill by conducting this study as an ecological one. From the perspective of actionable strategies, it may also be easier to enact policy changes than it is to try to control biological mechanisms or enforce individual behaviors. Others have also highlighted the need for identifying population-level risk factors so as to not miss opportunities to adopt societal-level interventions [4] and to revitalize the concept of public health [5]. Understanding these components in finer detail could help inform how best to allocate funds strategically to specific public health measures and high-priority geographic regions.

Supplementary Data

Supplementary materials are available at Clinical Infectious Diseases online. Supplementary materials consist of data provided by the author that are published to benefit the reader. The posted materials are not copyedited. The contents of all supplementary data are the sole responsibility of the authors. Questions or messages regarding errors should be addressed to the author.


Acknowledgments. We are grateful to Alexei Zelenev (Yale University) for his statistical feedback.

Financial support. This work was supported by research grants from and the National Library of Medicine at the National Institutes of Health (1R01LM01812-01, 5G08LM009776-02).

Potential conflicts of interest. All authors: No reported conflicts.

All authors have submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Conflicts that the editors consider relevant to the content of the manuscript have been disclosed.


1. Morse SS. Factors and determinants of disease emergence. Rev Sci Tech. 2004;23:443–51.
2. Weiss RA, McMichael AJ. Social and environmental risk factors in the emergence of infectious diseases. Nat Med. 2004;10:S70–6.
3. Schell CO, Reilly M, Rosling H, Peterson S, Ekström AM. Socioeconomic determinants of infant mortality: a worldwide study of 152 low-, middle-, and high-income countries. Scand J Public Health. 2007;35:288–97.
4. Link BG, Phelan J. Social conditions as fundamental causes of disease. J Health Soc Behav. 1995 Spec No:80–94.
5. Pearce N. Traditional epidemiology, modern epidemiology, and public health. Am J Public Health. 1996;86:678–83.
6. Biggs B, King L, Basu S, Stuckler D. Is wealthier always healthier? The impact of national income level, inequality, and poverty on public health in Latin America. Soc Sci Med. 2010;71:266–73.
7. Farmer P. Social inequalities and emerging infectious diseases. Emerg Infect Dis. 1996;2:259–69.
8. Subramanian SV, Belli P, Kawachi I. The macroeconomic determinants of health. Annu Rev Public Health. 2002;23:287–302.
9. Cohen JM, Wilson ML, Aiello AE. Analysis of social epidemiology research on infectious diseases: historical patterns and future opportunities. J Epidemiol Community Health. 2007;61:1021–7.
10. Currat LJ, Hyder AA, Nchinda TC, Carey-Bumgarner E. The 10/90 Report on Health Research 1999. In: Davey S, editor. Geneva, Switzerland: Global Forum for Health Research; 1999.
11. Chan EH, Brewer TF, Madoff LC, et al. Global capacity for emerging infectious disease detection. Proc Natl Acad Sci USA. 2010;107:21701–6.
12. Grosse RN, Auffrey C. Literacy and health status in developing countries. Annu Rev Publ Health. 1989;10:281–97.
13. Moss NE. Gender equity and socioeconomic inequality: a framework for the patterning of women's health. Soc Sci Med. 2002;54:649–61.
14. Alirol E, Getaz L, Stoll B, Chappuis F, Loutan L. Urbanisation and infectious diseases in a globalised world. Lancet Infect Dis. 2011;11:131–41.
15. Wagstaff A, Claeson M, Hecht RM, Gottret P, Fang Q. Millennium development goals for health: what will it take to accelerate progress? In: Jamison D, Breman J, Measham A, editors. Disease control priorities in developing countries. 2nd ed. Washington, DC: World Bank; 2006.
16. Okeke IN, Lamikanra A, Edelman R. Socioeconomic and behavioral factors leading to acquired bacterial resistance to antibiotics in developing countries. Emerg Infect Dis. 1999;5:18–27.
17. Kekic L. The Economist Intelligence Unit's index of democracy. In: Franklin D, editor. The world in 2007: The Economist. 2007.
18. Klugman J, editor. Overcoming barriers: human mobility and development. New York: UNDP; 2009. Human development report 2009.
19. Anderson RM, May RM. Population biology of infectious diseases. Nature. 1979;280:361–7.
20. Jones KE, Patel NG, Levy MA, et al. Global trends in emerging infectious diseases. Nature. 2008;451:990–3.
21. Guernier V, Hochberg ME, Guégan J-F. Ecology drives the worldwide distribution of human diseases. PLoS Biol. 2004;2:e141.
22. Honaker J, King G, Blackwell M. Amelia: Amelia II: a program for missing data. R package version 1.5. 2010. Available at Accessed 6 March 2011.
23. King G, Scheve K. Analyzing incomplete political science data: an alternative algorithm for multiple imputation. Amer Polit Sci Rev. 2001;95:49–69.
24. Imai K, King G, Lau O. Zelig: everyone's statistical software. 2009. Available at: Accessed 5 March 2011.
25. Weiss J. A Monte Carlo-based version of the Pearson chi-squared goodness of fit test. 2008. Available at: Accessed 16 March 2011.
26. Dunn RR, Davies TJ, Harris NC, Gavin MC. Global drivers of human pathogen richness and prevalence. Proc Biol Sci. 2010;277:2587–95.
27. Hosseini P, Sokolow SH, Vandegrift KJ, Kilpatrick AM, Daszak P. Predictive power of air travel and socio-economic data for early pandemic spread. PloS One. 2010;5:e12763.
28. Farag M, Nandakumar AK, Wallack SS, Gaumer G, Hodgkin D. Does funding from donors displace government spending for health in developing countries? Health Aff (Millwood) 2009;28:1045–55.
29. World Health Organization. Measles fact sheet, 2009. Available at: Accessed 8 August 2011.
30. Institute of Medicine; Emerging infections: microbial threats to health in the United States. In: Lederberg J, Shope RE, Oaks SC Jr, editors. Washington, DC: National Academies Press; 1992.
31. Ben-Shlomo Y. Real epidemiologists don't do ecological studies? Int J Epidemiol. 2005;34:1181–2.
32. Rothman KJ, Greenland S. Modern epidemiology. 2nd ed. New York: Lippincott Williams & Wilkins; 1998.
33. Rodgers GB. Income and inequality as determinants of mortality: an international cross-section analysis. 1979. Int J Epidemiol. 2002;31:533–8.

Articles from Clinical Infectious Diseases: An Official Publication of the Infectious Diseases Society of America are provided here courtesy of Oxford University Press