Public Health Reports
Public Health Rep. 2008 May-Jun; 123(3): 260.
PMCID: PMC2289974

Implications of Missing Income Data

We strongly support the recommendation of the recent article by Kim et al.1 that health researchers pay heed to the strong social patterning of missing data often exhibited by key variables in epidemiologic studies. Yet, although we agree with the authors' advice that researchers should “at a minimum, carefully examine characteristics of respondents with missing income information,” we must strongly caution readers against their recommendation to “routinely includ[e] a separate income category of respondents with missing income information in all analyses.”

Contradicting this recommendation is extensive literature on missing data,2–7 including two articles cited by Kim et al.2,4 This research has shown that this “missing indicator” method will result in biased effect estimates under most conditions. In particular, bias occurs even when the missing data are missing completely at random (e.g., people missing data on income are a random sample from all income groups). If the data are missing at random (e.g., missingness on income depends only on variables that are observed), then multiple imputation or weighting techniques can be used to obtain valid effect estimates.6 If, however, the data are not missing at random (e.g., there are additional unobserved predictors of missingness), more complex models for non-ignorable nonresponse are required,7 and, ultimately, investigators would do well to adopt a sensitivity analysis framework. In all cases, researchers must give serious thought to data quality issues raised by the extent and patterning of missingness, the pathways leading to this missingness, and the implications for valid causal inference.
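To make the point about bias under data missing completely at random concrete, a minimal simulation sketch follows. It is an illustration only and is not drawn from Kim et al. or the cited literature: the linear data-generating model, the coefficient values, the 50% missingness rate, and all variable and function names (`income`, `exposure`, `outcome`, `ols_coefs`, `simulate`) are assumptions chosen for clarity. Income is treated as a confounder of an exposure-outcome association, and the exposure coefficient from a complete-case analysis is compared with the coefficient obtained from the missing indicator method.

```python
import numpy as np


def ols_coefs(columns, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def simulate(n=200_000, true_effect=1.0, seed=0):
    """Hypothetical scenario: income confounds the exposure-outcome relation
    and is missing completely at random (MCAR) for about half of respondents."""
    rng = np.random.default_rng(seed)
    income = rng.normal(size=n)                 # confounder (standardized)
    exposure = income + rng.normal(size=n)      # exposure depends on income
    outcome = true_effect * exposure + 2.0 * income + rng.normal(size=n)

    # MCAR: missingness is unrelated to income, exposure, or outcome.
    missing = rng.random(n) < 0.5
    observed = ~missing

    # Complete-case analysis: drop respondents with missing income.
    complete_case = ols_coefs(
        [exposure[observed], income[observed]], outcome[observed]
    )[1]

    # Missing indicator method: fill missing income with 0 and add a
    # "missing" category (dummy variable) to the model.
    income_filled = np.where(missing, 0.0, income)
    missing_indicator = ols_coefs(
        [exposure, income_filled, missing.astype(float)], outcome
    )[1]
    return complete_case, missing_indicator


complete_case_est, missing_indicator_est = simulate()
print(f"true exposure effect:       1.00")
print(f"complete-case estimate:     {complete_case_est:.2f}")
print(f"missing-indicator estimate: {missing_indicator_est:.2f}")
```

Under these assumptions, the complete-case estimate should fall close to the true effect, whereas the missing indicator estimate is pulled away from it, because confounding by income goes uncontrolled among respondents placed in the "missing" category. Complete-case analysis is valid here only because the data are missing completely at random; it is shown as a reference point, not as a general recommendation.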

In summary, the problem of the social patterning of missing data that Kim et al. highlight is very real and troubling for epidemiologic research and underscores why epidemiologists cannot afford to ignore poverty and its impact on both health status and causal inference.8 To do the research right, we must use appropriate methods. In 1995, Greenland and Finkle2 were alarmed to find that “the indicator method [was] widely perceived as a formally correct method of handling missing values.” Because this view continues to persist in some quarters, we must reiterate their recommendation that epidemiologists avoid the “potentially disastrous ad hoc” missing indicator approach.


1. Kim S, Egerter S, Cubbin C, Takahashi ER, Braveman P. Potential implications of missing income data in population-based surveys: an example from a postpartum survey in California. Public Health Rep. 2007;122:753–63.
2. Greenland S, Finkle WD. A critical look at methods for handling missing covariates in epidemiologic regression analyses. Am J Epidemiol. 1995;142:1255–64.
3. Jones MP. Indicator and stratification methods for missing explanatory variables in multiple linear regression. J Am Stat Assn. 1996;91:222–30.
4. Vach W, Blettner M. Biased estimation of the odds ratio in case-control studies due to use of ad hoc methods of correcting for missing values for confounding variables. Am J Epidemiol. 1991;134:895–907.
5. Horton NJ, Kleinman KP. Much ado about nothing: a comparison of missing data methods and software to fit incomplete data regression models. Am Stat. 2007;61:79–90.
6. Carpenter JR, Kenward MG, Vansteelandt S. A comparison of multiple imputation and doubly robust estimation for analyses with missing data. J Royal Stat Society Series A. 2006;169:571–84.
7. Schafer JL. Analysis of incomplete multivariate data. Boca Raton (FL): Chapman and Hall; 1997.
8. Krieger N. Why epidemiologists cannot afford to ignore poverty. Epidemiology. 2007;18:658–63.

Articles from Public Health Reports are provided here courtesy of SAGE Publications