Although self-rated health is proposed for use in public health monitoring, previous reports on US levels and trends in self-rated health have shown ambiguous results. This study presents a comprehensive comparative analysis of responses to a common self-rated health question in 4 national surveys from 1971 to 2007: the National Health and Nutrition Examination Survey, Behavioral Risk Factor Surveillance System, National Health Interview Survey, and Current Population Survey. In addition to variation in the levels of self-rated health across surveys, striking discrepancies in time trends were observed. Whereas data from the Behavioral Risk Factor Surveillance System demonstrate that Americans were increasingly likely to report “fair” or “poor” health over the last decade, those from the Current Population Survey indicate the opposite trend. Subgroup analyses revealed that the greatest inconsistencies were among young respondents, Hispanics, and those without a high school education. Trends in “fair” or “poor” ratings were more inconsistent than trends in “excellent” ratings. The observed discrepancies elude simple explanations but suggest that self-rated health may be unsuitable for monitoring changes in population health over time. Analyses of socioeconomic disparities that use self-rated health may be particularly vulnerable to comparability problems, as inconsistencies are most pronounced among the lowest education group. More work is urgently needed on robust and comparable approaches to tracking population health.
Measures of health status are widely used in clinical trials and studies on quality of care (1, 2). There is also increasing interest in using health status measures to track changes in population health and health service needs and to monitor progress toward broad goals for the health of communities and nations (3–5). Interest in tracking population health extends to comparisons across countries and measurement of disparities within countries (6–10). Further, population health measures capturing nonfatal outcomes are essential to understanding how well public health and medical care systems are performing (11, 12).
Global measures of self-rated health, based on responses to a single survey question, have been proposed as reliable and valid measures of population health (13–15) and recommended for use in health monitoring by the US Centers for Disease Control and Prevention, the World Health Organization, and the European Commission (4, 13, 16). The most commonly used survey item asks people to characterize their health as “excellent, very good, good, fair, or poor.” The resulting categorical responses are often dichotomized as “fair” or “poor” versus all other categories (7, 17–19). A recent Institute of Medicine report included the percentage of adults reporting “fair” or “poor” health among the set of 8 indicators recommended for tracking the progress of health in the United States (20).
At the individual level, self-rated health based on a single item has been found to be a strong predictor of health-care utilization, functional ability, and subsequent mortality, even after controlling for other measured indicators of health status and socioeconomic variables (21–26). Based on the strength of these associations, self-rated health has been used extensively in policy analyses as an overall measure of health outcomes (27–30). Despite its appeal as a simple measure with consistent predictive power in cohort studies, however, existing evidence on trends in self-rated health in the United States—where time series are available from multiple survey programs—points to inconsistent population-level patterns across data sources and studies. Zack et al. (31) analyzed self-rated health responses from the Behavioral Risk Factor Surveillance System and found worsening trends from 1993 to 2001. Analyses of trends in the National Health Interview Survey (32), on the other hand, indicate that self-rated health has remained relatively stable. Such discrepant findings based on responses to the same item in different nationally representative surveys raise questions about the validity of inferences about population health based on self-rated health.
To further understand the potential use of self-rated health for population-level monitoring, we present a comprehensive analysis of levels and trends in self-rated health responses in 4 separate nationally representative US surveys. In particular, we focus on characterizing discrepancies between surveys, comparing discrepancies in self-rated health with those in other types of questions, analyzing differences in specific subgroups, and considering possible explanations for inconsistencies across surveys.
We compared responses to a common survey item on self-rated health in 4 national US health surveys from 1971 to 2007: the National Health and Nutrition Examination Survey (NHANES), Behavioral Risk Factor Surveillance System (BRFSS), National Health Interview Survey (NHIS), and Current Population Survey (CPS). Table 1 provides a summary of the key characteristics of each survey.
NHANES comprises a series of cross-sectional surveys of the civilian, noninstitutionalized population aged 2 months or older (33). NHANES includes an in-person interview and a subsequent examination component, with both physical and laboratory measurements. The first 3 rounds were conducted at irregular intervals between 1971 and 1994. Beginning in 1999, NHANES became a continuous survey with data released every 2 years.
BRFSS is an annual cross-sectional telephone survey started in 1984 (34). Currently, the survey is conducted by health departments in all 50 states and the District of Columbia by using a random-digit dialing method to obtain a state-representative sample of the civilian, noninstitutionalized population aged 18 years or more. The state samples can be combined to form a nationally representative sample.
NHIS is an annual cross-sectional household interview survey of the civilian, noninstitutionalized population, implemented since 1957 (35). The survey instrument is updated approximately every decade, with the last significant revision occurring in 1997. The current survey consists of a core questionnaire and supplementary material that may change each year.
CPS is a monthly nationally representative survey regarding the US labor force, including the noninstitutionalized population aged 16 years or more (36). The survey is conducted through both personal and telephone interviews, independently in each state. In the survey design, members of a household are interviewed for 4 months, left out of the sample the next 8 months, and interviewed again for the following 4 months. We restrict our analysis to the March supplement, which includes self-rated health.
Although NHIS and CPS elicit information on all household members from a single household respondent, we included only self-reports in our analyses. An anomaly in the 1998 CPS data set—the household respondent indicator is blank for 92% of the sample—makes it impossible to distinguish self-reports from proxy responses, so we have excluded these data from our analysis. Self-reports and proxy responses in CPS show minimal differences in levels and trends in all other years (Web Figure 1). (This is the first of 3 supplementary figures; each is referred to as “Web figure” in the text and is posted on the Journal’s website, http://aje.oxfordjournals.org/.)
In each survey, analyses were based on responses to the question, “Would you say your health in general is excellent, very good, good, fair, or poor?” (The wording was rearranged slightly in BRFSS, as, “Would you say that in general your health is….”) Respondents who answered “don't know/not sure” or refused to answer were excluded from the analysis (these respondents constituted less than 1% of the overall survey samples in every year and every survey).
Initial analyses were based on dichotomizing self-rated health responses as “fair” or “poor” versus all other categories, following common practice (7, 17–19). In further analyses, we compared this approach with a range of alternatives.
For comparison, we also examined responses to other questions common to the different surveys, including self-reported diabetes and body mass index computed from self-reported weight and height.
Age-standardized measures were computed on the basis of the 2000 US population by 5-year age intervals from 20 years to 70 years or older.
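The standardization step can be illustrated with a minimal Python sketch of direct age standardization. The function name, age bands, standard-population weights, and proportions below are made-up placeholders chosen for easy arithmetic; they are not the 2000 US standard population or any survey estimate.

```python
def age_standardize(rates, standard_pop):
    """Directly standardize age-specific proportions to a standard population.

    rates:        {age_band: proportion reporting fair/poor health}
    standard_pop: {age_band: standard population count for that band}
    """
    if rates.keys() != standard_pop.keys():
        raise ValueError("age bands must match between rates and standard population")
    total = sum(standard_pop.values())
    # Weighted average of age-specific proportions, weights = standard population shares.
    return sum(rates[band] * standard_pop[band] / total for band in rates)


# Hypothetical inputs (three bands only, for illustration).
rates = {"20-24": 0.05, "25-29": 0.10, "70+": 0.30}
standard_pop = {"20-24": 50, "25-29": 30, "70+": 20}
standardized = age_standardize(rates, standard_pop)
```

Because every survey is weighted to the same standard age distribution, differences in the standardized proportions cannot be attributed to differences in the age structure of the samples.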
Sample weights were applied in each data set to account for unequal probabilities of selection, nonresponse, and noncoverage. The provided weights included ratio adjustments to match population distributions by age, sex, and race/ethnicity in each survey, except in some state samples from BRFSS, which matched only on age and sex. Variance estimation was undertaken by using Taylor-series linearization methods to account for complex survey designs including clustering, stratification, and unequal weights (37). For CPS, which does not include variables on stratification and clusters in the public-release data set, we developed synthetic design variables following the approach of Joliffe (38), based on resorting the data and assigning consecutive observations to synthetic clusters in a way that approximates the design effects in the actual CPS sample. Following Joliffe, we used cluster sizes of 4 housing units and sorted by household income to induce intracluster correlation in self-rated health, based on the underlying association between income and health.
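As a rough illustration of the synthetic-cluster idea described above, the Python sketch below sorts households by income and assigns each run of 4 consecutive households to one cluster. It shows only the sort-and-group step under assumed field names (`id`, `income`) and made-up incomes; it does not reproduce the full procedure in reference 38 or any stratification.

```python
def assign_synthetic_clusters(households, cluster_size=4):
    """Sort households by income and group consecutive ones into clusters.

    households: list of dicts with hypothetical keys "id" and "income".
    Returns {household_id: synthetic_cluster_index}.
    """
    ordered = sorted(households, key=lambda h: h["income"])
    # Consecutive households in income order share a cluster, which induces
    # intracluster correlation in anything associated with income.
    return {h["id"]: i // cluster_size for i, h in enumerate(ordered)}


# Hypothetical incomes (in thousands) for nine households with ids 0..8.
incomes = [52, 18, 75, 30, 41, 96, 23, 60, 35]
households = [{"id": k, "income": inc} for k, inc in enumerate(incomes)]
clusters = assign_synthetic_clusters(households)
```

The resulting cluster index could then be passed to a design-based variance estimator in place of the unavailable primary sampling unit identifier.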
For each survey year, we computed confidence intervals around age-standardized proportions using different response categories. We examined patterns by sex and by 3 broad age groups: 20–49 years, 50–64 years, and 65 years or older. We also examined differences by race/ethnicity and educational level. Race/ethnicity was categorized as non-Hispanic white, non-Hispanic black, Hispanic, and other. Education was categorized as less than high school, high school, and more than high school.
To assess trends over the last decade overall and by sex, age, race, and education, we fit logistic regression models relating the probability of “fair” or “poor” self-ratings to calendar year over the period 1998–2007. We did not fit models to NHANES, as data are available only for 2-year periods starting in 1999–2000. We excluded CPS data from 1998 because self-reports could not be distinguished from proxy responses in that year, as noted above. Separate models were fit for each subgroup within each survey. Analogous logistic regression models were fit to the probability of reporting “excellent” health.
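The trend model can be sketched in plain Python as a two-parameter logistic regression fit by Newton-Raphson on yearly counts. This is an unweighted illustration only: the analyses in this study were run in Stata with sample weights and design-based variance estimation, which the sketch omits. The function name and the synthetic counts (10,000 respondents per year, odds rising 2% per year) are assumptions for the example.

```python
import math


def fit_logit_trend(counts, iterations=25):
    """Fit logit(p) = a + b * (year - first_year) by Newton-Raphson.

    counts: list of (year, n_fair_poor, n_total) tuples.
    Returns (a, b); exp(b) is the odds ratio per calendar year.
    """
    base = min(year for year, _, _ in counts)
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for year, events, total in counts:
            t = year - base
            p = 1.0 / (1.0 + math.exp(-(a + b * t)))
            resid = events - total * p      # score contribution
            w = total * p * (1.0 - p)       # information contribution
            g0 += resid
            g1 += resid * t
            h00 += w
            h01 += w * t
            h11 += w * t * t
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = score.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Synthetic data generated from a known trend, not survey data.
true_a, true_b = -1.8, math.log(1.02)
counts = []
for year in range(1998, 2008):
    p = 1.0 / (1.0 + math.exp(-(true_a + true_b * (year - 1998))))
    counts.append((year, 10000 * p, 10000))

a_hat, b_hat = fit_logit_trend(counts)
odds_ratio = math.exp(b_hat)
```

Fitting the same model separately within each survey and subgroup yields one odds ratio per survey-subgroup pair, which is what makes the cross-survey comparison of trends possible.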
All statistical analyses were undertaken by using Stata Release 10/SE (StataCorp LP, College Station, Texas).
In 2007, the age-standardized proportion of respondents reporting fair or poor health ranged from 12.0% (95% confidence interval (CI): 11.3, 12.7) in NHIS to 16.4% (95% CI: 15.9, 16.8) in BRFSS for males and from 13.5% (95% CI: 12.9, 14.1) in NHIS to 16.9% (95% CI: 16.6, 17.2) in BRFSS for females. NHANES estimates in 2005–2006 (the most recent available) were similar to those in BRFSS, but with greater uncertainty.
Trends in age-standardized probabilities of reporting fair or poor health are plotted in Figure 1. BRFSS shows increases of 15% among women and 22% among men in reports of fair or poor health over the period 1993–2007. NHIS, on the other hand, shows reductions in fair/poor health from 1982 to 1990, followed by slight increases for the next 2–3 years. Since 1993, changes in NHIS have been relatively modest, except for a sharp drop in 1997 coinciding with a major redesign of the survey, which preserved the exact wording but relocated the self-rated health question within the survey.
NHANES shows declines in fair/poor ratings from the first round of the survey (1971–1975) through the third round (1988–1994) in both sexes. In men, trends since 1999–2000 have been marked by rising reports of fair/poor health through 2003–2004, followed by a reversal of this pattern in 2005–2006; patterns for women have oscillated since 1999–2000. Finally, CPS indicates mostly steady reductions in fair/poor ratings for males since 1999 and flat trends for females over this period.
In order to consider whether differences among the surveys may apply more generally to other self-reported health-related items, we compared these results with trends in other variables. For example, Figure 2 presents results from NHIS, BRFSS, and NHANES for females on age-standardized proportions reporting diabetes, which are much more concordant than self-ratings of health. Results are similar for men (not shown). Figure 2 also presents a comparison of body mass index computed from self-reported weight and height. Although the levels and trends are similar in NHIS and BRFSS, estimates from NHANES are higher, by roughly the same increment in each year of comparison. Thus, in contrast to self-reported diabetes, self-reported body mass index appears subject to some systematic variation across surveys. Unlike self-rated health, however, the trend across surveys appears largely consistent despite variation in estimated levels.
Disaggregation by age, race/ethnicity, and education reveals more subtle patterns (Figure 3). In the youngest age group, CPS and NHIS show the lowest fractions of respondents reporting fair or poor health. Conversely, in the oldest age group, the fraction reporting fair or poor health is highest in CPS. Overall, the sharpest divergence in trends across surveys appears in ages 20–49 years, with the proportion reporting fair/poor health in 2007 around 50% higher in BRFSS compared with NHIS or CPS, in contrast to relatively modest differences in 1993. In older age groups, differences in levels are smaller across surveys, in relative terms, but variation in time trends remains.
Disaggregating by race and ethnicity, we observe the smallest inconsistencies among non-Hispanic African Americans and the largest among Hispanics. For Hispanic respondents, discrepancies among surveys have widened over time, with a nearly 2-fold difference in proportions reporting fair or poor health in NHIS versus BRFSS in 2007, compared with roughly equal proportions in the early 1990s. Levels and trends in the 4 surveys among non-Hispanic whites are moderately discrepant.
Disaggregating by educational level, the greatest discrepancies appear among those respondents without a high school diploma. The magnitudes of cross-survey differences in levels and trends between those with a high school diploma and those with at least some college are similar.
Although the poststratification weighting procedures in CPS, NHANES, and NHIS accounted for age, sex, and race/ethnicity, adjustment for race was incorporated in some states but not others in BRFSS (all states adjusted for age and sex). Education was not factored into the weights for any of the surveys. In our sample on self-rated health, we find some differences across surveys in the sample composition by race and education (Web Figure 2). Changes in these variables, however, are modest and gradual over the period of analysis, and cross-survey differences remain fairly constant over time, which suggests that discrepancies in self-rated health trends are not explained by differences in sample composition.
Although researchers typically dichotomize self-rated health as “fair” or “poor” versus all other responses, we considered whether alternative approaches may yield more consistent results. Figure 4 shows trends in the 4 surveys since 1998 based on 4 different dichotomous coding schemes. (Web Figure 3 also presents trends in the average self-rated score, coding “excellent” as 5, “very good” as 4, and so on; these trends show discrepancies across surveys similar to those for “fair/poor” ratings.) The ordering of the different surveys in terms of the age-standardized responses is largely preserved across the different choices of dichotomous indicator, with NHIS producing the most favorable ratings, followed by CPS, BRFSS, and NHANES; the exception is the indicator of “poor” self-ratings, for which CPS is least favorable. Figure 4 suggests visually that the proportion of respondents rating themselves as “excellent” may yield more consistent trends across surveys than the standard choice of “fair/poor.” This possibility is evaluated formally in the statistical models described below.
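To make the alternative codings concrete, the following Python sketch derives three of the dichotomous indicators named in the text ("fair/poor," "excellent," and "poor") and the average score from the same set of responses. The fourth dichotomy shown in Figure 4 is not named in the text and is therefore left out. The function name and the ten responses are hypothetical, and the calculation is unweighted.

```python
# Numeric coding for the average score: "excellent" = 5 down to "poor" = 1.
SCORES = {"excellent": 5, "very good": 4, "good": 3, "fair": 2, "poor": 1}


def summarize(responses):
    """Return several summary indicators built from the same 5-category item."""
    n = len(responses)
    return {
        "fair_poor": sum(r in ("fair", "poor") for r in responses) / n,
        "excellent": sum(r == "excellent" for r in responses) / n,
        "poor": sum(r == "poor" for r in responses) / n,
        "mean_score": sum(SCORES[r] for r in responses) / n,
    }


# Hypothetical responses from ten respondents.
responses = [
    "excellent", "very good", "good", "good", "fair",
    "poor", "excellent", "very good", "good", "fair",
]
summary = summarize(responses)
```

Each indicator discards different information from the 5-category response, which is why trends in one indicator need not mirror trends in another.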
For the 3 surveys with annual reporting (CPS, NHIS, BRFSS), we modeled time trends from 1998 to 2007 using logistic regression of self-rated health (with either “excellent” or “fair/poor” ratings as the dependent variable) as a function of calendar year. Separate models were fit for each survey, by subgroup. The estimated odds ratios for calendar year in the regressions were translated into average annual rates of change in the odds of reporting either “excellent” or “fair/poor” health. For example, an odds ratio of 1.02 on year implies an average annual rate of change of (1.02 − 1.00) × 100 = 2%. Figure 5 summarizes the regression results.
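The conversion from odds ratio to annual rate of change is simple enough to state as a short Python sketch. The first function restates the formula in the text; the second, which compounds the annual odds ratio over several years, is an added illustration rather than a calculation reported in this study. Both function names are hypothetical.

```python
def annual_rate_of_change(odds_ratio):
    """Average annual percent change in the odds implied by a per-year odds ratio."""
    return (odds_ratio - 1.0) * 100.0


def cumulative_odds_ratio(odds_ratio, years):
    """Odds ratio accumulated over a number of annual steps (illustrative addition)."""
    return odds_ratio ** years


rate = annual_rate_of_change(1.02)
nine_year_ratio = cumulative_odds_ratio(1.02, 9)  # nine annual steps from 1998 to 2007
```

Note that these are changes in the odds, not in the proportion itself; the two are close only when the proportion is small.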
Overall, and in both men and women, the regressions confirm the observation that trends in “excellent” ratings are more consistent across surveys than trends in “fair/poor” ratings. In men, CPS shows significant declines in the proportion of fair/poor ratings, in contrast to the significant increases seen in BRFSS, whereas declines in excellent ratings are seen in all surveys, albeit at varying rates. Across age groups, significant differences appear in fair/poor ratings from the 2 younger age groups, while excellent ratings are less discrepant across surveys overall. Considering differences across race and ethnic groups, using either dichotomous measure, we found that the greatest discrepancies in trends appear among Hispanic respondents, especially in fair/poor responses. Finally, comparisons across education groups indicate that, for those respondents who have completed at least high school, trends are unambiguously worse: More people report “fair/poor” health at the same time that fewer people report “excellent” health. On the other hand, trends among those without a high school diploma offer the most ambiguous conclusions in any of the subgroup analyses: In terms of both the fair/poor and excellent responses, CPS points to a strong, significant favorable trend, whereas BRFSS shows a strong, significant unfavorable trend in this group.
In this study, we undertook a comprehensive comparative analysis of self-rated health in 4 nationally representative US surveys and observed widely discrepant results overall. In addition to variation across surveys in self-rated health levels, we also noted striking inconsistencies in trends. Whereas BRFSS finds that Americans were increasingly likely to report “fair” or “poor” health over the last decade, CPS indicates the opposite trend. Unpacking these discrepancies through subgroup analyses reveals the greatest inconsistencies in trends among younger respondents, Hispanics, and those without a high school education. Our results also challenge the standard practice of focusing on the percentage of respondents with self-ratings of “fair” or “poor,” as this indicator appears prone to greater cross-survey discrepancies than other indicators constructed from the same survey responses, such as the proportion with “excellent” self-ratings.
Wide variations in levels and trends in self-rated health measured in nationally representative surveys using the same survey item demand an explanation. There are at least 3 possibilities. First, despite national sample frames and application of sample weights, the aggregated results from some surveys may not adequately reflect the national average. For example, concerns have been raised in the past about possible noncoverage and nonresponse bias in telephone surveys such as the BRFSS. Recent work, however, has indicated that the bias produced by nonresponse in random-digit telephone surveys is probably modest (39, 40). Although we observed some differences in the demographic composition of the weighted samples in the 4 surveys, these differences were stable over time and therefore cannot explain divergent time trends in self-rated health. Moreover, the consistent trends across surveys observed in other measures, such as diabetes prevalence, mirror a previous finding of consistent cross-sectional estimates in NHIS and BRFSS for 13 of the 14 different health measures examined—with self-rated health being the notable exception (41).
Second, the differences in results across survey platforms may signify a survey mode effect particular to self-rated health. The potential importance of different modes of administration has been noted previously for other specific types of questions, and indeed we observe significant differences across surveys in reported body mass index levels. In order to attribute divergent time trends across the survey platforms to mode effects for self-rated health, however, the mode effects would need to act differentially over time. In contrast to the body mass index example of parallel time trends across surveys, responses on self-rated health are evidently growing more discrepant over time. We are not aware of any existing studies that account for mode-item effects that change over time in such a divergent manner.
A third possibility is that there may be framing and ordering effects in the different questionnaires that interact with attributes of the respondents, so that biases across platforms are shifting. It is difficult to construct more precise hypotheses regarding the nature of the individual and population attributes that would progressively change framing and ordering effects over time. The major shift in NHIS responses in 1997, accompanying a relocation of the self-rated health item within the overall structure of the interview, indicates that ordering effects for the self-rated health item can be large. Cross-survey differences in the steady changes in responses over time would require a more subtle form of framing or ordering effect. These effects might derive, for example, from some changing cultural or linguistic attributes of individual respondents. The widening inconsistencies in trends among Hispanic respondents offer some evidence in favor of the potential importance of cultural or linguistic factors, but more definitive conclusions await further qualitative and quantitative investigation.
Comparing trends by educational level, we find that discrepancies across surveys are most pronounced among respondents without a high school diploma. This finding has potentially profound implications for analyses of socioeconomic disparities in health that rely on self-rated health responses. Trends in CPS show improvements among lower-educated respondents at the same time that self-ratings are worsening among more educated respondents—which has the net effect overall of reducing disparities across education groups. In contrast, BRFSS shows the sharpest declines in health among the least educated group, which implies a widening gap across socioeconomic strata.
Although our analysis of existing survey programs cannot provide a clear indication of the causes of incomparabilities across surveys and over time, it nevertheless offers an important reminder that, at the present time, substantial caution is warranted in using self-rated health to monitor trends in population health. One concrete suggestion that emerges from our study is to reconsider the standard approach of dichotomizing self-rated health as “fair/poor” versus other responses. Although some recent studies have examined the continuity of self-rated health and found evidence of symmetry in responses at the positive and negative ends of the scale (42, 43), our study indicates that trends in self-reported excellent health appear less prone to inconsistencies across surveys than trends in self-reported fair/poor health. This finding challenges the prevailing approach to using this variable in empirical studies in public health, epidemiology, and medical sociology.
Given the importance of tracking nonfatal health outcomes at the population level, what are the available options for refining these tools for future use? Two main avenues have been pursued to date. First, there has been a steady evolution of more detailed instruments that either ask multiple questions about general health (44, 45) or ask about more specific domains of health or symptoms (46–48). Population-level data for these instruments are not yet available for long periods of time or from multiple sources in the same country to test whether they suffer from similar problems. Recent efforts to understand relations across various multi-item health measurement scales have characterized differences across instruments in cross-sectional analyses (49–51), but extension of these analyses to compare time trends requires further longitudinal study. Second, strategies such as anchoring vignettes (52, 53) have been proposed recently to enhance the comparability of self-reported survey responses in health and other areas. It is not yet known whether such strategies can successfully remedy the bulk of comparability problems across settings or over time.
The epidemiologic transition has advanced far enough (54–56) that, for most countries, critical questions regarding the population's health encompass not only how long people live but also their experience of health while they are alive. Although self-rated health continues to appeal as a health measure that contributes unique information on individuals’ perceptions of their own health and has strong predictive power for future outcomes, our study suggests that self-rated health may not be suitable for tracking changes in population health over time. In seeking to identify efficient measurement strategies for this latter purpose, more development work on new robust and comparable approaches is urgently needed.
Author affiliations: Department of Global Health and Population, Harvard School of Public Health, Boston, Massachusetts (Joshua A. Salomon); Harvard Initiative for Global Health, Cambridge, Massachusetts (Joshua A. Salomon, Shefali Oza, Stella Nordhagen); and Institute for Health Metrics and Evaluation, University of Washington, Seattle, Washington (Stella Nordhagen, Christopher J. L. Murray).
This work was supported in part by the National Institute on Aging (grant P01AG17625).
The authors gratefully acknowledge helpful discussions with Ali Mokdad, Dean Joliffe, Linda Martin, and Yael Benyamini.
The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Conflict of interest: none declared.