

Public Health Reports
Public Health Rep. 2010 Mar-Apr; 125(2): 160–167.
PMCID: PMC2821842

How Healthy Could a State Be?

David Kindig, MD, PhD, Paul Peppard, PhD, and Bridget Booske, PhD



We predicted the amount of health outcome improvement any state might achieve if it could reach the highest level of key health determinants any individual state has already achieved.


Using secondary county-level data on modifiable and nonmodifiable health determinants from 1994 to 2003, we used regression analysis to predict state age-adjusted mortality rates in 2000 for those younger than age 75, under the scenario of each state's “ideal” predicted mortality if that state had the best observed level among all states of modifiable determinants.


We found considerable variation in predicted improvement across the states. The state with the lowest baseline mortality, New Hampshire, was predicted to improve by 23% to a mortality rate of 250 per 100,000 population if New Hampshire had the most favorable profile of modifiable health determinants. However, West Virginia, with a much higher baseline, would be predicted to improve the most—by 46% to 254 per 100,000 population. Individual states varied in the pattern of specific modifiable variables associated with their predicted improvement.


The results support the contention that health improvement requires investment in three major categories: health care, behavioral change, and socioeconomic factors. Different states will require different investment portfolios depending on their pattern of modifiable and nonmodifiable determinants.

It is well understood that there is substantial variation in health outcomes, however measured, across the 50 U.S. states. It is also becoming more widely appreciated that improving such outcomes is the product of the multiple determinants of health, including genetics, health care, individual behavior, and the social and physical environment.1

Significant previous work has been conducted on avoidable mortality, both in the U.S. and Europe,2,3 including a recent report by Weisz and colleagues comparing France, England, Wales, and the U.S.4 Unfortunately, research has not yet provided adequate guidance to policy makers with regard to specific investment choices across these categories to improve population health. Public health advocates have often based policy advocacy on early estimates from the Centers for Disease Control and Prevention, which indicated that about 40% of deaths are caused by behavioral factors, 30% by genetics, 15% by social circumstances, 10% by medical care, and 5% by physical environmental exposures.5 The widely recognized America's Health Rankings has four determinant categories with weights currently assigned as follows by an expert panel: 36% personal behaviors, 25% community environment, 18% public and health policies, and 21% clinical care.6

Some investigators have examined single determinants; for example, Cutler has recently assigned a 50% weight to medical care, while also conducting sensitivity analysis from 25% to 75%.7 In contrast, Woolf and colleagues have estimated that correcting disparities in education-associated mortality rates would have averted eight times more deaths than those attributable to medical advances between 1996 and 2002.8 Looking at two determinant categories, using longitudinal data from the Americans' Changing Lives survey, Lantz et al. found that four common health risk behaviors (smoking, physical activity, alcohol consumption, and body mass index) had only modest impact in predicting functional status and self-rated health in low-income populations after controlling for socioeconomic factors. They concluded that “Risk behaviors are not the dominating mediating mechanism for socioeconomic health differences.”9 In addition, a recent examination of 22 European countries found that the variation in health inequalities could be attributed to variations in smoking, alcohol consumption, and access to care, but that the patterns of determinants of inequality differed by gender, country, and outcome measured.10

Limitations of datasets and the lack of adequate methods make such policy-oriented population health analysis difficult even across such broad determinant categories and, even more so, across specific programs and policies. However, it is well known that substantial variation in these determinant categories is found across the 50 U.S. states,6,11 and that even the healthiest states do not have the best level of each health determinant that any state has already achieved. It is, therefore, unlikely that any state has yet achieved the healthiest population that it could. We estimated the feasible range of mortality improvement that might be possible for any state if it could achieve for each determinant what any state has already achieved.


We sought to predict the lowest possible mortality rate that states might expect to achieve if they obtained the best levels of health determinants observed among all states. To do this, we compiled county-level data (Figure 1). Our dependent variable was based on an age-adjusted <75 years mortality rate (per 100,000 population) for 2001–2003. For our predictor variables, we compiled county-level data from multiple sources for years prior to 2001–2003 on numerous characteristics, including health-related medical care (e.g., uninsured rate), sociodemographics (e.g., high school graduation rate), and behavior (e.g., smoking prevalence). Candidate data elements had to (1) be available for most counties in the U.S., (2) be collected in a similar fashion in each county, and (3) have a hypothesized relationship to mortality outcomes. A characteristic was deemed modifiable if potentially amenable to program or policy intervention, but nonmodifiable if it was not (e.g., racial composition of a county was not deemed modifiable while smoking rates were). Because of data limitations, 121 counties and the state of Alaska were eliminated from the model.

Figure 1.
Description of sources and years of data for dependent and independent variables used to predict state age-adjusted mortality rates in 2000

We then developed a parsimonious model predicting county-level age-adjusted mortality rates for those younger than 75 years of age and used the model to predict states' mortality rates under different scenarios. The primary scenario of interest was each state's “ideal” predicted mortality if that state had the best observed level (among all states) of modifiable characteristics. Note that in this approach we used county-level ecologic data to develop a model predicting counties' mortality rates, but then applied that model to states. This was done because the 3,017 counties provided a more powerful data source than the states and allowed for a more detailed examination of possible models. The examination of many predictor variables and their higher-order terms (including, for example, squared terms and interactions) would not have been possible using states' mortality rates as the outcome variable.

For the model, we identified clusters of highly correlated predictor variables (correlation coefficient ≥0.8). Within a set of highly correlated variables, variables that were most related to mortality were retained for possible inclusion in the final model. We excluded the remaining variables from the final model to avoid the problem of multicollinearity. We included all remaining predictor variables (including squared terms) in a “full” multiple linear regression model and inspected them for their association with mortality. If squared terms were not significant, they were dropped and first-order terms were examined. We retained all statistically significant first-order and squared terms (and corresponding first-order terms, regardless of significance) in the final model. Variables for percent of population aged ≥65 years and percent of population female were retained in the final model regardless of their statistical significance, so that the county comparisons would be adjusted by age and gender. (Note, though, that mortality rates were also age-adjusted.)
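The screening of highly correlated predictors described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it uses a greedy pass (rank predictors by their correlation with mortality, then drop any predictor correlated at 0.8 or above with one already kept) as a stand-in for the cluster-based selection, and all variable names and values are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def prune_correlated(predictors, mortality, threshold=0.8):
    """Greedy approximation: among highly correlated predictors, keep the
    one most strongly correlated with mortality."""
    # Rank predictors by strength of association with the outcome.
    ranked = sorted(
        predictors,
        key=lambda name: abs(pearson(predictors[name], mortality)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        # Keep a predictor only if it is not highly correlated with one already kept.
        if all(abs(pearson(predictors[name], predictors[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical county-level values (not taken from the study).
mortality = [300, 350, 400, 450, 500]
predictors = {
    "pct_uninsured": [8, 11, 13, 16, 20],
    "pct_no_usual_source_of_care": [9, 11, 14, 16, 21],  # nearly collinear with pct_uninsured
    "pct_living_alone": [12, 9, 14, 10, 13],
}
kept = prune_correlated(predictors, mortality)
print(kept)
```

In this toy dataset the two nearly collinear variables collapse to the one more closely related to mortality, while the weakly correlated third variable is retained.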

We also examined interactions among select predictor variables for statistical significance and performed modeling diagnostics, including the examination of residual plots and assessment of influence of individual data points. In all models, we weighted counties by their population. Finally, we entered states' observed and “ideal” levels of each characteristic into the model to create state-specific predicted mortality rates under their prevailing (actual) circumstances (i.e., the usual predicted values from a linear model) and under ideal circumstances. Ideal circumstances were modeled by replacing states' observed predictor variables with the best observed values among all states for those variables identified as modifiable.
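As a rough illustration of population weighting and of the "ideal" scenario, the sketch below fits a weighted least-squares line with a single predictor and then predicts mortality at a state's observed value and at the best observed value. The study's model had many predictors and squared terms; the single-predictor form, the function name `weighted_linear_fit`, and all numbers here are assumptions made for brevity.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    total_w = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical counties: smoking prevalence (%), deaths per 100,000, population.
smoking = [15.0, 20.0, 25.0, 30.0]
mortality = [330.0, 380.0, 430.0, 480.0]  # constructed as 180 + 10 * smoking
population = [50_000, 1_200_000, 300_000, 20_000]

# Each county contributes in proportion to its population.
intercept, slope = weighted_linear_fit(smoking, mortality, population)

state_observed_smoking = 24.0  # hypothetical state-level value
best_observed_smoking = 13.0   # hypothetical best value among all states

predicted_actual = intercept + slope * state_observed_smoking
predicted_ideal = intercept + slope * best_observed_smoking
print(predicted_actual, predicted_ideal)
```

Because the toy data lie exactly on a line, the weights do not change the fitted coefficients here; with real, noisy county data, populous counties would pull the fit toward their values.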

To obtain an estimate of the relative amount of improvement in mortality rate that might be realized by a state improving a specific modifiable factor from a state's current level to that of the best value attained among all states, we performed the following computation steps:

  1. We calculated each state's predicted mortality from our final regression model (using states' observed modifiable and nonmodifiable predictor variables).
  2. We subtracted each state's estimated best attainable mortality (assuming each state had the best observed value of all modifiable factors) from the predicted mortality. This provided an estimate, under the model, of how much a state might be able to improve (i.e., 100% of possible improvement).
  3. Starting with each state's observed values of predictor variables, we estimated the improvement in mortality when we input the best observed value of each modifiable factor, one at a time, separately for each state.
  4. The improvement calculated in step three was taken as a percentage of the 100% possible improvement in mortality rate calculated in step two.
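The four steps above can be sketched in code. This is a simplified illustration that assumes a purely linear model with no squared or interaction terms; the intercept, coefficients, variable names, and state values are hypothetical and are not the fitted values from Table 1.

```python
def predict(intercept, coefs, values):
    """Linear prediction: intercept plus the sum of coefficient * value."""
    return intercept + sum(coefs[name] * values[name] for name in coefs)

def improvement_shares(intercept, coefs, observed, best):
    """Return (predicted, ideal, total improvement, {factor: % of total}).

    `best` holds the best observed values for the modifiable factors only;
    nonmodifiable factors keep their observed values throughout.
    """
    # Step 1: predicted mortality from the state's observed values.
    predicted = predict(intercept, coefs, observed)
    # Step 2: best attainable mortality with all modifiable factors at their best.
    ideal = predict(intercept, coefs, {**observed, **best})
    total = predicted - ideal
    if total == 0:
        raise ValueError("no possible improvement; shares are undefined")
    # Steps 3 and 4: improve one factor at a time and express as % of the total.
    shares = {}
    for name, best_value in best.items():
        one_at_a_time = predict(intercept, coefs, {**observed, name: best_value})
        shares[name] = 100.0 * (predicted - one_at_a_time) / total
    return predicted, ideal, total, shares

# Hypothetical model and state profile (illustrative values only).
intercept = 150.0
coefs = {"pct_uninsured": 8.0, "pct_smokers": 2.0, "pct_native_born": 1.0}
observed = {"pct_uninsured": 15.0, "pct_smokers": 24.0, "pct_native_born": 90.0}
best = {"pct_uninsured": 8.0, "pct_smokers": 13.0}

predicted, ideal, total, shares = improvement_shares(intercept, coefs, observed, best)
print(predicted, ideal, total, shares)
```

In a purely linear model such as this one, the one-at-a-time shares sum to 100%. With squared or interaction terms, as in the final model described here, they need not.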

Taking Alabama as an example, our method found the following: (1) Alabama's predicted mortality under the final model was 486.6 deaths per 100,000 population (as compared with an observed mortality of 491.2 per 100,000 population); (2) the difference between Alabama's predicted mortality and mortality estimated if Alabama had the best level of modifiable factors among all states was 486.6 – 302.4 = 184.2 per 100,000 population; (3) if Alabama dropped its smoking prevalence from 24% (observed) to 13% (best prevalence among all states was Utah), the estimated mortality under the model would be 469.3 per 100,000 population (i.e., a reduction of 17.3 per 100,000 population, based on 486.6 – 469.3); and (4) the estimated reduction of 17.3 per 100,000 population is 9.4% (17.3/184.2) of the total reduction possible calculated in step two.
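The Alabama arithmetic can be checked directly from the three mortality figures reported in the example; the variable names are ours.

```python
# Figures reported for Alabama (deaths per 100,000 population).
predicted = 486.6               # predicted under the final model
best_attainable = 302.4         # all modifiable factors at their best observed level
predicted_best_smoking = 469.3  # only smoking prevalence at its best observed level

total_possible = predicted - best_attainable              # step 2
smoking_reduction = predicted - predicted_best_smoking    # step 3
smoking_share = 100 * smoking_reduction / total_possible  # step 4

print(f"{total_possible:.1f} {smoking_reduction:.1f} {smoking_share:.1f}%")
# prints "184.2 17.3 9.4%"
```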


Table 1 shows the results of this analysis. The final ecologic model predicting counties' mortality rates (n=3,017 counties) had an R² value of 0.87, indicating a very high level of prediction of county mortality with the 25 retained first- and second-order (squared) terms. The second-to-last column shows the change in the number of deaths of those younger than 75 years of age per 100,000 population for a 1% prevalence increase of each predictor variable (or $1,000 increase in median household income). Note that the prevalences had varying ranges (comparing states' minimum and maximum prevalences among variables), so that a 1% increment for unemployment rate (range: 4% to 6%) was relatively more pronounced than a 1% increment in prevalence of college graduates (range: 15% to 30%).
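To make the point about ranges concrete, the short sketch below expresses a 1-percentage-point increment as a fraction of each variable's state-level range, using the two ranges quoted above. The helper name `share_of_range` is ours.

```python
def share_of_range(increment, low, high):
    """Fraction of the observed state-level range covered by an increment."""
    return increment / (high - low)

# Ranges quoted in the text (percent of population).
narrow = share_of_range(1.0, 4.0, 6.0)   # variable ranging from 4% to 6%
wide = share_of_range(1.0, 15.0, 30.0)   # college graduates, 15% to 30%

print(f"{narrow:.0%} of the range vs. {wide:.1%} of the range")
```

A 1-point change spans half of the narrow range but only about one-fifteenth of the wide one, so per-1% coefficients are not directly comparable across variables.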

Table 1.
Regression analysis of modifiable and nonmodifiable variables on county deaths per 100,000 population, U.S., 2001–2003

Among the nonmodifiable variables, the percentage of native-born people had the largest relative association with mortality, with an increment of 3.9 deaths per 100,000 population associated with a 1% greater prevalence. The county percentage of females, African Americans, and American Indians also predicted higher age-adjusted mortality rates, while the percentage of Pacific Islanders and residents older than 65 years of age predicted lower death rates per 100,000 population. Among the modifiable variables, the percentage of the population that was uninsured had the largest impact on the predicted mortality rate, with a 1% increase in the uninsured being associated with an 8% increase in mortality rate. Three socioeconomic variables (high school graduation, college graduation, and median family income) were associated with lower mortality rates, while the percentage living alone and percentage unemployed were associated with higher death rates per 100,000 population. In the behavior category, both higher smoking and inactivity rates were associated with higher mortality rates.

Figure 2 and Table 2 display the results for each state. Figure 2 indicates the percent improvement in deaths per 100,000 population predicted for each state if each state had the highest (most favorable) level of each modifiable determinant observed among all states (plotted against states' actual, observed mortality rates). We found a positive correlation (R²=0.72) between a state's observed mortality rate and what was predicted by this model.

Figure 2.
Relationship between state baseline mortality (U.S., 2001–2003) and state mortality predicted by variables in the best scenario model
Table 2.
Percent improvement in state-level deaths per 100,000 population (U.S., 2001–2003) predicted by three of the modifiable variables in the regression analysis

For example, Utah, a state with low baseline mortality, has a baseline rate of 316 per 100,000 population, but the model predicts a rate of 263 per 100,000 population, a 16% decrease, if Utah had the most favorable profile of modifiable health determinants. In Massachusetts, with a similarly low baseline rate of 319 per 100,000 population, predicted mortality would decrease by 22% to 229 per 100,000 population. On the other hand, a state with a high baseline mortality rate (West Virginia with 458 per 100,000 population) is predicted to decrease by 44% to 254 per 100,000 population. South Carolina has a similarly high baseline rate of 478 per 100,000 population, but the model predicts a still substantial but lower 35% improvement to 309 per 100,000 population. At the same time, Figure 2 shows pairs of states such as Colorado/California and South Carolina/West Virginia with similar baselines but much different predicted improvement. For the country as a whole, the model predicts an overall improvement of 135 deaths per 100,000 population.

We also examined the relative amount of improvement in mortality rate that might be realized by a state improving a specific modifiable factor from a state's current level to that of the best value attained among all states. Table 2 also displays the variation seen from three of the modifiable variables for all states. There was considerable variation across the states in the relative contribution from each modifiable determinant. For example, Utah had most of its reduction predicted to be from reducing the uninsured rate but nothing from smoking rates (as it already had the best smoking rate among all states). West Virginia had a relatively low percentage predicted from the uninsured but greater reduction associated with increased education rates.


The main finding of this study was that, using the assumption that states could improve their level of modifiable variables to the best any state has achieved, considerable variation exists in how much potential improvement the model predicts across states. In general, the healthier states can improve less because the more healthy states have higher baseline levels of potentially modifiable determinants and more favorable baseline profiles of unmodifiable determinants. While this may seem obvious, we believe this method of examining improvement potential may be more helpful to policy makers who are concerned about health improvement in their states. The fact that even the healthiest states (e.g., New Hampshire and Vermont) do not have the highest levels of every determinant that any state has already achieved is important to recognize. At the same time, pairs of states (e.g., Colorado/California and South Carolina/West Virginia) with similar baselines but much different predicted improvement may be instructive in terms of the different possible paths to improvement.


This study was subject to several limitations. The major limitation of this study was the use of ecologic data to derive associations that might indicate causal relationships. The units of analysis for this study were counties and, thus, interpretations of specific associations between predictor variables and mortality should be made with caution. The “ecologic fallacy”—an assumption that associations among variables assessed from aggregate data apply to analogous individual-level variables—could easily lead to misinterpretation of the model. On the other hand, ecologic analysis is often the relevant method of choice when the unit of analysis is a geographically defined area (e.g., a county within a state) where public health action is being considered.12,13 In addition, it is likely that intervening in one of the modifiable variables may also change the prevalence of other variables, resulting in a lower estimate of effect. However, the model is conservative in that we limited ourselves to the highest level of any modifiable variable that any state has already achieved. In the future, several variables (e.g., high school graduation and smoking rates) are likely to reach levels higher than any state has so far.

Other limitations included limiting outcomes to mortality rates. One would expect different relationships using health-related quality of life indicators or disparity measures; further work should examine such potential outcomes.14 In addition, even though the dates of the determinant data preceded the period of the mortality measures, many of them have a latency—even long latency—in producing outcomes. Future work should attempt to appropriately lag as many determinant variables during the life course as possible. Finally, no adjustment was made or was even possible for the impact of migration on the outcome, leaving the impression that the outcome associations were with local determinants, when in fact some of them were the result of exposures in other places.


Given the model's analytic limitations, we are reluctant to draw precise conclusions from the relative contributions of each modifiable determinant. The directions of the relationships were all expected based on theory and other empirical work. Of note, however, was the larger independent association of the socioeconomic factors than the behavioral determinants, which is consistent with previous work by Lantz et al. and others.9,15 This finding could be due in part to the greater reliability of socioeconomic variables from the U.S. Census than the multiple years of Behavioral Risk Factor Surveillance System data. We were surprised by the magnitude of the living alone variable, although the direction of the association was consistent with previous work16 and with the similar variable “divorce” used in a previous analysis.17 Among the unmodifiable variables, percent native-born was a strong negative predictor, with its direction also supported by other empirical work.18

Given the limitations of this analysis for drawing causal relationships, it is critical that we develop and use better datasets and analytic models to further tease apart the relationships discussed in this article. The complexity of establishing these relationships in population health has led Stoddart to call such work a “fantasy equation,” meaning that “at present we but vaguely understand the relative magnitude of the coefficients on the independent variables that would inform specific policies rather than broad directions, even if we are beginning to see the variables themselves more clearly.”19 Of course, these relationships do exist and are waiting to be discovered. Policy makers seldom require firm causal relationships for the decisions that are made in the public or private sector,20 so the job of population health research is to get as close to causal understanding as is possible to guide political or managerial efforts. We believe our attempt to begin to quantify these relationships is helpful in this regard and will hopefully prompt additional work with other datasets and methods.

Even though these results fall short of specific guidance on cross-sectoral policies, they certainly support the contention that health improvement requires investment in all three major categories of health care, behavioral change, and socioeconomic factors. While population health scholars understand this notion, many in policy positions do not, especially as it relates to socioeconomic determinants. It is not a fantasy to understand that major improvements in health outcomes can be made by combining interventions in multiple sectors that have already been achieved in other jurisdictions. As relationships such as those described in this article gain better causal certainty, they should be of substantial guidance to policy makers in the public and private sectors as they attempt the most cost-effective improvement for those for whom they have responsibility.21


Partial support for this study was obtained from the Robert Wood Johnson Health and Society Scholars Program at the University of Wisconsin, as well as from the Wisconsin Partnership Program's Making Wisconsin the Healthiest State project. Neither funder was involved in the design and conduct of the study; collection, management, analysis, and interpretation of the data; or preparation, review, and approval of the article. The authors acknowledge the conceptual contributions of Javier Nieto, Pat Remington, and Sheryl Magzamen.


1. Evans RG, Stoddart GL. Producing health, consuming health care. Soc Sci Med. 1990;31:1347–63.
2. Holland WW, editor. European community atlas of avoidable death 1985–89. 3rd ed. Oxford: Oxford University Press; 1997.
3. Hahn RA, Teutsch SM, Rothenberg RB, Marks JS. Excess deaths from nine chronic diseases in the United States, 1986. JAMA. 1990;264:2654–9.
4. Weisz D, Gusmano MK, Rodwin VG, Neuberg LG. Population health and the health system: a comparative analysis of avoidable mortality in three nations and their world cities. Eur J Public Health. 2008;18:166–72.
5. McGinnis JM, Williams-Russo P, Knickman JR. The case for more active policy attention to health promotion. Health Aff (Millwood). 2002;21:78–93.
6. United Health Foundation. America's health rankings: call to action for people and their communities. Minnetonka (MN): United Health Foundation; 2007.
7. Cutler DM, Rosen AB, Vijan S. The value of medical spending in the United States, 1960–2000. N Engl J Med. 2006;355:920–7.
8. Woolf SH, Johnson RE, Phillips RL Jr, Philipsen M. Giving everyone the health of the educated: an examination of whether social change would save more lives than medical advances. Am J Public Health. 2007;97:679–83.
9. Lantz PM, Lynch JW, House JS, Lepkowski JM, Mero RP, Musick MA, et al. Socioeconomic disparities in health change in a longitudinal study of US adults: the role of health-risk behaviors. Soc Sci Med. 2001;53:29–40.
10. Mackenbach JP, Stirbu I, Roskam AJ, Schaap MM, Menvielle G, Leinsalu M, et al. Socioeconomic inequalities in health in 22 European countries. N Engl J Med. 2008;358:2468–81. [published erratum appears in N Engl J Med 2008;359:e14]
11. Department of Health and Human Services (US). Health, United States, 2007: with chartbook on trends in the health of Americans. Rockville (MD): National Center for Health Statistics; 2008.
12. Morgenstern H. Ecologic studies in epidemiology: concepts, principles, and methods. Annu Rev Public Health. 1995;16:61–81.
13. Szklo M, Nieto FJ. Epidemiology: beyond the basics. 2nd ed. Sudbury (MA): Jones and Bartlett; 2006.
14. Kindig DA, Asada Y, Booske B. A population health framework for setting national and state health goals. JAMA. 2008;299:2081–3.
15. Lantz PM, House JS, Lepkowski JM, Williams DR, Mero RP, Chen J. Socioeconomic factors, health behaviors, and mortality: results from a nationally representative prospective study of US adults. JAMA. 1998;279:1703–8.
16. Koskinen S, Joutsenniemi K, Martelin T, Martikainen P. Mortality differences according to living arrangements. Int J Epidemiol. 2007;36:1255–64.
17. Kindig DA, Seplaki CL, Libby DL. Death rate variation in US subpopulations. Bull World Health Organ. 2002;80:9–15.
18. Singh GK, Siahpush M. Ethnic-immigrant differentials in health behaviors, morbidity, and cause-specific mortality in the United States: an analysis of two national databases. Hum Biol. 2002;74:83–109.
19. Stoddart G. The challenge of producing health in modern economies. Hamilton (ON): Centre for Health Economics and Policy Analysis, McMaster University; 1995.
20. Fox DM. The determinants of policy for population health. Health Econ Policy Law. 2006;1:395–407.
21. Kindig DA. A pay-for-population health performance system. JAMA. 2006;296:2611–3.
