We sought to predict the lowest possible mortality rate that states might expect to achieve if they attained the best levels of health determinants observed among all states. To do this, we compiled county-level data (Figure 1). Our dependent variable was based on the age-adjusted mortality rate (per 100,000 population) among people younger than 75 years for 2001–2003. For our predictor variables, we compiled county-level data from multiple sources for years prior to 2001–2003 on numerous characteristics, including health-related medical care (e.g., uninsured rate), sociodemographics (e.g., high school graduation rate), and behavior (e.g., smoking prevalence). Candidate data elements had to (1) be available for most counties in the U.S., (2) be collected in a similar fashion in each county, and (3) have a hypothesized relationship to mortality outcomes. A characteristic was deemed modifiable if it was potentially amenable to program or policy intervention and nonmodifiable if it was not (e.g., the racial composition of a county was not deemed modifiable, whereas smoking rates were). Because of data limitations, 121 counties and the state of Alaska were excluded from the model.
| Figure 1. Description of sources and years of data for dependent and independent variables used to predict state age-adjusted mortality rates in 2000 |
We then developed a parsimonious model predicting county-level age-adjusted mortality rates for those younger than 75 years of age and used the model to predict states' mortality rates under different scenarios. The primary scenario of interest was each state's “ideal” predicted mortality if that state had the best observed level (among all states) of modifiable characteristics. Note that in this approach we used county-level ecologic data to develop a model predicting counties' mortality rates, but then applied that model to states. This was done because the 3,017 counties provided a more powerful data source than the states and allowed for a more detailed examination of possible models. The examination of many predictor variables and their higher-order terms (including, for example, squared terms and interactions) would not have been possible using states' mortality rates as the outcome variable.
For the model, we identified clusters of highly correlated predictor variables (correlation coefficient ≥0.8). Within a set of highly correlated variables, variables that were most related to mortality were retained for possible inclusion in the final model. We excluded the remaining variables from the final model to avoid the problem of multicollinearity. We included all remaining predictor variables (including squared terms) in a “full” multiple linear regression model and inspected them for their association with mortality. If squared terms were not significant, they were dropped and first-order terms were examined. We retained all statistically significant first-order and squared terms (and corresponding first-order terms, regardless of significance) in the final model. Variables for percent of population aged ≥65 years and percent of population female were retained in the final model regardless of their statistical significance, so that the county comparisons would be adjusted by age and gender. (Note, though, that mortality rates were also age-adjusted.)
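The screening step for correlated predictors can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `select_representatives`, the use of absolute correlations, the definition of a cluster as a connected group of variables linked by a correlation of at least 0.8, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def select_representatives(X, y, names, threshold=0.8):
    """Keep one predictor per cluster of highly correlated predictors.

    Assumption: a "cluster" is a connected group of predictors linked by
    |r| >= threshold; the member most correlated with y is retained.
    """
    corr = np.corrcoef(X, rowvar=False)  # predictor-by-predictor correlations
    n = len(names)
    # Absolute correlation of each predictor with the outcome.
    with_y = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n)])

    unvisited = set(range(n))
    kept = []
    while unvisited:
        # Grow one cluster outward from an arbitrary unvisited predictor.
        start = unvisited.pop()
        cluster, frontier = {start}, [start]
        while frontier:
            i = frontier.pop()
            linked = {j for j in unvisited if abs(corr[i, j]) >= threshold}
            unvisited -= linked
            cluster |= linked
            frontier.extend(linked)
        # Retain the cluster member most strongly related to the outcome.
        best = max(cluster, key=lambda j: with_y[j])
        kept.append(names[best])
    return sorted(kept)

# Synthetic illustration (not study data): "a" and "b" are near-duplicates.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + 0.1 * rng.normal(size=200)
c = rng.normal(size=200)
y = 2.0 * a + c + 0.5 * rng.normal(size=200)

kept = select_representatives(np.column_stack([a, b, c]), y, ["a", "b", "c"])
print(kept)  # one of "a"/"b" plus "c"
```

Only one of the two near-duplicate predictors survives screening, so the subsequent regression is not fitted with both.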
We also examined interactions among select predictor variables for statistical significance and performed modeling diagnostics, including the examination of residual plots and assessment of influence of individual data points. In all models, we weighted counties by their population. Finally, we entered states' observed and “ideal” levels of each characteristic into the model to create state-specific predicted mortality rates under their prevailing (actual) circumstances (i.e., the usual predicted values from a linear model) and under ideal circumstances. Ideal circumstances were modeled by replacing states' observed predictor variables with the best observed values among all states for those variables identified as modifiable.
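A compact sketch of the population-weighted fit and the two state-level predictions is given below. It assumes a model with first-order terms only, a noise-free toy outcome, and a single modifiable factor for which lower values are better; the helper names (`fit_weighted_ols`, `predict`) and all numbers are hypothetical and are not drawn from the study.

```python
import numpy as np

def fit_weighted_ols(X, y, weights):
    """Weighted least squares: minimise sum_i w_i * (y_i - x_i . beta)^2."""
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    root_w = np.sqrt(weights)
    # Scaling rows by sqrt(w) turns the weighted problem into ordinary lstsq.
    beta, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return beta

def predict(beta, X):
    """Linear prediction from an intercept-first coefficient vector."""
    return beta[0] + X @ beta[1:]

# Hypothetical county data: columns are [smoking %, % aged >= 65].
county_X = np.array([[30.0, 12.0], [22.0, 15.0], [15.0, 10.0],
                     [18.0, 20.0], [26.0, 8.0]])
county_pop = np.array([50_000, 120_000, 800_000, 30_000, 250_000])
# Noise-free toy outcome so the fitted coefficients are easy to check.
county_y = 300.0 + 8.0 * county_X[:, 0] + 2.0 * county_X[:, 1]

beta = fit_weighted_ols(county_X, county_y, county_pop)

# Apply the county-level model to (hypothetical) state-level characteristics.
state_X = np.array([[24.0, 13.0], [13.0, 9.0]])
modifiable = [0]  # column indices treated as modifiable (assumption)

# "Ideal" scenario: replace modifiable columns with the best observed state
# value (here the minimum, since lower smoking is assumed to be better);
# nonmodifiable columns keep each state's own value.
ideal_X = state_X.copy()
ideal_X[:, modifiable] = state_X[:, modifiable].min(axis=0)

predicted_actual = predict(beta, state_X)
predicted_ideal = predict(beta, ideal_X)
```

For a factor where higher values are better (e.g., high school graduation rate), the maximum rather than the minimum would be substituted.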
To estimate the relative improvement in mortality rate that a state might realize by improving a specific modifiable factor from its current level to the best value attained among all states, we performed the following computation steps:
- We calculated each state's predicted mortality from our final regression model (using states' observed modifiable and nonmodifiable predictor variables).
- We subtracted each state's estimated best attainable mortality (assuming each state had the best observed value of all modifiable factors) from the predicted mortality. This provided an estimate, under the model, of how much a state might be able to improve (i.e., 100% of possible improvement).
- Starting with each state's observed values of predictor variables, we estimated the improvement in mortality when we input the best observed value of each modifiable factor, one at a time, separately for each state.
- The improvement calculated in step three was taken as a percentage of the 100% possible improvement in mortality rate calculated in step two.
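The four steps can be summarized in one expression. The notation is introduced here for illustration only and does not appear in the original analysis: $\hat{m}_s$ is state $s$'s predicted mortality at its observed values, $\hat{m}_s^{\text{ideal}}$ is its predicted mortality with all modifiable factors set to their best observed values, and $\hat{m}_s^{(j)}$ is its predicted mortality with only modifiable factor $j$ set to its best observed value.

```latex
\text{Share}_{s,j}
  = \frac{\hat{m}_s - \hat{m}_s^{(j)}}
         {\hat{m}_s - \hat{m}_s^{\text{ideal}}} \times 100\%
```

The denominator is the 100% of possible improvement from step two, and the numerator is the single-factor improvement from step three.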
Taking Alabama as an example, our method found the following: (1) Alabama's predicted mortality under the final model was 486.6 deaths per 100,000 population (as compared with an observed mortality of 491.2 per 100,000 population); (2) the difference between Alabama's predicted mortality and the mortality estimated if Alabama had the best level of modifiable factors among all states was 486.6 – 302.4 = 184.2 per 100,000 population; (3) if Alabama reduced its smoking prevalence from 24% (observed) to 13% (the best prevalence among all states, observed in Utah), the estimated mortality under the model would be 469.3 per 100,000 population (i.e., a reduction of 17.3 per 100,000 population, based on 486.6 – 469.3); and (4) the estimated reduction of 17.3 per 100,000 population is 9.4% (17.3/184.2) of the total possible reduction calculated in step two.
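The Alabama arithmetic can be reproduced with a few lines of code. The function name `share_of_possible_improvement` is an illustrative choice; the three inputs are the predicted, ideal, and single-factor mortality rates reported above.

```python
def share_of_possible_improvement(predicted, ideal, single_factor):
    """Return (total possible reduction, single-factor reduction, percent share)."""
    total_possible = predicted - ideal       # step two
    single_gain = predicted - single_factor  # step three
    return total_possible, single_gain, 100.0 * single_gain / total_possible  # step four

# Alabama: predicted 486.6, ideal 302.4, smoking at best observed level 469.3.
total, gain, pct = share_of_possible_improvement(486.6, 302.4, 469.3)
print(round(total, 1), round(gain, 1), round(pct, 1))  # 184.2 17.3 9.4
```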