This study evaluated the extent to which the causes of variation in health care costs differ by the level at which observations are made.
More than 40 U.S. and international studies providing empirical estimates of the sources of variation in health care costs were reviewed and arrayed by size of observational units. A simplified graphical analysis demonstrating how estimated correlation coefficients change with the level and type of aggregation is presented.
As the unit of observation becomes larger, association between health care costs and health status/morbidity becomes weaker and smaller in magnitude, while correlation with income (per capita GDP) becomes stronger and larger. Individual expenditure variation within a particular health care system is largely due to differences in health status, but across systems, morbidity has almost no effect on costs. For nations, differences in per capita income explain over 90 percent of the variation in both time series and cross section.
Units of observation used for analysis of health care costs must be matched to the units at which decision making occurs. The observed pattern of empirical results is consistent with a multilevel allocative model incorporating aggregate capacity constraints. To the extent that macro constraints determine total budgets at the national level, policy interventions at the micro level (substitution of generic pharmaceuticals, use of CEA for allocation of treatments, controls on construction and technology, etc.) can act to improve efficiency, equity and average health status, but will not usually reduce aggregate average per capita costs of medical care.
Studies of health care cost at the national level routinely demonstrate that cost rises with increases in per capita income. Belgium spends more on medical care than Bangladesh because it is a wealthier country. The causality is so obvious that it is rarely mentioned. However, this linkage between income and expenditures is often not found in studies of individual health care costs. Indeed, some studies even show spending on medical care to be higher, not lower, for the poorest persons within a country (Wagstaff, van Doorslaer, and Paci 1991). This paper addresses why the disparity between individual and national correlations occurs, and related issues concerning the analysis of costs and policy decisions. A model of expenditure determination and allocation in two levels (macro and micro) is presented. Results from 40 empirical studies are then reviewed, illustrating that the estimated correlation of expenditures with income becomes larger as the level of aggregation becomes larger. A discussion section considers why the disparity between micro and macro results is so frequently unrecognized or even denied, despite the volume and prominence of the evidence.
Many decisions are subject to aggregate constraints and are best described by models that operate at two levels. For organ transplants, the total number available is determined by how many are harvested, while allocation is made based on need, tissue match, and sometimes geography. Nursing homes allocate spaces on the basis of need, referral, gender, or other variables while the total number served may be constrained by the number of beds. Similarly, the availability of physician services, at least in the short run, may be constrained by the number of physicians. Rank orderings also operate under global constraints (i.e., only one runner can come in first, and only three win medals, no matter how good the athletes are individually). Financing and expenditure decisions are frequently made at several distinct levels. For research grants, total funding is determined by some federal agency or foundation, and then allocations are made based on quality, appropriateness, and perhaps geography or previous relationships at the project level. For a health system with a financial budget constraint, total expenditures are determined at the macro (usually national) level and then allocated across individuals at the micro level.
In the simple two-level allocative model presented here, the total is determined at Level I by one set of variables, and at Level II the total is allocated to individuals by some other set of variables (which may or may not overlap: see Blalock 1964; Coleman 1990; Hannan 1991; King 1997). In the system of equations below, the aggregate total “X” is determined by some set of large-scale variables (L) at the macro level, and allocated across individuals so that each receives some fractional allocation “a” of the total which depends on individual attributes (si):
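The system of equations is not reproduced in this text. One minimal way to write it, consistent with the verbal description above, is sketched here. The symbols X, L, a, and s_i come from the paragraph; the function name f, the individual amount x_i, and the explicit adding-up constraint are assumptions made for illustration and may not match the author's notation.

```latex
\begin{align}
X &= f(L) && \text{Level I: aggregate total set by macro variables} \\
x_i &= a(s_i)\, X && \text{Level II: individual } i \text{ receives the share } a(s_i) \\
\sum_{i} a(s_i) &= 1 && \text{adding-up constraint, so that } \sum_i x_i = X
\end{align}
```

Under this reading, a change in any s_i alters the shares but leaves X untouched unless L also changes.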
Capacity constraints are often bent or broken in the short run, as when a state runs a temporary budget deficit that must eventually be made up, or when a hospital operates at 110 percent of capacity until staff are burnt-out or supplemented. Indeed, defining large and small in temporal rather than spatial terms provides an alternative use of the two-level model: in such a perspective the Lg are long-run variables and the si are short run (Getzen 1990, 2000b; Ruhm 2001, 2004).
The explanation given above is sequential (total determined at level I, then allocated at level II), but the process is usually simultaneous. Also, while micro and macro are here represented as entirely separate, some interaction between levels is common. The willingness of the public to spend on medical care and the efficiency and equity of the system are best viewed as jointly determined, even though there may be some inertia and time lags involved. Similarly, changes in the method of allocating organ transplants perceived as more fair, or more life prolonging, or more favorable to participating hospitals, may have the effect of increasing the number of organs harvested. Yet flexibility does not mean that constraints are not real. Budgets do exist, and in the long run largely determine how much can be spent on health care, education, or research.
A binding constraint is established at the top in the two-level model presented above. The model could be extended to more levels, multiplying the number of budget constraints at various levels in cascading tiers. On the other hand, the constraint could be relaxed, so that the separation of micro- and macro-level variables becomes less absolute: a sort of soft budget model. At the extreme, one could craft the model as having some collective higher level variables, but no collective constraint at all. In that case, the group mean would go up or down as individual attributes change, with aggregate totals allowed to vary freely. This unconstrained version is much closer to the models developed in education research for estimation of student-, teacher-, and school-level effects, often termed “hierarchical” or “multilevel linear models” (which are well described in Bryk and Raudenbush 1992; Goldstein 1995; Rice and Jones 1997; Carey 2000). In these models the influence of macro variables adds to (rather than limits) micro-level variation. Such freedom is not costless. The capacity constraint that facilitates the solution of allocative models (e.g., only one gold medal winner, total kidneys transplanted is limited by the number harvested, total inpatients on each day is limited by the number of beds, or total payments to providers is limited by premiums collected) becomes problematic in such a general linear model, and is apt to create aggregate specification errors. Whether one should use a hierarchical linear model or an allocative budgetary model depends on the characteristics of the process being studied (Rice and Jones 1997; Rosenkranz and Luft 1997).
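The contrast between an allocative (budget-constrained) model and an additive (unconstrained) one can be illustrated with a small sketch. The function names, the need weights, the budget of 800, and the unit cost of 100 are all hypothetical, and proportional-to-need allocation is just one simple allocation rule assumed for the example.

```python
def allocative_spending(needs, total_budget):
    """Budget-constrained model: split a fixed total in proportion to need."""
    total_need = sum(needs)
    return [total_budget * n / total_need for n in needs]


def additive_spending(needs, cost_per_unit_need):
    """Unconstrained model: each person's spending is set independently."""
    return [cost_per_unit_need * n for n in needs]


# Hypothetical need weights for four people; the fourth becomes sicker.
baseline_needs = [1.0, 1.0, 2.0, 4.0]
sicker_needs = [1.0, 1.0, 2.0, 8.0]

BUDGET = 800.0      # assumed fixed aggregate budget
UNIT_COST = 100.0   # assumed cost per unit of need in the additive model

alloc_before = allocative_spending(baseline_needs, BUDGET)
alloc_after = allocative_spending(sicker_needs, BUDGET)
add_before = additive_spending(baseline_needs, UNIT_COST)
add_after = additive_spending(sicker_needs, UNIT_COST)

print("allocative totals:", sum(alloc_before), sum(alloc_after))
print("additive totals:  ", sum(add_before), sum(add_after))
```

In the allocative version the sicker person receives more and everyone else receives less, while the total stays at the budget. In the additive version the total rises with individual need.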
Studies of health care costs at the national level show that expenditures rise rapidly with per capita income. However, studies of individual medical costs show only modest effects, or even declines associated with higher incomes (see Table 1 references). Reconciling this disparity in empirical results requires that analysis be carried out on multiple levels, with a conceptual framework connecting individual and group (ecological) behavior.
The reliance on insurance to pool funds and pay for health care is a root cause of the divergence between individual and group income effects. With insurance, it is the average income of the group, and the fraction of total income the group is collectively willing to devote to medical care, which determines the health care budget, not the income of the particular patient being treated (Getzen 2000a). This is clearly seen, for example, in the treatment of the elderly under Medicare. Collective political decisions about what services are to be covered, what payments are to be made to providers, the rules for review and co-payments, and so on, are much more important quantitatively in determining overall cost than individual patient decisions about whether or not to seek treatment, or which treatment or prescription to use. It is not that individual decisions are irrelevant; it is just that they are overwhelmed by the group decisions to define and fund the program as a whole. To a considerable extent, insurance converts private medical care into something like a “public good” for the group that is covered by the plan (Getzen 2004, pp. 379–80).
In those types of medical care where insurance is less significant (plastic surgery, eyeglasses, dental care, mental health counseling), individual behavior is no longer dominated by collective group decision making, and personal incomes have a large effect upon personal expenditures (as evident in the “Special/Uninsured” section of Table 1). It is the individual, rather than the collective, budget constraint that matters in these cases. In the example of organ transplants discussed above, the relevant capacity constraint could occur at the level of the hospital, the region, or the nation, depending upon how transplant services were organized. Similarly, it is the way in which financing and insurance pools are organized that determines the boundaries of the “groups” which are most relevant in determining health expenditures.
Current income elasticity estimates of (largely insured) medical care at the individual micro level are almost always near zero or negative. Measured income elasticity at the national macro level is consistently positive and large, usually exceeding 1.0. When the unit of observation is intermediate between the micro and macro levels (for example, the hospital, health plan, county, or region), the income elasticity typically falls somewhere between 0 and 1.0, that is, between the micro- and macro-level parameter estimates (Parkin, McGuire, and Yule 1987; Blomqvist and Carter 1997).
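A stylized simulation shows how pooling can produce a near-zero income elasticity within an insured group alongside a large elasticity across groups. Everything in it is an assumption made for illustration: the four group mean incomes, the macro elasticity of 1.2, the scale factor 0.001, and the design in which individual income and health need are crossed so that they are unrelated. It is a sketch of the mechanism, not an estimate from any of the studies cited.

```python
import math


def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# All values below are hypothetical.
GROUP_MEAN_INCOMES = [5_000.0, 10_000.0, 20_000.0, 40_000.0]
ASSUMED_MACRO_ELASTICITY = 1.2
INCOME_MULTIPLIERS = [0.5, 1.0, 1.5]  # income relative to group mean (averages 1.0)
NEED_WEIGHTS = [1.0, 2.0, 5.0]        # health need, crossed with income


def simulate_group(mean_income):
    """Return (incomes, spending) for one insured group with a pooled budget."""
    # Level I: the per capita budget depends only on the group's mean income.
    budget_per_capita = 0.001 * mean_income ** ASSUMED_MACRO_ELASTICITY
    members = [(mean_income * m, w) for m in INCOME_MULTIPLIERS for w in NEED_WEIGHTS]
    total_budget = budget_per_capita * len(members)
    total_need = sum(w for _, w in members)
    # Level II: the pooled budget is shared out by need, not by income.
    incomes = [income for income, _ in members]
    spending = [total_budget * w / total_need for _, w in members]
    return incomes, spending


groups = [simulate_group(y) for y in GROUP_MEAN_INCOMES]

# Micro: log-log slope of individual spending on individual income, within each group.
micro_elasticities = [
    ols_slope([math.log(v) for v in incomes], [math.log(v) for v in spending])
    for incomes, spending in groups
]

# Macro: log-log slope of group mean spending on group mean income.
macro_elasticity = ols_slope(
    [math.log(sum(incomes) / len(incomes)) for incomes, _ in groups],
    [math.log(sum(spending) / len(spending)) for _, spending in groups],
)

print("within-group elasticities:", micro_elasticities)
print("across-group elasticity:  ", macro_elasticity)
```

Because spending within a group depends only on need, the within-group slopes are zero up to floating-point error, while the across-group slope recovers the assumed macro elasticity.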
Table 1 presents the results of a number of empirical studies of the effect of income on health care expenditure/utilization, arrayed by increasing level of population aggregation. Also presented in Table 1 are some individual level results from prior periods, other countries, or particular types of care where insurance coverage was less common and less complete. Although the disparity in the magnitude of income effects across levels of observation is readily apparent, what is less evident is the dominance of income effects at the macro level. When carefully specified, income accounts for more than 90 percent of the total variance in national health expenditures (NHE), and virtually all of the explainable variation (Newhouse 1977; Getzen 1990; Getzen and Poullier 1991). That is, once income effects are accounted for, the residual variance in NHE is small and shows no consistent association with other variables (see note 1). The dominance of income effects at the national level establishes a very simple form of group budget constraint and makes the multilevel “allocative model” presented above particularly useful in the analysis of health care costs.
Many who are used to discussing the effects of technology, disease prevalence, hospital construction, population aging, public financing, or legislation may be surprised that such factors are not seen as significant independent variables in the determination of national health spending. Yet a review of empirical studies over several decades makes it apparent that none of these, or any of the other variables tested, has a consistently demonstrable effect of substantial magnitude. Perhaps most revealing is the examination of population aging. Aging has a profound effect on individual differences in medical spending, but not on average per capita expenditures. Growth in expenditures on the aged is attributable primarily to increased intensity of service and secondarily to the connection between wealth and increased longevity, rather than to changes in morbidity or the number of elderly. After controlling for the effects of national per capita income, there is essentially zero correlation between changes in the percentage of the population older than 65 and the share of GDP spent on health care (Barer et al. 1989; Getzen 1992; Chernichovsky and Markowitz 2004). As one example, from 1980 to 1990 expenditures on acute hospital facility use by U.S. Medicare beneficiaries rose from $19,460 million to $47,842 million (146 percent) because of increased intensity (with per diem payments rising 190 percent), while the rate of discharges and days per discharge fell, the number of elderly beneficiaries rose by just 5.5 million (21 percent), and their average age rose by just 1/2 year (1 percent) (Health Care Financing Review 1998). An analysis of per capita Medicare physician resource use for 1993–1998 found that virtually none of the annualized 6 percent real growth could be attributed to the effects of aging, disease prevalence and distribution, or rapid technological change (Buntin et al. 2004). Aging alone (increases in the number of elderly beneficiaries and increases in their average age) can therefore account for only a small fraction of the increase in Medicare expenditures.
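The Medicare hospital figures quoted above can be checked with a back-of-envelope decomposition. The sketch assumes total spending equals the number of beneficiaries times spending per beneficiary, and it uses only the dollar totals and the 21 percent beneficiary growth given in the text. The variable names and the log-share measure are illustrative choices, not the author's method.

```python
import math

# Figures quoted in the text ($ million, 1980 and 1990).
spend_1980 = 19_460.0
spend_1990 = 47_842.0
beneficiary_growth = 0.21  # 21 percent rise in elderly beneficiaries

# Overall growth in spending.
total_factor = spend_1990 / spend_1980
total_growth_pct = (total_factor - 1) * 100

# Assumed identity: spending = beneficiaries x spending per beneficiary.
per_beneficiary_factor = total_factor / (1 + beneficiary_growth)

# Share of (log) spending growth attributable to the rise in beneficiary numbers.
enrollment_share_of_log_growth = math.log(1 + beneficiary_growth) / math.log(total_factor)

print(f"total growth: {total_growth_pct:.1f} percent")
print(f"spending per beneficiary grew by a factor of {per_beneficiary_factor:.2f}")
print(f"share of log growth from enrollment: {enrollment_share_of_log_growth:.2f}")
```

On these numbers, spending per beneficiary roughly doubled, and growth in the number of beneficiaries accounts for only about a fifth of the logarithmic growth in spending.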
An illuminating exception to the dominance of national per capita income is provided by South Africa. In brief, medical care there during the 1980s and 1990s was split between a public sector, which served indigent and low-income persons, and a private sector serving the affluent (van Den Heever 1998). Both sectors were insured, but they did not pool or transfer funds to each other. Thus the structure of medical financing was that of two nations within a single country. The majority of health expenditures went to the 18 percent minority with private insurance, for whom spending was rs. 3,400 per person. The public sector, with 82 percent of the population, spent only rs. 600 per person. Other developed nations often show some differences in the amount of care provided to high- and low-income groups, but the prevalence of comprehensive social insurance prevents disparities of such a large (>5:1) order of magnitude.
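The claim that most spending went to the privately insured minority follows directly from the population shares and per-person amounts quoted above. The short check below uses only those four figures; the variable names are illustrative.

```python
# Figures quoted in the text.
private_pop_share = 0.18
public_pop_share = 0.82
private_per_person = 3400.0
public_per_person = 600.0

# Spending per resident of the country contributed by each sector.
private_total = private_pop_share * private_per_person
public_total = public_pop_share * public_per_person

private_share_of_spending = private_total / (private_total + public_total)
per_person_ratio = private_per_person / public_per_person

print(f"private share of total spending: {private_share_of_spending:.1%}")
print(f"private-to-public spending per person: {per_person_ratio:.2f} to 1")
```

The privately insured 18 percent account for a little over half of total spending, and the per-person ratio exceeds 5:1, as the text states.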
By refusing to pool funds, separate health care systems can be created within a single nation. Conversely, extensive pooling can homogenize a health care system so that regional variation disappears. Sweden appears to be such a case. Although county councils are the primary health care providers and financial units, extensive central government control and fiscal transfers are used to equalize per capita spending across areas. With variation in expenditures actively eliminated by such central government control, differences in patient characteristics, local supply, risk factors, or other variables cannot generate regional spending differences. Variation in age, injury, physician practice style, morbidity, mortality, etc., will affect the spending on each individual, but cannot change county-wide spending per capita unless the central administration funding formula explicitly allows for adjustment based on that factor.
The idea that the amount spent on health care is determined by the amount available to spend rather than the amount of disease is not particularly new, and may even border on the obvious. However, this obvious point has not been incorporated into most health policy discussions or used in structuring most of the models developed for empirical analysis in health services research. A significant barrier to understanding the role of aggregate budgetary constraints is what Kahneman and Tversky have termed the “representativeness” heuristic: a persistent and distorting tendency for observers to assume that any single instance (or small group) somehow represents the whole by being “similar in essential properties” and also “reflects the salient features of the process by which it is generated” (Kahneman, Slovic, and Tversky 1982, p. 33), a correspondence between the individual and the aggregate that clearly fails to hold for health care costs. Kahneman and Tversky have shown that the biases and empirical blind spots arising from use of a representativeness heuristic hold not only for the lay population, but also for sophisticated researchers, including those trained in statistics.
The distorting effects of a representativeness heuristic on perceptions of aggregate medical costs may be strengthened by several additional factors, of which six are mentioned here. First, no patient is average. For each patient, the connection between what is spent on health care and his or her particular illness is so overwhelmingly evident that the converse (that total spending would not change whether or not that patient has a heart attack, AIDS, or colon cancer) seems inconceivable. Second, physicians and other clinicians deal one-on-one with sick patients, and sicker patients cost more (Wells 2002). Third, this personal and clinical focus on the individual resonates with the methodological stance of most economists, which emphasizes the role of prices and individual utility maximization rather than the relatively obvious accounting identity forcing total spending to be limited by total income (Kirman 1992). Fourth, aggregation greatly reduces the number of observations and thus may make macro analysis of less interest to some researchers. Fifth, some embarrassing early failures in the analysis of aggregate data, caused by ecological fallacies, may have conditioned some investigators to resist a macro perspective and to be less willing to consider building models from the top down as well as from the bottom up. Sixth, a macro perspective tends to contradict those who wish to use health economics, decision trees, or outcomes research as a means for showing that more spending on a particular drug or surgical procedure is actually cost-reducing because “it will save money overall.” Legions of proposals for reducing costs by spending more (home health, intermediate units for ventilator patients, laparoscopic cholecystectomy, laser surgery, etc.) using projected results from micro studies have proven disappointing once methodologically sound macro studies were conducted. Still, the idea that some cost-saving innovation will come along to save the health care system from itself, and thus allow it to avoid the hard political choices which go along with setting budgetary limits, is so appealing that it is quite hard to give up.
Among the most successful challenges to complacency in health policy has been “small area analysis” (Wennberg and Gittelsohn 1982). Although at first appearing to have but a single level of observation, the methodology used to define “small areas” implies higher and lower (bigger and smaller) units as well. Boundaries are set so that each “area” is “large enough” for most individual differences in health status to be averaged out, making utilization rates stable and meaningful, and yet “small enough” for most patients to have the same or overlapping hospital and physician providers and to make multiple observations within a single legal and cultural environment (e.g., counties within the state of Vermont). These particular “small area” boundaries highlight differences between groups of patients treated by geographic clusters of physicians linked through some degree of communication and similarities in training, now identified as “practice variation.” Making the boundaries larger or smaller changes the focus of the analysis, and the implied level of policy intervention. A narrow focus using individual person observations calls attention to individual attributes such as health status, and emphasizes reductions in illness or increases in the efficiency of treatment. Defining the boundaries by insurance group rather than geography accentuates reimbursement differences and financial incentives. Enlarging the unit to coincide with state boundaries highlights regulatory differences. Observations encompassing the entire nation draw attention to macroeconomic forces, federal budget constraints, and limits on the supply of physicians and hospitals. The “cause” of cost variation thus depends very much upon how one goes about looking for it, and therefore so does the implied policy intervention (Table 2).
The projection of future health care costs is among the most crucial tasks of fiscal policy analysis, particularly for the Medicare program. Yet if health care costs at one level are modeled and estimated using variables and observations from a vastly different level, then such estimates or projections are likely to be vastly wrong. Adding more variables and more observations will not usually improve the estimates, but doing so can overcome critical resistance by providing an impression of rigor and precision. In attempting to project NHE, it was soon recognized that the standard “demographic” projection of spending by age-sex categories (a) did not explain the rates of growth in expenditures in the past and (b) would seriously underestimate future budgetary requirements. Attempts to ameliorate this failure by adding multiple disease categories, patient types, provider settings, insurance plans, regional fixed effects, etc., or shifting to micro-simulation models with ever-increasing specificity and detail, met with little success. Over time, the Office of the Actuary, CMS, learned to rely more and more on a macro perspective that treated per capita GDP as the main driver of spending in order to create the official 10-year projections of NHE (Peden and Freeland 1998; Smith et al. 1998; Getzen 2000b; CMS 2005). It has become clear that errors accumulate when the aggregate effects of income on budget constraints are omitted, or when the individual effects of morbidity and mortality are mechanically extrapolated.
The lessons learned have not yet been widely applied. Most projections of “health care costs” do not carefully address the disparity between micro and macro estimates, and may not even explicitly acknowledge that such a divergence exists (Feldstein 1971; Lee and Miller 2002; Seshamani and Gray 2004; Stearns and Norton 2004; Goldman et al. 2005; Thorpe 2005). For example, the RAND Health Insurance Experiment, considered a “gold standard” for testing the effects of price elasticity and insurance coverage on individual expenditures, is often misused to create estimates of aggregate regional or national spending, a purpose for which it was not designed and for which it is ill-suited (Finkelstein 2005). Analysts will assert that HIV infection, myocardial infarction, or aging will “cause” NHE to rise, while fully aware that higher rates of HIV incidence in Africa, of cardiac disease in Scotland, or of senior citizens in Japan, do not obviously “cause” NHE to be higher. Moreover, it is commonly asserted that reductions in disease morbidity and mortality will reduce U.S. health care costs in the future despite 50 years of experience to the contrary.
Many analysts have pondered why a cost control policy that seems so successful in one instance, or among individuals, or among physicians or hospitals alone, fails to reduce total costs. It is primarily because the consequences of income for spending are established only at the level where the budget constraint is fixed. For some types of health care, this is the individual household; for others, the hospital, region, or insurance plan; but for much of medicine it is the national budget constraint that is most relevant. Analysis of the determinants of health care costs should use units of observation corresponding to the units at which such decisions are made. To the extent that expenditure decisions are made at multiple levels, then a multilevel analysis is required.
Recognition that budget constraints are binding leads to an understanding of why so many individual variables and organizational decisions have effects that are allocative, distributing the total across patients by need and other characteristics, rather than additive, summing upwards to a total that can rise or fall as individual conditions change. It is a common empirical finding that interventions which are successful in reducing costs locally at the micro level somehow fail to reduce total costs overall (macro level), classically expressed by Salkever and Bice in the metaphor of a “regulatory balloon” where pushing in at one point causes a bulge somewhere else, with no real change in total volume.
The point made here with respect to health care costs is applicable to a broad range of policy evaluations. Policies that operate through individual incentives require individual-level measures; policies operating on localities need local area measures (census blocks, counties, regions); and national policies should be measured at the national level. The appropriate unit of analysis is determined not by the availability of data or the desire to increase N, but by matching units of observation to the units of action. This matching principle holds in the temporal dimension as well. Short-term policy effects cannot be captured by measures taken only once per decade in a decennial census, nor can long-term policies be evaluated using hourly data measures. It is the span of action, not the boundaries of data collection, which correctly defines the units of observation.
Parts of this work were supported by a visiting research fellowship at the Center for Health and Wellbeing, Princeton University and by a research study leave from Temple University. Helpful comments were provided by Ken Buckingham, Angus Deaton, Mark Freeland, Daniel Kahneman, Sheila Smith, Dave Whynes, and seminar participants at Baltimore, Nottingham, and Princeton. Residual errors and ambiguities are the responsibility of the author.
1. Although the causation of expenditure growth is probably complex and rests upon a number of factors, so far it has not been possible to devise consistent measures of other variables that reliably demonstrate significant effects in rigorous and repeated empirical tests. For example, it is often asserted that the American public is more given to a culture of excess and more enamored of flashy new technology than Europeans. Yet attempts to define and measure that “factor” lead to the unsatisfactory “U.S. fixed effect dummy variable” that in essence supports an ad hoc claim that the U.S. spends more because Americans are spendthrifts. Related political culture measures of “egalitarianism,” “respect for authority,” “social solidarity,” “publicness,” “centralization,” “willingness to wait,” and so on have all been found to be statistically significant in some studies but not in others, and all appear to be imprecise and ad hoc either in definition or application. “Technology” is frequently referred to as a causative factor, yet most econometric studies simply take the residual unexplained growth from a time series regression and term that a proxy measure for technological progress, and thus lack objective quantification that can be independently verified (Peden and Freeland 1998; Buntin et al. 2004). Although “the residual” is clearly unsatisfactory as a measure of a major cause, the readily available alternatives (number of patents, lists of major medical breakthroughs, R&D spending) that can be measured in a fairly consistent and uncontroversial manner do not appear to provide significant explanatory power.