Objective To determine the bias associated with frequency of visits by physicians in adjusting for illness, using diagnoses recorded in administrative databases.
Setting Claims data from the US Medicare program for services provided in 2007 among 306 US hospital referral regions.
Design Cross sectional analysis.
Participants 20% sample of fee for service Medicare beneficiaries residing in the United States in 2007 (n=5153877).
Main outcome measures The effect of illness adjustment on regional mortality and spending rates using standard and visit corrected illness methods for adjustment. The standard method adjusts using comorbidity measures based on diagnoses listed in administrative databases; the modified method corrects these measures for the frequency of visits by physicians. Three conventions for measuring comorbidity are used: the Charlson comorbidity index, Iezzoni chronic conditions, and hierarchical condition categories risk scores.
Results The visit corrected Charlson comorbidity index explained more of the variation in age, sex, and race adjusted mortality across the 306 hospital referral regions than did the standard index (R2=0.21 v 0.11, P<0.001) and, compared with age, sex, and race adjusted mortality, reduced regional variation, whereas adjustment using the standard Charlson comorbidity index increased it. Although visit corrected and age, sex, and race adjusted mortality rates were similar in hospital referral regions with the highest and lowest fifths of visits, adjustment using the standard index resulted in a rate that was 18% lower in the highest fifth (46.4 v 56.3 deaths per 1000, P<0.001). Age, sex, and race adjusted spending as well as visit corrected spending was more than 30% greater in the highest fifth of visits than in the lowest fifth, but only 12% greater after adjustment using the standard index. Similar results were obtained using the Iezzoni and the hierarchical condition categories conventions for measuring comorbidity.
Conclusion The rates of visits by physicians introduce substantial bias when regional mortality and spending rates are adjusted for illness using comorbidity measures based on the observed number of diagnoses recorded in Medicare’s administrative database. Adjusting without correction for regional variation in visit rates tends to make regions with high rates of visits seem to have lower mortality and lower costs, and vice versa. Visit corrected comorbidity measures better explain variation in age, sex, and race mortality than observed measures, and reduce observational intensity bias.
Good methods of risk adjustment are essential to make sense of observed regional variations in utilization, expenditure, and mortality rates between healthcare regions. Unwarranted variation in the use of resources and in outcomes can thus be understood more precisely and the information targeted to improve quality and respond to fiscal pressures. Studies over three decades have shown that the key driver of regional variations in per capita spending for hospitals across areas is variation in admission rates (not in costs per case), and that this variation exceeds what would be expected given known differences in morbidity.1 2 3 4 5 These studies raise the question of the extent to which variation in resource use and healthcare outcomes, after the conventional adjustments in the United States for age, sex, and race, is justified because of differences in risk. Although it is well recognized that data on utilization of healthcare cannot be used directly as a sound proxy for risk, as these data also reflect differences in access to supply of healthcare resources,6 7 8 9 methods of risk adjustment have been developed based on diagnoses recorded in medical records and insurance claims. Such methods are routinely used in calculating standardized mortality rates in the public reporting of hospital mortality10 11 12 13 14 15 and in observational studies of the relation between healthcare use and outcomes.16 17 18 19 They are also used to adjust for comorbidity in making payments to physicians20 and to third party payers: insurance companies in the Netherlands,21 the US Medicare’s Advantage program,22 and the formula used to determine target allocations of resources to clinical commissioning groups in England.23
The validity of methods that use data on diagnoses to estimate risk depends on the assumption that recorded diagnoses are independent of supply, and hence closely reflect the true underlying burden of illness; or that adequate controls have been developed to control for the effects of supply. One study9 emphasized the importance of adequate controls in designing methods of risk rating to counter incentives for “cream skimming.” The assumption that diagnostic data offer objective measures of need is known to be suspect, as differences in coding practices have already been shown to introduce an “up-coding” bias in comparing mortality across hospitals.24 But these studies did not consider the effects of differences in intensity of patient observation. Recent studies that have done so have demonstrated that the intensity of patient observation, measured by the frequency of visits by physicians and the number of laboratory tests and imaging exams, is associated with the frequency of diagnosis recorded in Medicare claims data, independent of underlying illness, as measured by mortality adjusted for age, sex, and race.
In a natural experiment, Medicare beneficiaries who migrated to regions with a higher intensity of care (measured by spending on end of life care) experienced more physician visits, diagnostic tests, and imaging exams and accrued more comorbid diagnoses as measured by hierarchical condition categories risk score than did those who migrated to regions with a lower intensity of care, even though the mortality rates over a three year follow-up period after migration were similar.25 A second study showed that as the regional intensity of patient observation increased (measured by the average numbers of physician visits, different physicians seen, imaging exams, and laboratory tests), the mean number of chronic illnesses diagnosed using the Iezzoni convention for measuring comorbidity also increased, even though the age, sex, and race mortality rates among regions were similar.26
We examined the magnitude of the bias associated with the intensity of patient observation using three conventions for measuring comorbidity: the Charlson comorbidity index,27 the Iezzoni chronic condition count,28 and the hierarchical condition categories risk score.22 We compared two methods for risk adjustment: the standard method, which adjusts for comorbidity based on recorded diagnoses, and a modified method that reduces the impact of supply by correcting comorbidity measures to remove the component associated with intensity of patient observation (visits by physicians). To make this correction we used the rate of visits by a physician during the last six months of a patient’s life, a measure of observational intensity that is unrelated to disease burden as measured by age, sex, and race adjusted mortality. We tested the validity of the two methods by examining how well they explain and reduce regional variation in age, sex, and race adjusted mortality and their effects on illness adjusted mortality and price adjusted spending rates in regions with high and low rates of physician visits. We discuss the implications of our findings for the ways in which these strategies are used for risk adjustment.
Figure 1 illustrates our conceptual framework and study design. We consider the observed number of different diagnoses to be a product of two factors: the actual burden of disease (the true or intrinsic amount of disease) and the intensity of observation (which conditions the likelihood of having a diagnosis for a given level of disease burden). Conceptually, the intensity of observation represents the combined effect of a spectrum of factors: the resolution of diagnostic exams (including the physical exam, imaging, and laboratory exams), the thresholds used to label exam results as abnormal, the frequency with which exams are done, and the number of observers who have an opportunity to make a diagnosis.
The intensity of observation is related to the frequency of contact with a physician: the number of visits by physicians, specialist referrals, and admissions to hospital. For this analysis we made use of our prior work29 and focused on the rate of physician visits in the last six months of life as a proxy for intensity of observation. We then used this proxy to produce a visit corrected measure of illness—essentially subtracting out the intensity of observation component from the observed Charlson comorbidity index, the Iezzoni chronic illness count, and the hierarchical condition categories risk score.
To test whether this new method is a better approximation of actual disease burden, we compared observed comorbidity measures with visit corrected comorbidity measures for their ability to explain and reduce regional variation in the least ambiguous measure of disease burden: age, sex, and race adjusted mortality rates among the sampled Medicare population. We then compared the effect of these two methods of risk adjustment on mortality and spending in individual hospital referral regions and in hospital referral regions aggregated into fifths according to the frequency of physician visits.
We obtained from the Centers for Medicare & Medicaid Services the insurance claims for a 20% sample of Medicare beneficiaries. We restricted the analysis to those beneficiaries who were either fully enrolled in part A and part B of Medicare throughout 2007 and were 65-99 years old on 31 December 2007 or fully enrolled beginning 1 January 2007 until their death that year and were 65-99 years old at their time of death. Because the claims data for beneficiaries enrolled in risk contract health maintenance organizations were incomplete, they were excluded. Analysis was also restricted to beneficiaries who were resident in one of the 306 hospital referral regions as defined in the Dartmouth Atlas of Health Care.29 (The hospital referral regions were empirically defined based on patient origin studies to define the geographic region served by tertiary hospitals.) The sampled Medicare population in hospital referral regions ranged from 2549 to 91020 beneficiaries. The final sample, covering all 306 hospital referral regions, totaled 5153877 beneficiaries.
Our proxy measures for the number of different diagnoses were the illness measures used by each convention. The Charlson comorbidity index27 comprises 19 acute and chronic conditions with assigned risk scores calibrated to predict one year mortality. The Iezzoni chronic condition28 is a count of up to 12 chronic conditions selected on ability to predict one year mortality. The hierarchical condition categories risk score22 is a composite measure based on an individual’s age, sex, and Medicaid and disability status. It is based on the coding algorithms used by Medicare to adjust payments for Medicare Advantage plans30; it incorporates an array of diagnoses grouped according to the hierarchical condition categories classification system—70 diagnostic groups calibrated to predict spending and also used to adjust mortality rates for illness in public reporting of hospital mortality.
For a beneficiary to be counted as having a condition, the diagnosis had to be coded either on at least one hospital discharge abstract after an inpatient stay or on at least two claims involving physician contact that were at least seven days apart. This was done to reduce the likelihood that a diagnosis was recorded for a condition that was to be “ruled out” by further study.
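The counting rule above can be sketched in code. This is a minimal illustration only, not the authors' implementation: the function name `has_condition` and its inputs (lists of claim dates on which a given diagnosis was coded) are hypothetical, and real claims processing would involve diagnosis code mapping that is not shown.

```python
from datetime import date

def has_condition(inpatient_dates, physician_dates, min_gap_days=7):
    """Return True if a diagnosis counts under the rule described in the text.

    inpatient_dates: dates of hospital discharge abstracts coding the diagnosis
    physician_dates: dates of physician contact claims coding the diagnosis
    (both argument names are assumptions made for this sketch)
    """
    # A single hospital discharge abstract is sufficient on its own.
    if inpatient_dates:
        return True
    # Otherwise at least two physician claims at least min_gap_days apart are
    # needed; comparing the earliest and latest claim is enough to check this.
    if len(physician_dates) >= 2:
        gap = (max(physician_dates) - min(physician_dates)).days
        return gap >= min_gap_days
    return False
```

Under this sketch, two physician claims four days apart would not count, which is how a diagnosis recorded only while a condition was being ruled out would tend to be excluded.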
The proxy for intensity of observation was the frequency of visits by physicians. To ensure that no direct relation could exist between our proxy for intensity of observation and comorbidity score or mortality we measured physician visits in the prior year (2006). We have adapted this measure for the Dartmouth Atlas of Health Care29 in our hospital referral regions, by reducing it to the average of the total number of visits for evaluation and management (both inpatient and outpatient) made by physicians per beneficiary in the last six months of life, a subset of patient records for which the severity of illness is unlikely to vary importantly across regions. (The regional variation in visit rates is not explained by differences in illness related factors such as age, sex, race, poverty, or type of medical condition.)31
We calculated a visit corrected measure of illness using simple linear regression in which the dependent variable was the observed illness measure (as calculated from the claims data) and the independent variable was the measure of frequency of visits by physicians. The dependent variable was an individual person level comorbidity measure and the independent variable was the physician visit rate at hospital referral region level. For example, for the Charlson comorbidity index, the residual from this regression (the difference between the observed index and the index predicted from the frequency of physician visits) is the visit corrected index. In other words, the residual represents the component of illness not explained by the frequency of physician visits, our proxy for the intensity of observation. Comparable regression models were used to generate person level visit corrected indices for both the hierarchical condition categories risk score and the Iezzoni chronic condition count.
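The residual approach can be illustrated with a short sketch. It assumes an ordinary least squares fit with an intercept and takes the raw residual as the corrected score; the function name `visit_corrected` and the toy numbers are hypothetical, and the analysis itself was run in SAS on person level claims data.

```python
def visit_corrected(scores, region_visit_rates):
    """Residuals from a simple linear regression of score on visit rate.

    scores: person level comorbidity scores (for example, Charlson index)
    region_visit_rates: for each person, the visit rate of the hospital
        referral region where that person lives
    """
    n = len(scores)
    mean_x = sum(region_visit_rates) / n
    mean_y = sum(scores) / n
    # Least squares slope and intercept for one predictor.
    sxx = sum((x - mean_x) ** 2 for x in region_visit_rates)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(region_visit_rates, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Observed score minus the score predicted from the regional visit rate.
    return [y - (intercept + slope * x)
            for x, y in zip(region_visit_rates, scores)]

# Toy data: two people in a low visit region, two in a high visit region.
corrected = visit_corrected([1.0, 3.0, 2.0, 4.0], [10.0, 10.0, 30.0, 30.0])
```

In the toy data the high visit region has higher observed scores on average, but after correction the two regions have the same mean score; only the within-region differences between people remain.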
We then compared observed comorbidity with visit corrected comorbidity for each of the three measurement conventions for their ability to explain and reduce the regional variation in age, sex, and race adjusted mortality for the sampled populations. Regional adjusted mortality rates were calculated using a linear regression model at the patient level (SAS GENMOD procedure) with 20 age, sex, and race indicator variables, along with the 306 hospital referral regions included as classification variables, and no intercept. We used the resulting region level coefficients to construct age, sex, and race adjusted mortality rates. Regional “standard method” illness adjusted rates were calculated by adding the patient level illness measure (for example, Charlson score) to the age, sex, and race regression model, using the region level coefficients to construct the rates. We calculated the regional “modified method” illness adjusted rates by adding the patient level visit corrected comorbidity score to the age, sex, and race regression model, again using the region level coefficients to construct the rates.
The percentage of the variation in age, sex, and race adjusted mortality explained by observed compared with visit corrected comorbidity was analyzed at the regional level using the coefficient of determination (R2). Results are reported for regressions both unweighted and weighted by hospital referral region sample size. We measured the effect on regional variation among the 306 hospital referral regions by using standard statistics: ratio of highest to lowest hospital referral region; ratio of 75th to 25th centile; and the coefficient of variation (standard deviation divided by the mean, expressed as a percentage).
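The three variation statistics can be computed as in the sketch below. The text does not say which centile interpolation rule or which standard deviation (sample or population) was used, so the choices here (Python's "inclusive" quantile method and the sample standard deviation) are assumptions, as are the function name and the toy rates.

```python
import statistics

def variation_summary(rates):
    """Summary statistics of regional variation for a list of regional rates."""
    # Quartile cut points; q[0] is the 25th centile and q[2] the 75th.
    q = statistics.quantiles(rates, n=4, method="inclusive")
    return {
        # Ratio of the highest to the lowest region.
        "extremal_ratio": max(rates) / min(rates),
        # Ratio of the 75th to the 25th centile.
        "interquartile_ratio": q[2] / q[0],
        # Standard deviation divided by the mean, as a percentage.
        "cv_percent": 100 * statistics.stdev(rates) / statistics.mean(rates),
    }

# Toy example: adjusted mortality per 1000 in five hypothetical regions.
summary = variation_summary([40.0, 45.0, 50.0, 55.0, 60.0])
```

A risk adjustment method that captures true differences in illness would be expected to shrink all three statistics relative to the age, sex, and race adjusted rates.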
Finally, we compared the effect of risk adjustment using observed and visit corrected comorbidity measures on mortality and price adjusted Medicare spending.32 Adjustment of spending according to age, sex, and race was accomplished using the same methods as for mortality. The resulting hospital referral region variable estimates represent direct adjusted hospital referral region level mortality and spending rates. Similarly, we also computed adjusted rates for hospital referral regions aggregated into fifths based on the rates of visits by physicians. A series of three regression models, similar to the regional models, was run with the fifths rather than the hospital referral regions as the classification variable.
Among the 306 hospital referral regions, the mean number of physician visits per decedent during the last six months of life varied across the regions from 10 to 59 and was not correlated with age, sex, and race adjusted mortality (R2=0.000, P=0.88). The rate of visits, however, was strongly correlated with the number of diagnoses observed in the claims data. The correlation with the mean hierarchical condition categories score was R2=0.53 (P<0.001) and with the mean number of Iezzoni chronic conditions and Charlson comorbidity index was 0.46 (P<0.001) each. This combination of findings suggests that the frequency of physician visits serves as a good proxy for the intensity of observation: associated with the frequency of diagnosis but not confounded by the actual burden of disease.
The ability of observed and visit corrected comorbidity measures to explain the variation in age, sex, and race adjusted mortality was shown using unweighted regressions (fig 2). For the hierarchical condition categories, the Iezzoni and the Charlson conventions, comorbidity measures based on the diagnoses in the claims data explained 10-12% of the variation in age, sex, and race adjusted mortality among the 306 hospital referral regions; however, when the observed comorbidity measures were corrected for visits they explained 21-24% of the variation, or about twice as much. Regressions weighted for the sample size of the hospital referral regions explained less: 2-5% of the variation in age, sex, and race adjusted mortality using observed comorbidity and 17-21% for visit corrected comorbidity.
Compared with age, sex, and race adjusted mortality, adjustment using observed comorbidity resulted in an increase in regional variation in mortality rates, whereas adjustments using visit corrected comorbidity resulted in a decrease in variation for each of the three conventions (fig 3).
Table 1 illustrates the effect of risk adjustment using the Charlson comorbidity index on apparent mortality among hospital referral regions aggregated into fifths based on rates of visits by physicians. The rate of visits in the highest fifth was 2.4 times that of the lowest fifth. However, the age, sex, and race adjusted mortality rates in these two fifths were similar (51.0 v 50.0 per 1000). Adjustment for illness using the observed Charlson comorbidity index had a strong impact on illness adjusted mortality: it became highest in the lowest fifth for visits and decreased in a stepwise fashion across fifths of increasing visit rates. Compared with adjustment by age, sex, and race alone, mortality increased by 10.3% in the lowest fifth of visits and decreased by 7.1% in the highest fifth. Thus, after adjustment using the observed Charlson comorbidity index, the illness adjusted mortality rate in the highest fifth of visits was 17.6% lower than in the lowest fifth, despite a nearly identical age, sex, and race adjusted mortality rate.
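The 17.6% figure can be reproduced from the two illness adjusted rates reported in the abstract (46.4 v 56.3 deaths per 1000 in the highest and lowest fifths of visits). The helper name `percent_lower` is introduced here for illustration only.

```python
def percent_lower(rate, reference):
    """Percentage by which `rate` falls below `reference`."""
    return 100 * (reference - rate) / reference

# Highest fifth (46.4) relative to lowest fifth (56.3), deaths per 1000.
gap = round(percent_lower(46.4, 56.3), 1)  # 17.6
```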
The effect on mortality of adjustment using the visit corrected Charlson comorbidity index was different. Compared with age, sex, and race adjusted mortality rates, the mortality rates among the fifths showed little change; the stepwise inverse association with rate of visits seen with adjustment using the observed index disappeared; and there was little difference in mortality between the highest and lowest fifths.
Table 1 also provides information on the effect of adjustment for illness in six hospital referral regions with large urban populations; three with high and three with low mean rates of visits by physicians. Compared with adjustment for age, sex, and race, adjustment using the observed Charlson comorbidity index decreased mortality by 18.3% in Miami, 3.2% in Los Angeles, and 13.3% in Manhattan; mortality rates in Minneapolis, Seattle, and Salt Lake City increased by 23.5%, 15.1%, and 20.6%, respectively. Adjustment using visit corrected Charlson comorbidity index resulted in less change in each region except for Los Angeles.
Similar findings were seen for the fifths and the selected hospital referral regions when adjustments were made using Iezzoni chronic conditions and hierarchical condition categories risk scores (table 2). For example, using the standard method, the illness adjusted mortality rate in the highest fifth of visits was 19.0% lower than in the lowest fifth using Iezzoni chronic conditions and 21.8% lower using hierarchical condition categories risk scores.
Age, sex, and race adjusted and price adjusted Medicare spending in our 20% sample varied among regions, from $5323 (£3376; €3936) to $15706 per beneficiary and, in contrast with age, sex, and race mortality, was highly correlated with supply as measured by physician visit rates for end of life care (R2=0.46). Table 3 illustrates the effect of adjustment for illness on Medicare spending using observed hierarchical condition categories risk scores (the standard adjustment used by the Medicare program to pay insurance companies) compared with visit corrected risk scores. In the fifth with the highest rate of visits, age, sex, and race-price adjusted spending was 32.4% greater than in the lowest fifth. Adjusting for observed hierarchical condition categories risk scores increased spending by 12.8% in the lowest fifth of visits and decreased it by 8.3% in the highest fifth of visits; as a result, illness adjusted spending in the highest fifth was only 7.7% greater than in the lowest fifth. Risk adjustment using visit corrected hierarchical condition categories risk scores resulted in a much smaller change.
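The way the two opposite changes compress the spending gap can be checked with a short calculation. This is a sketch using only the rounded percentages quoted above; the function name is hypothetical. With these rounded inputs the result is about 7.6%, which matches the reported 7.7% to within rounding of the inputs.

```python
def adjusted_gap_percent(unadjusted_gap_pct, change_high_pct, change_low_pct):
    """Gap between the highest and lowest fifth after each fifth's rate changes.

    unadjusted_gap_pct: how much greater the highest fifth is before adjustment
    change_high_pct / change_low_pct: percentage change that illness adjustment
        produces in the highest and lowest fifth (negative for a decrease)
    """
    ratio = ((1 + unadjusted_gap_pct / 100)
             * (1 + change_high_pct / 100)
             / (1 + change_low_pct / 100))
    return 100 * (ratio - 1)

# 32.4% gap; adjustment lowers the highest fifth by 8.3% and raises the lowest by 12.8%.
spending_gap = adjusted_gap_percent(32.4, -8.3, 12.8)
```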
Table 3 also illustrates the impact of adjustment on the selected hospital referral regions. Based on age, sex, and race-price adjustment alone, the mean Medicare spending per beneficiary in Miami, Los Angeles, and Manhattan was substantially greater than in Minneapolis, Seattle, and Salt Lake City. After adjustment for illness using observed hierarchical condition categories risk scores, spending in Manhattan was less than in Minneapolis, Seattle, and Salt Lake City; and spending in Minneapolis and Salt Lake City was more than in Los Angeles. Adjustment using the visit corrected hierarchical condition categories score resulted in substantially less change in spending in each region.
Similar findings were obtained for fifths and the selected hospital referral regions using the Charlson comorbidity index and Iezzoni chronic conditions (table 4).
Data for each hospital referral region for visit rates and effects of adjustment on age, sex, and race mortality and price adjusted spending can be downloaded from the Dartmouth Atlas website (www.dartmouthatlas.org/).
Methods of risk adjustment lack face validity if they fail to reduce variation and result in implausible changes in age, sex, and race mortality rates. The standard method of illness adjustment based on observed comorbidity measures obtained from Medicare’s administrative databases explained little of the variation in underlying disease burden as measured by age, sex, and race adjusted mortality and increased rather than decreased regional variation. Moreover, it resulted in implausible changes in mortality rates in regions with high and low rates of visits by physicians. Although age, sex, and race adjusted mortality rates were virtually the same in the highest and lowest fifths of visits, the standard method resulted in illness adjusted mortality that was 18% lower in the highest fifth of visits using the Charlson comorbidity index, 19% lower using Iezzoni chronic conditions, and 22% lower using hierarchical condition categories risk scores.
The visit corrected method for illness adjustment proved to be a better predictor of the underlying burden of illness. For each convention for defining comorbidity, the visit corrected method explained much more of the variation in age, sex, and race adjusted mortality; it reduced rather than increased regional variation compared with the standard method of adjustment and did not result in unlikely changes in illness adjusted mortality rates in regions with high and low rates of visits by physicians.
Our analysis has several limitations. Firstly, it was restricted to fee for service Medicare and thus captures only part of the experience of the insured population. Since our data did not include Medicare Advantage (Medicare’s capitated insurance plan), we were unable to examine the association between financial incentives involving capitation and the number of diagnoses or to evaluate the impact of Medicare’s policy to use hierarchical condition categories risk scores to reduce “cream skimming” by health insurance companies serving patients enrolled in Medicare Advantage. Secondly, our analysis cannot distinguish between the intensity of observation and intentional up-coding. Coding practices could vary among regions and are not addressed by the risk adjustment methods discussed here. However, to the extent that such behavior was correlated with frequency of visits by physicians, it would be controlled for by our adjustment method. Thirdly, although we use physician visits during the last six months of life as a proxy for intensity of patient observation, the results are not sensitive to this measure: among the 306 hospital referral regions, the rates of visits in 2006 for those alive at the end of the year were highly correlated with visits for those who died and, like the end of life visit rate, were not correlated with age, sex, and race adjusted mortality rates. Using visit rates in 2006 for those alive at the end of 2006 to adjust hierarchical condition categories scores gave results similar to those reported here. Fourthly, the method for correcting for observational intensity using visit rates as a proxy depends on the availability of this information in administrative databases, limiting the applicability of the method elsewhere.
Fifthly, our method for counting diagnoses excludes likely rule-out diagnoses and is thus more conservative than the method used by the Centers for Medicare & Medicaid Services in adjusting Medicare Advantage payments, which counts such diagnoses. In a sensitivity analysis, we found that including rule-out diagnoses in calculating risk scores resulted in even greater changes from age, sex, and race adjusted mortality and spending, particularly in regions at the extreme high and low end of the spectrum of variation in rates of physician visits. Finally, this study did not evaluate risk adjustment methods that, in addition to comorbidity measures based on administrative databases, incorporated measures of supply23 or prior healthcare use.33 Further research would be needed to evaluate the ability of such methods to reduce the observational intensity bias observed when risk adjustment is based on counts of diagnoses recorded in claims data.
Our study emphasizes the importance of recognizing the differences between data generated in different ways, as Taleb argued.34 Those that are produced socially are problematic to interpret because they have meaning to those that generate them and tend to have highly skewed distributions. Those that are produced naturally are straightforward to interpret and tend to follow Gaussian distributions. It is thus a serious categorical error to treat the former data like the latter. In healthcare this distinction has also to be made between social data, such as per capita supply of medical expenditure, admission rates, physician visits, and medical diagnoses; and biologic data, such as height and mortality, which tend to follow a Gaussian distribution.
Adjustment methods that depend on the number of different diagnoses recorded in medical records and administrative databases are subject to observational bias rooted in the varying capacity of the local healthcare system: the more physicians involved in care, the more visits, the more diagnostic tests, the more imaging exams, and the more diagnoses and comorbidities uncovered and recorded in the claims data. Adjustment methodologies that assume that a diagnosis made in Miami, Manhattan, or Los Angeles contains the same information as one made in Minneapolis, Seattle, or Salt Lake City are guilty of what one author labeled the constant risk fallacy: the assumption that the underlying relations between case mix variables and outcomes are constant across populations or over time.35 In their study of bias in the Dr Foster Unit’s standardized mortality ratios, Mohammed et al36 illustrate this phenomenon by documenting inconsistency in the relation between the Charlson comorbidity index and adjusted mortality rates across four UK hospitals. For the same methodological reasons this work identifies the same problem on an international scale and offers a plausible reason that can be corrected through a less biased case mix adjustment.
In short, the more one looks, the more one finds. This means that using data on diagnoses to adjust for risk produces potential problems in three areas:
Bias in research and evaluation—Observational studies that use case mix adjustments based on claims data are suspect. For example, if, as for Medicare, spending or utilization per capita and rates of diagnosis are highly correlated, then research that seeks to evaluate the relation between use of medical care and mortality while controlling for illness using recorded diagnoses will produce biased estimates.30 31 32
Biased performance measures—Adjusting performance measures using several different diagnoses makes providers who frequently make diagnoses look better than those who manage their patients more conservatively. Thus, as in the Dr Foster Unit example, adjusting quality and efficiency measures for observed number of diagnoses appears to distort rather than improve the reliability of performance measures. The use of these adjustment methods in public reporting of hospital mortality is widespread, including England,10 the United States,11 12 Canada,13 the Netherlands,14 and Sweden.15
Biased payment to third party payers—Adjusting payments to third party payers on the basis of the frequency of diagnoses recorded in administrative databases may result in higher per capita payments in regions with more physicians, hospital beds, and visits per capita, independent of underlying disease burden. This problem has long been recognized in designing methods of risk adjustment to counter the tendency to avoid sicker patients in order to improve apparent outcomes and profitability, so called “cream skimming.”9 Hence, methods of risk adjustment that use data on diagnoses try to control for the effects of supply by including socioeconomic data, and data on supply and utilization by individuals. The question raised by the evidence reported here is to what extent a given method is susceptible to observational intensity bias.
We have shown that a method of risk adjustment that used data on diagnoses and controlled for the effects of supply, by using data on the frequency of visits by physicians in the year prior to a patient’s death, was more efficient than the standard method, but it still accounted for less than 25% of geographic variation in age, sex, and race adjusted mortality among fee for service Medicare beneficiaries. Thus, our study points to the importance of developing risk adjustment methods that better explain variation in age, sex, and race mortality rates and suggests that these will be found by using data that are clearly independent of the effects of supply.
We thank Jon Deeks, Adam Steventon, and Wynand van de Ven for comments on earlier drafts of this manuscript.
Contributors: All authors jointly wrote the article and are guarantors.
Funding: This study was partially supported by the National Institute on Aging (grant PO1-AG19783) and the Robert Wood Johnson Foundation. The funders had no role in the design and conduct of the study; the collection, analysis, and interpretation of the data; or the preparation, review, or approval of the manuscript.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: support from the organisation described below for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years, and no other relationships or activities that could appear to have influenced the submitted work.
Ethical approval: Not required.
Data sharing: No additional data available.
Cite this as: BMJ 2013;346:f549