Current methods of risk adjustment rely on diagnoses recorded in clinical and administrative records. Differences among providers in diagnostic practices could lead to bias.
We used Medicare claims data from 1999 through 2006 to measure trends in diagnostic practices for Medicare beneficiaries. Regions were grouped into five quintiles according to the intensity of hospital and physician services that beneficiaries in the region received. We compared trends with respect to diagnoses, laboratory testing, imaging, and the assignment of Hierarchical Condition Categories (HCCs) among beneficiaries who moved to regions with a higher or lower intensity of practice.
Beneficiaries within each quintile who moved during the study period to regions with a higher or lower intensity of practice had similar numbers of diagnoses and similar HCC risk scores (as derived from HCC coding algorithms) before their move. The number of diagnoses and the HCC measures increased as the cohort aged, but they increased to a greater extent among beneficiaries who moved to regions with a higher intensity of practice than among those who moved to regions with the same or lower intensity of practice. For example, among beneficiaries who lived initially in regions in the lowest quintile, there was a greater increase in the average number of diagnoses among those who moved to regions in a higher quintile than among those who moved to regions within the lowest quintile (increase of 100.8%; 95% confidence interval [CI], 89.6 to 112.1; vs. increase of 61.7%; 95% CI, 55.8 to 67.4). Moving to each higher quintile of intensity was associated with an additional 5.9% increase (95% CI, 5.2 to 6.7) in HCC scores, and results were similar with respect to laboratory testing and imaging.
Substantial differences in diagnostic practices that are unlikely to be related to patient characteristics are observed across U.S. regions. The use of clinical or claims-based diagnoses in risk adjustment may introduce important biases in comparative-effectiveness studies, public reporting, and payment reforms.
Risk adjustment is an essential element of comparative-effectiveness studies, measurements of health care performance, and payment programs and is destined to become even more important as health care reform proceeds. Observational studies comparing the outcomes of various approaches to treatment1 or the performance of specific providers often adjust for patients’ preexisting diagnoses.2 The Medicare payment systems for institutional providers and health plans include payment adjustments that take into account the beneficiaries’ health or functional status.3,4 As payers move toward more bundled and value-based payment systems, incentives to avoid providing care for patients who are difficult to treat or patients for whom the cost of treatment is high will only increase.5 Inadequate risk adjustment could thus lead to flawed inferences, the “dumping” of high-risk patients, and distortions in insurance markets.
Risk adjustment is only as good as the information on which it is based. Current risk-adjustment methods depend on the diagnoses that are recorded by physicians in medical records or registries or are coded by medical-records personnel and billing staff in hospital discharge abstracts and physician claims. Concern about the accuracy of the underlying data for risk adjustment has focused largely on the problem of upcoding — that is, recording conditions on submitted claims data in such a way that risk scores, and thus payments, are higher.6 Differences in diagnostic practice have received much less attention. Studies have highlighted the ways in which the interpretation of pathological and radiologic examinations varies among physicians and the ways in which these differences affect the proportion of test results that are identified as abnormal.7–10 There is also variation across practices11 and regions,12 among physicians treating patients with similar conditions, in the propensity to order diagnostic tests or refer patients to subspecialists. If physicians have substantial and systematic differences in their diagnostic practices that are unrelated to the underlying health of their patients but are related to institutional or regional practice patterns, biases in risk adjustment will result.
We conducted a study to determine the magnitude of the differences in diagnostic practices across U.S. regions, using changes in Medicare beneficiaries’ place of residence as a natural experiment.
Previous research has documented substantial regional differences in the intensity of health care services provided to Medicare beneficiaries — differences that are independent of the beneficiaries’ health or socioeconomic status.13 Because beneficiaries are unlikely to be aware of the relative intensity of practice in their current region of residence or that of the region to which they move, we hypothesized that the average health status of beneficiaries who move to regions with a higher intensity of practice is similar to the average health status of beneficiaries from the same region who move to lower-intensity regions. We tested this hypothesis by comparing beneficiaries with respect to the number of diagnoses they had received and their use of health services before their move. We then followed the beneficiaries for 3 years after their move, stratifying them according to the intensity of practice in the region to which they moved. We hypothesized that those moving to higher-intensity regions would undergo more diagnostic tests and imaging services, would receive more diagnoses, and would thus have higher risk scores over time than those moving to lower-intensity regions.
We used a previously derived Medicare spending measure, the End-of-Life Expenditure Index, to define the local intensity of practice. This measure is calculated as the average spending (according to standardized national prices) on hospital and physician services provided to Medicare enrollees 65 years of age or older who were in their last 6 months of life, adjusted for age, sex, and race. The End-of-Life Expenditure Index reflects the component of local Medicare spending that is most closely associated with physician practice, rather than with local differences in the prevalence and severity of illnesses or in prices.14 In previous articles, we have shown that the large differences that exist across U.S. regions in health care spending at the end of life are unrelated to differences in case mix14 or patients’ preferences regarding their care.15 We calculated the End-of-Life Expenditure Index for each of the 306 U.S. Hospital Referral Regions (HRRs) for the period from 2001 through 2003 and grouped HRRs into quintiles of increasing intensity of practice.
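As an illustration of the grouping step, the following Python sketch ranks regions by an expenditure index and splits them into five equal-sized groups. The region labels and dollar values are hypothetical, and the equal-count rule is an assumption of this sketch; the text does not state whether quintiles were formed from counts of HRRs or weighted by population.

```python
def assign_quintiles(index_by_region):
    """Map each region to a quintile from 1 (lowest index) to 5 (highest)."""
    # Rank regions by increasing index value.
    ranked = sorted(index_by_region, key=index_by_region.get)
    n = len(ranked)
    # Integer arithmetic places rank r (0-based) into group floor(5r / n) + 1.
    return {region: (rank * 5) // n + 1 for rank, region in enumerate(ranked)}


# Hypothetical index values for ten made-up regions.
spending = {f"HRR{i:03d}": 10000 + 500 * i for i in range(10)}
quintiles = assign_quintiles(spending)
```

With 306 regions, this rule would yield groups of 61 or 62 regions each.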
Figure 1 shows the way in which we defined the study population. We used Medicare enrollment and claims files to identify all persons who were enrolled in Medicare Part A and Part B and who were at least 65 years of age as of January 1, 1999, resided in 1 of the 50 U.S. states or Washington, DC, and changed their place of residence in 2001, 2002, or 2003. A total of 255,264 beneficiaries were identified. To ensure that we had complete follow-up and that the exposure to the health system in each region was consistent over the entire 3-year period after the beneficiary moved, we excluded beneficiaries who died, who enrolled in a health maintenance organization (HMO) or lost their Part B coverage, or who moved more than once. We also excluded beneficiaries who received more than 10% of their claims from outside their place of residence (“snowbirds”); however, including these snowbirds in the analysis yielded similar results (see Tables 1 through 3 in the Supplementary Appendix, available with the full text of this article at NEJM.org).
Medicare beneficiaries were further stratified according to the quintile of the intensity of services provided in the region in which they originally resided and the quintile of the intensity of services provided in the region to which they moved. This classification resulted in 25 subgroups, 1 for each possible combination of the quintiles before and after the move. For some analyses, we simply compared beneficiaries who moved to any HRR with a lower intensity of practice, those who moved to a region within their original quintile of intensity, and those who moved to an HRR in any higher-intensity quintile. As a comparison group, we also included those who did not move outside their HRR.
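The stratification described above can be sketched in a few lines of Python. The function name and the labels "lower", "same", and "higher" are illustrative choices, not terms taken from the study's analysis files.

```python
def classify_move(origin_quintile, destination_quintile):
    """Return (change in quintiles, direction label) for one move."""
    for q in (origin_quintile, destination_quintile):
        if not 1 <= q <= 5:
            raise ValueError("quintile must be between 1 and 5")
    change = destination_quintile - origin_quintile  # ranges from -4 to +4
    if change > 0:
        direction = "higher"
    elif change < 0:
        direction = "lower"
    else:
        direction = "same"
    return change, direction


# Every combination of origin and destination quintile: 5 x 5 = 25 subgroups.
subgroups = [(o, d) for o in range(1, 6) for d in range(1, 6)]
```

The three-way label corresponds to the simpler comparison of lower, same, and higher intensity, whereas the signed change preserves the number of quintiles moved.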
Our measures include both diagnostic practices (rates of diagnostic testing, imaging rates, and numbers of major chronic conditions) and Hierarchical Condition Categories (HCC) risk scores, which are currently used by Medicare for program payment. We measured rates of diagnostic testing and imaging services by first using the Berenson–Eggers Type of Service Codes (BETOS) to classify physician claims as laboratory tests or imaging services. We then counted the frequency of each type of claim for each beneficiary.
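The counting step can be illustrated with a short Python sketch. It assumes that each claim has already been assigned a BETOS code and that codes beginning with "T1" denote laboratory tests and codes beginning with "I" denote imaging; this prefix rule and the example claims are assumptions of the sketch, not a description of the study's code.

```python
from collections import Counter


def count_services(claims):
    """Count laboratory and imaging claims for each beneficiary.

    claims: iterable of (beneficiary_id, betos_code) pairs.
    """
    counts = {}
    for beneficiary_id, code in claims:
        code = code.upper()
        if code.startswith("T1"):    # assumed prefix for laboratory tests
            kind = "laboratory"
        elif code.startswith("I"):   # assumed prefix for imaging services
            kind = "imaging"
        else:
            continue                 # all other service types are ignored
        counts.setdefault(beneficiary_id, Counter())[kind] += 1
    return counts


# Hypothetical claims for two beneficiaries.
example_claims = [("A", "T1A"), ("A", "T1B"), ("A", "I1A"), ("B", "M1A"), ("B", "I2A")]
service_counts = count_services(example_claims)
```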
We counted the number of major chronic conditions that were documented in the Medicare physician and hospital claims data for each beneficiary during each of the 2 years before and the 3 years after a move. To reduce the likelihood of including “rule-out” diagnoses, we recorded a diagnosis as present if it was coded on an inpatient discharge abstract or on two physician claims submitted at least 7 days apart. We restricted this analysis to nine serious chronic conditions, on the basis of the work of Iezzoni and colleagues,16 as adapted for use in the 2008 Dartmouth Atlas of Health Care (see Table 7 in the Supplementary Appendix). We calculated HCC risk scores with the use of the HCC coding algorithms that are used by the Centers for Medicare and Medicaid Services (CMS) to adjust payments for Medicare Advantage plans.17
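The rule used to screen out "rule-out" diagnoses can be expressed as a small Python function. This is a minimal sketch of the stated criterion only; the function name, argument names, and example dates are hypothetical.

```python
from datetime import date


def diagnosis_present(inpatient_coded, physician_claim_dates, min_gap_days=7):
    """Apply the stated rule: one inpatient code, or two physician claims
    submitted at least `min_gap_days` days apart."""
    if inpatient_coded:
        return True
    if len(physician_claim_dates) < 2:
        return False
    # Some pair of claims is at least min_gap_days apart exactly when the
    # earliest and latest claims are.
    span = max(physician_claim_dates) - min(physician_claim_dates)
    return span.days >= min_gap_days


# Two physician claims 3 days apart do not qualify; claims 9 days apart do.
too_close = diagnosis_present(False, [date(2002, 3, 1), date(2002, 3, 4)])
far_enough = diagnosis_present(False, [date(2002, 3, 1), date(2002, 3, 10)])
```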
Unless otherwise specified, all the data presented in the tables are simple counts, means, and proportions. We estimated the average effect of a move to a region that was one quintile higher in intensity, using regression models in which the dependent variable was the beneficiary’s final number of diagnoses or HCC risk score and the independent variables were age, race, sex, the original number of diagnoses or HCC risk score, and the change in the number of quintiles up or down they moved, which could range from −4 to +4. We carried out sensitivity analyses in which we included and excluded snowbirds and stratified the moves across all five levels of intensity. Results were similar in each case. Finally, to further determine potential differences in health status at the time of the move, we compared 1-year and 3-year rates of death among beneficiaries who moved to regions with a higher intensity or a lower intensity of practice. Confidence intervals were calculated with the use of the bootstrap method (Stata software).
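The structure of the regression model described above can be sketched as follows. The sketch fits ordinary least squares in plain Python on synthetic, noise-free data; the coefficient values, the single binary race indicator, and the use of the untransformed final score as the outcome are all assumptions made for illustration. The text does not specify the coding of race or whether the outcome was transformed, and the study itself used Stata.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic beneficiaries: the "true" effect of moving up one quintile is 0.05.
X, y = [], []
for i in range(40):
    years_over_65 = (i * 7) % 25
    female = i % 2
    race_ind = (i // 3) % 2            # hypothetical binary race indicator
    baseline = 0.6 + 0.05 * ((i * 3) % 11)
    change = (i * 5) % 9 - 4           # quintiles moved, from -4 to +4
    X.append([1.0, years_over_65, female, race_ind, baseline, change])
    y.append(0.1 + 0.004 * years_over_65 + 0.02 * female
             + 0.01 * race_ind + 1.1 * baseline + 0.05 * change)

coefs = ols(X, y)
effect_per_quintile = coefs[-1]        # coefficient on the quintile change
```

The last coefficient estimates the change in the final score associated with moving one quintile higher, holding the baseline score and demographic characteristics fixed.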
Table 1 shows the baseline characteristics of all the Medicare beneficiaries who were eligible for inclusion and who had complete follow-up, stratified according to the intensity of services provided in their original HRR of residence and, if they moved, the intensity of services in the region to which they moved. Residents of higher-intensity regions generally had more office visits, underwent more diagnostic tests, had a higher number of diagnoses, and had higher risk scores than did residents in lower-intensity regions. The key analysis, however, involved determining whether there was a difference in these factors when persons who originally resided in a region with a given quintile of intensity moved to a region with a higher or lower quintile of intensity or to another region with the same quintile of intensity. With few exceptions, notably in quintile 5, among the beneficiaries who moved, the number of diagnoses, the risk scores, and the number of diagnostic tests and imaging services were similar among persons within each quintile during the period before their move.
Table 2 shows the percent increases in laboratory tests, imaging services, and number of diagnoses among beneficiaries who did not move and among those who moved to a new HRR, stratified according to the intensity of services in the region to which they moved. Among both those who moved and those who did not move, there was a strong secular trend toward increased rates of diagnostic testing and a greater prevalence of diagnosed conditions. For example, among persons in quintile 2 who did not move, the number of diagnoses rose by 63.9% (95% confidence interval [CI], 63.3 to 64.5); among those who moved to the less intensive quintile 1, the number of diagnoses increased by 52.7% (95% CI, 40.5 to 64.9), whereas among those who moved to a more intensive region (quintiles 3 through 5), the number of diagnoses increased by 91.3% (95% CI, 80.3 to 102.4). The results were similar for imaging and laboratory tests.
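A percent increase with a bootstrap confidence interval, of the kind reported above, could be computed along the following lines. This is a sketch under stated assumptions: it uses the percentile bootstrap with resampling of beneficiaries, whereas the Methods section states only that the bootstrap method was used, and the before-and-after counts are invented for illustration.

```python
import random


def pct_increase(pairs):
    """Percent increase in the mean count from before to after the move."""
    before = sum(b for b, _ in pairs)
    after = sum(a for _, a in pairs)
    return 100.0 * (after - before) / before


def bootstrap_ci(pairs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling beneficiaries with replacement."""
    rng = random.Random(seed)
    n = len(pairs)
    stats = sorted(
        pct_increase([pairs[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical (diagnoses before, diagnoses after) for ten beneficiaries.
pairs = [(1, 2), (2, 3), (1, 1), (3, 5), (2, 4),
         (1, 2), (2, 2), (1, 3), (2, 3), (3, 4)]
point = pct_increase(pairs)
lower, upper = bootstrap_ci(pairs)
```

Resampling whole beneficiaries, rather than the before and after counts separately, keeps each person's two measurements paired.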
The percent increase in HCC risk scores, stratified according to the original quintile of intensity and the quintile of intensity of the region to which the beneficiary moved, is shown in Figure 2. Moving to a region with a higher intensity of practice was consistently associated with a greater increase in risk scores. Similar patterns were observed in each of the five quintiles.
We also estimated the average effect on the number of diagnoses of moving to a region that was one quintile of intensity higher. Receiving care in a quintile of intensity that was one step higher was associated with a 5.9% increase in the HCC risk score (95% CI, 5.2 to 6.7). We restricted the analysis to beneficiaries who moved within each of the nine major U.S. census regions and found the same pattern of greater increases in the number of diagnoses and in risk scores among those who moved to higher-intensity regions (see Tables 8 through 10 in the Supplementary Appendix). Finally, we compared 1-year and 3-year rates of death after the move, adjusting for age, sex, and race. The relative risk of death at 1 year among beneficiaries who moved to a region with a higher intensity of practice and among those who moved to a region with a lower intensity of practice was identical: 0.94 (95% CI, 0.90 to 0.98). At 3 years, there was no evidence of a survival benefit among those who moved to higher-intensity regions as compared with those who moved to lower-intensity regions (Table 11 in the Supplementary Appendix).
Beneficiaries who moved to quintile 5 regions had risk scores that were, on average, 19% higher than those of beneficiaries who moved to quintile 1 regions (see Table 6 in the Supplementary Appendix). Adjusting for such an increase in HCC scores would reduce the 1-year rate of death by 15%, as compared with the unadjusted rate of death.
Among all Medicare beneficiaries, residence in regions of the United States that have a higher intensity of services is associated with a higher reported prevalence of common chronic illnesses. Whether this is due to a higher disease burden or to differences in diagnostic practices in high-intensity regions (or both) was unknown. To address this question, we followed Medicare beneficiaries for 2 years before and 3 years after a move and found that a move to a region with a higher intensity of practice as compared with a move to a region with a lower intensity of practice was associated with greater increases in diagnostic testing, the number of recorded chronic conditions, and HCC risk scores, with no apparent survival benefit.
This study extends previous research on variations in diagnostic practices. Earlier studies have documented variations in the use of diagnostic tests among individual physicians and office practices11 and variations across regions in physicians’ reports of their likelihood of ordering tests.12 Higher rates of diagnostic testing lead to an increased number of diagnoses of specific clinical conditions, such as prostate cancer, thyroid disease, or vascular disease.18–20 The biases introduced by greater diagnostic intensity have been well documented in the context of cancer.21,22 Our study builds on this earlier work.
This study has several limitations. We cannot be certain that the beneficiaries who were recorded as having received a new diagnosis actually had the disease. Given the increased rates of imaging and laboratory testing, however, at least some of the additional diagnoses are likely to have indicated newly detected conditions. We also cannot rule out the possibility that there were differences in the beneficiaries’ underlying health status at the time of the move. However, the similarities in the number of diagnoses and in rates of testing before the move, as well as in the relative risk of death 1 year after the move, are consistent with the view that the underlying health status of the beneficiaries was similar for those who moved to regions with more intensive practice and those who moved to regions with less intensive practice.
Our study has not adequately assessed the effect of the regional differences in diagnostic intensity on the health outcomes for beneficiaries. Several previous studies involving specific clinical cohorts have shown no evidence of a survival benefit when care is provided in regions13,23 or hospitals24 with a higher intensity of services. One study showed that there was a small benefit with respect to survival when the intensity of life-sustaining treatments was greater,25 and a study involving six hospitals showed that among patients with heart failure, the rate of death was lower in the hospitals that used more resources in caring for patients than in those that used less.2 The analyses in all these studies adjusted the data for patients’ diagnoses as coded from provider data and thus risked overadjustment in higher-intensity regions and hospitals (and conversely, underadjustment in lower-intensity regions or hospitals). Although our study did not show a significantly higher rate of survival among beneficiaries who moved to regions with higher-intensity practices, this result should not be interpreted as implying that greater diagnostic intensity offers no benefits. Rather, it underscores the need for research to determine the specific clinical settings in which greater diagnostic intensity does — or does not — confer a benefit.
Our findings nonetheless have implications for health care reform. Comparative-effectiveness studies could be biased by the well-documented differences in diagnostic intensity across hospitals.26 Under public reporting programs, patients may be subject to harm to the extent that their own choices or their physicians’ referrals are based on biased risk-adjusted quality measures. Capitation systems and bundled payments for episodes of care could also be distorted. The differences are not likely to be trivial, on the basis of our analyses. By the end of the study, beneficiaries who had moved to quintile 5 regions (those with the highest intensity of practice) had risk scores that were, on average, 19% higher than those of beneficiaries who had moved to quintile 1 regions (those with the lowest intensity of practice). Since patients had similar baseline health status, these differences are plausible estimates of the differences in diagnostic intensity across these regions. Under a public reporting or payment program that relied on the unmodified HCC scores, capitated reimbursement rates would be as much as 19% higher in the high-intensity regions solely because of bias related to diagnostic practice, particularly since the CMS has been relying to a greater extent on HCC scores in adjusting payments to Medicare Advantage plans.27 In addition, our results suggest that when HCC scores are overestimated by 19%, risk-adjusted rates of death would appear to be 15% lower.
We recognize that biases related to diagnostic intensity are not the only challenge confronting risk adjustment. A major concern about both payment reforms and performance-measurement initiatives is their potential for adversely affecting behavior. For example, if providers are more highly compensated for treating patients with more diagnoses, they could conceivably be inclined to perform more intensive screening and diagnostic testing, with clear effects on costs and uncertain effects on health outcomes. Alternatively, risk-adjustment models could fail to account for the difficulty of caring for truly high-risk patients or those whose care is made more difficult owing to challenges such as language barriers, poor health literacy, or lack of social support, encouraging some providers to avoid or stop providing care for such patients. Such concerns only underscore the importance of continued efforts to advance the development of unbiased methods of risk adjustment as health care reform proceeds.
These challenges could become more manageable as comprehensive electronic health records are implemented. To help improve risk adjustment, such systems would need to incorporate both nonclinical factors that may predict a patient’s lack of adherence to clinical advice (e.g., homelessness or poverty) and clinical data that are less subject to bias that is due to differences in diagnostic practices. Examples of such data include stage and grade in the case of patients with cancer and ejection fraction in the case of those with congestive heart failure. It is also possible that measures of health risks reported by patients (e.g., smoking and exercise patterns) and functional status (physical, social, and role function) could be incorporated in risk-adjustment models to improve their performance.
The newly passed health care reform legislation includes substantial increases in funding for comparative-effectiveness research programs and establishes major initiatives that will move Medicare and Medicaid toward bundled payment systems. Our findings underscore the need for additional efforts to advance risk-adjustment methods as reform proceeds.
Disclosure forms provided by the authors are available with the full text of this article at NEJM.org.