The Centers for Medicare & Medicaid Services uses kidney transplant outcomes, unadjusted for standard comorbidity, to identify centers with sufficiently higher than expected rates of graft failure or patient death (underperforming centers) that they may be denied Medicare participation. To examine whether comorbidity adjustment would affect this determination, we identified centers that would have failed to meet 1-year graft survival criteria, 1992–2005, with and without adjustment using the Elixhauser Comorbidity Index. Adjustment was performed for each US center for 24 consecutive (overlapping) 30-month intervals, including 102,176 adult deceased-donor and living-donor kidney transplant patients with Medicare as primary payer 6 months pretransplant. For each interval, we determined percent positive agreement (number of centers underperforming both before and after adjustment, divided by number underperforming either before or after adjustment). Overall percent positive agreement was 80.8%, with no evidence of a trend over time. Among deceased-donor recipients, 10 of 31 comorbid conditions were predictors of graft failure in at least half of the intervals, as were 6 conditions among living-donor recipients. Lack of comorbidity adjustment may disadvantage centers willing to accept higher risk patients. Risk of jeopardizing Medicare funding may give centers incentive to deny transplantation to higher risk patients.
Outcomes after organ transplantation in the US are receiving increasing scrutiny. On March 30, 2007, the Centers for Medicare & Medicaid Services (CMS) published new Conditions of Participation, which took effect June 28, 2007 (1). A key element of these Conditions requires transplant centers to meet minimum standards for patient and graft survival. The standards are based on center-specific, risk-adjusted outcomes, which are calculated and published every 6 months by the Scientific Registry of Transplant Recipients (SRTR) (2). Private payers also use published SRTR center-specific outcomes to determine whether they will pay for transplants at individual centers (3–5).
Because patients and payers use SRTR center-specific outcomes to select centers to perform transplants, these outcomes must be accurately risk-adjusted. Without appropriate adjustment for prevalent comorbidity at the time of transplant, outcomes at centers that perform transplants in higher risk patients may appear to be worse than expected, and such centers may thus be prohibited from performing organ transplants. Centers may have strong incentive to perform transplants in lower risk patients, so their outcomes will appear more positive in published data. This could lead to de facto rationing of organ transplants, and could create a barrier to transplantation for selected higher risk patient groups. The SRTR calculates expected center outcomes using patient, donor, and transplant characteristics; this information is collected by the Organ Procurement and Transplantation Network (OPTN), which includes all transplant centers in the US. However, the data include little information generally included in standard comorbidity indices (6).
We hypothesized that the center-specific outcomes published by the SRTR would be significantly different if they were adjusted for a standard set of comorbid conditions. We used CMS claims data to determine whether adjusting for comorbidity would alter the results as they are currently calculated by the SRTR.
Data were from the United States Renal Data System (USRDS), and included CMS claims data and OPTN registry data. Between January 1, 1992, and December 31, 2005, 179,567 adults (aged > 18 years) underwent deceased- and living-donor, kidney-only transplants. We excluded pediatric patients and patients who underwent simultaneous transplant with an organ other than a kidney. Because we identified prevalent comorbidity at the time of transplant using CMS claims, the analysis included only the subset of 102,176 (56.9%) patients who had Medicare as primary payer during the 6 months pretransplant. Medicare status was determined from USRDS payer data, which are based on information from the CMS Medicare Enrollment Database and Medicare outpatient dialysis claims.
From the OPTN registry, we identified patient and transplant characteristics currently used in SRTR analyses. Characteristics differed for deceased- and living-donor recipients (first 2 columns, Table 1). We also identified transplant centers from the OPTN registry. For patients with no history of dialysis or kidney transplant in the OPTN data, we calculated duration of ESRD as time from ESRD incidence (USRDS data) to transplant (OPTN data).
We identified 31 comorbid conditions from CMS claims, including the 30 Elixhauser (7) conditions and cardiac arrhythmias (third column, Table 1). Diagnosis codes for each condition were taken from Quan et al (8). For each condition, prevalence at the time of transplant was defined by at least 1 Medicare Part A inpatient claim or at least 2 Part A outpatient or Part B claims with a relevant diagnosis code during the 6 months pretransplant.
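The claims rule above (at least 1 Part A inpatient claim, or at least 2 Part A outpatient or Part B claims) can be illustrated with a minimal Python sketch. The `Claim` record, its field names, and the source labels are hypothetical and are not the USRDS file layout; the sketch also assumes that claims have already been restricted to the 6 months pretransplant and that diagnosis codes have already been mapped to conditions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    # Hypothetical record layout, for illustration only.
    patient_id: str
    source: str     # "A_inpatient", "A_outpatient", or "B"
    condition: str  # comorbid condition that the diagnosis code maps to


def prevalent_conditions(claims):
    """Return {patient_id: set of conditions} meeting the claims rule.

    Assumes `claims` holds only claims from the 6 months pretransplant.
    """
    inpatient = Counter()  # Part A inpatient claims per (patient, condition)
    other = Counter()      # Part A outpatient plus Part B claims
    for c in claims:
        key = (c.patient_id, c.condition)
        if c.source == "A_inpatient":
            inpatient[key] += 1
        else:
            other[key] += 1

    flagged = {}
    for patient_id, condition in set(inpatient) | set(other):
        key = (patient_id, condition)
        # Rule from the text: >= 1 inpatient claim OR >= 2 other claims.
        if inpatient[key] >= 1 or other[key] >= 2:
            flagged.setdefault(patient_id, set()).add(condition)
    return flagged


example_claims = [
    Claim("p1", "A_inpatient", "congestive heart failure"),
    Claim("p1", "B", "diabetes"),             # a single Part B claim: not enough
    Claim("p2", "B", "diabetes"),
    Claim("p2", "A_outpatient", "diabetes"),  # second non-inpatient claim
]
example_flags = prevalent_conditions(example_claims)
```

In this toy input, patient `p1` is flagged for congestive heart failure only, and patient `p2` is flagged for diabetes.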
To replicate SRTR methodology used to determine the expected number of graft failures during the first year posttransplant, we examined 24 consecutive rolling transplant patient cohorts, each consisting of transplants performed during 30-month intervals, beginning January 1, 1992. The first cohort included all deceased- and living-donor recipients who underwent transplants during the 30-month period from January 1, 1992, through June 30, 1994. The second cohort included patients who underwent transplants between July 1, 1992, and December 31, 1994. The final cohort included patients who underwent transplants between July 1, 2003, and December 31, 2005.
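The cohort windows described above can be reproduced with a short Python sketch. It assumes, as the stated dates imply, that each 30-month window begins 6 months after the previous one; the function names are illustrative only.

```python
from datetime import date, timedelta


def add_months(d, months):
    """Shift a first-of-month date forward by a whole number of months."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def rolling_cohorts(first_start, n_cohorts, length_months=30, step_months=6):
    """Return (start, end) dates, inclusive, for each overlapping cohort."""
    cohorts = []
    for i in range(n_cohorts):
        start = add_months(first_start, i * step_months)
        # The window ends the day before the first day after the window.
        end = add_months(start, length_months) - timedelta(days=1)
        cohorts.append((start, end))
    return cohorts


cohorts = rolling_cohorts(date(1992, 1, 1), 24)
for start, end in (cohorts[0], cohorts[1], cohorts[-1]):
    print(start.isoformat(), "to", end.isoformat())
```

With 24 cohorts, the first, second, and final windows match the dates given in the text.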
For each cohort, we fitted Cox proportional hazards models of graft failures. Separate models were fit to deceased- and living-donor recipients. Graft failure was defined as death with a functioning graft, return to dialysis, or re-transplant. For each model, follow-up began at transplant date and continued until the earliest of graft failure or 1 year posttransplant. Deceased- and living-donor models were adjusted for all applicable variables listed in Table 1. All variables were parameterized exactly like the SRTR analysis of kidney-only transplants in adults performed between January 1, 2005, and June 30, 2007.
After fitting deceased- and living-donor models with the Table 1 variables, we re-fitted the models with adjustment for pre-existing recipient comorbidity at the time of transplant. For each model, we included only those comorbid conditions that passed through a backwards variable selection algorithm, in which the patient, donor, and transplant characteristics were always retained in the model, and the significance criterion for inclusion was set to P = 0.10.
The expected number of graft failures per center was calculated from survival estimates derived from the Cox model. From each fitted model, we obtained an estimate of survival at the end of follow-up for each patient. We calculated the expected number of graft failures per patient from the negative of the logarithm of this survival estimate. The expected number of failures per center was calculated from the sum of the expected number of failures per patient across all patients at that center, including living- and deceased-donor recipients.
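As a minimal sketch of this calculation in Python: each patient contributes the negative logarithm of the model-based survival estimate, and contributions are summed within center. The input format (pairs of center identifier and survival estimate) is an assumption made for illustration, and the survival values shown are invented, not study data.

```python
import math
from collections import defaultdict


def expected_failures_by_center(patients):
    """Sum -log(survival estimate) over patients within each center.

    `patients` is an iterable of (center_id, survival_estimate) pairs, where
    the survival estimate is the fitted probability of graft survival at the
    end of follow-up (hypothetical input format).
    """
    expected = defaultdict(float)
    for center_id, survival in patients:
        if not 0.0 < survival <= 1.0:
            raise ValueError("survival estimate must be in (0, 1]")
        expected[center_id] += -math.log(survival)
    return dict(expected)


# Invented survival estimates for two hypothetical centers.
example_expected = expected_failures_by_center(
    [("center_A", 0.90), ("center_A", 0.80), ("center_B", 0.95)]
)
```

Because logarithms add, the expected count for `center_A` equals the negative logarithm of 0.90 × 0.80 = 0.72, or about 0.33 expected failures across its two patients.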
Finally, we applied the 3 individual performance criteria used by CMS to identify underperforming centers: (1) observed (O) – expected (E) events > 3; (2) ratio of O/E > 1.5; and (3) ratio of O/E significantly greater than 1, as indicated by an exact Poisson test with a one-sided P < 0.05. Centers meeting all 3 individual criteria are considered underperforming. We also examined the outcome of meeting any of the 3 individual criteria. We applied these criteria only to centers that performed at least 10 transplants over 30 months, as CMS does not apply them to smaller centers. As the primary outcome, we considered percent positive agreement (PPA) between models adjusted and not adjusted for recipient comorbidity. PPA facilitates examination of discrepancies between these methods in identifying underperforming centers, with the understanding that relatively few centers in any one cohort (statistically) underperform. PPA also obviates the need to declare either method “gold standard,” as is implicit in the calculation of concordance measures such as sensitivity and specificity.
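The 3 criteria can be sketched in Python as follows. This is an illustration under stated assumptions, not the CMS or SRTR implementation: the one-sided exact Poisson P value is taken to be the upper-tail probability of observing at least O events when E are expected, and the function names are hypothetical.

```python
import math


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected), summed term by term.

    Adequate for the modest expected counts of a single center; a very large
    `expected` would underflow math.exp and need a library routine instead.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k    # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def cms_flags(observed, expected, alpha=0.05):
    """Evaluate the 3 individual criteria and their 'all' and 'any' combinations."""
    excess = (observed - expected) > 3        # criterion 1: O - E > 3
    ratio = (observed / expected) > 1.5       # criterion 2: O/E > 1.5
    significant = poisson_upper_tail(observed, expected) < alpha  # criterion 3
    return {
        "excess": excess,
        "ratio": ratio,
        "significant": significant,
        "all": excess and ratio and significant,
        "any": excess or ratio or significant,
    }


flags_large = cms_flags(observed=12, expected=5.0)  # meets all 3 criteria
flags_small = cms_flags(observed=4, expected=1.0)   # O - E is exactly 3, not > 3
```

The second call shows why the criteria are not interchangeable: with 4 observed and 1 expected failure, the ratio and significance criteria are met, but the excess criterion is not, so the center meets "any" but not "all".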
All analyses were conducted with SAS (v9.1.3, Cary, North Carolina).
Between January 1, 1992, and December 31, 2005, 179,567 kidney-only transplants were performed, 66.2% for deceased-donor recipients. Medicare was primary payer in the 6 months pretransplant for 56.9% of all patients. Of these, 75.8% were deceased-donor recipients, compared with 53.6% of patients without Medicare as primary payer in the 6 months pretransplant. Table 2 displays patient and transplant characteristics used for risk adjustment by the SRTR for patients included in (Medicare) and excluded from (non-Medicare) the analysis. Included patients were significantly more likely to have older donors, and African American or Hispanic donors; to have had a previous organ transplant, higher peak PRA, and longer duration of pretransplant ESRD; and to be elderly and African American or Hispanic.
During the first year posttransplant, for all included patients, 11,055 (14.3%) graft failures occurred for deceased-donor recipients, and 1826 (7.4%) for living-donor recipients. In contrast, for all excluded patients, graft failures occurred for 10.4% of deceased-donor recipients and 5.0% of living-donor recipients.
Thirty-one comorbid conditions were considered for risk adjustment in this analysis. For deceased-donor recipients, prevalence of 8 conditions exceeded 10%: congestive heart failure, cardiac arrhythmias, hypertension, diabetes, renal failure, liver disease, fluid and electrolyte disorders, and deficiency anemia (Table 3). Prevalence of most conditions changed significantly from 1992 to 2005, with increases in cardiovascular comorbidity, hypertension, chronic pulmonary disease, diabetes, and deficiency anemia. For living-donor recipients, similar relative magnitudes of prevalence were observed, but prevalence was typically significantly less (P < 0.05) than for deceased-donor recipients. Similar secular prevalence trends were observed, with significant increases over time in cardiovascular comorbidity prevalence.
We assessed the number of cohorts in which each comorbid condition was included (P < 0.10) as a predictor of graft failure. Among deceased-donor recipients, 10 comorbid conditions were independent predictors of graft failure in at least 12 (≥ 50%) of the cohorts. Congestive heart failure (mean hazard ratio [mHR] 1.22), cardiac arrhythmias (mHR 1.38), hypertension (mHR 0.86 uncomplicated, 0.82 complicated), other neurological disorders (mHR 1.49), coagulopathy (mHR 1.34), and fluid and electrolyte disorders (mHR 1.15) were each included in at least 20 cohort-specific models. Among living-donor recipients, 6 conditions were independent predictors in at least half of the cohorts: cardiac arrhythmias (mHR 1.57), peripheral vascular disease (mHR 1.44), uncomplicated hypertension (mHR 0.72), other neurological disorders (mHR 1.68), coagulopathy (mHR 1.73), and fluid and electrolyte disorders (mHR 1.42).
The addition of comorbidity to graft failure models resulted in widespread, significant improvement in discrimination (the ability to distinguish between patients who experience graft failure and those who do not) and calibration (the ability to match predicted with observed graft failure probabilities) (Table 4). For deceased-donor recipients, models included an average of 11.8 comorbid conditions. From the eighth cohort on, the addition of comorbidity resulted in significant improvements in discrimination (P < 0.01), generally around 3%. Improvements in calibration were universal and significant (P < 0.01). For living-donor recipients, models included an average of 7.4 conditions. Improvements in discrimination were universal, but statistically significant (P < 0.01) only during the final third of the study period. During this time, the c-statistic increased by 6% to 10% after inclusion of comorbid conditions. As in models for deceased-donor recipients, improvements in calibration were statistically significant across all cohorts.
In the 24 cohorts, 227 centers were assessed a total of 4865 times. The mean number of transplants per center assessment was 89.4 per 30 months (interquartile range 41–115).
Figure 1a displays a histogram of the percentage difference in the expected number of graft failures per center, before and after adjustment for recipient comorbidity, over all 4865 assessments, and Figure 1b a scatterplot of expected graft failures per center, before and after adjustment. The expected number of failures declined by more than 15% in 51 (1.0%) center assessments, and declined by between 5% and 15% in 1060 (21.8%) assessments. These centers would have been effectively rewarded by risk adjustment that failed to consider recipient comorbidity. In contrast, the expected number of graft failures increased by more than 15% in 174 (3.6%) center assessments, and by between 5% and 15% in 805 (16.5%) assessments. These centers would have been effectively penalized by risk adjustment that failed to consider comorbidity. A substantial number of centers with 20–40 expected graft failures before adjustment for comorbidity exhibited an increase in expected failures after adjustment.
Figure 2 shows the PPA for each of the 3 individual performance criteria across the 24 cohorts. For the first criterion (O – E > 3), PPA between adjustment for comorbid conditions and no adjustment was 87.2% (95% confidence interval [CI] 85.2%-89.2%), with no evidence of a trend across the 24 cohorts (P = 0.935, from a logistic regression of the linear effect of period). For the second criterion (O/E > 1.5), PPA increased significantly (P = 0.024) over time, from an estimated 78.0% in the first cohort to an estimated 87.9% in the final cohort. For the third criterion (one-sided test of O/E = 1), PPA was 81.6% (95% CI 78.2%-84.9%), with no evidence of a trend over time (P = 0.611). Thus, of all centers considered underperforming either before or after adjustment for comorbidity, based exclusively on a statistical test of observed versus expected graft failure, nearly 1 in 5 were discrepantly flagged by the adjustment techniques.
In reality, centers must meet all 3 criteria to be considered underperforming. PPA for meeting all 3 criteria, between adjustment for comorbid conditions and no adjustment, was 80.8% (95% CI 77.2%-84.5%), with no evidence of a trend over time (P = 0.340, Figure 3). The PPA for meeting any of the 3 criteria was 86.4% (95% CI 84.6%-88.3%), with weak evidence for a trend over time (P = 0.063). Table 5 displays discrepancies between the adjustment techniques. By either method, the number of underperforming centers in a particular cohort would have fluctuated from 8% to 9%. However, that approximate equality obscures the fact that many centers would have been marked differently by the 2 adjustment techniques. Over the 24 cohorts, a total of 45 centers would have been declared underperforming only after adjustment for comorbidity, while 42 centers would have been declared underperforming before adjustment, but not after adjustment. While PPA was 80.8% in aggregate, it fluctuated considerably over the 24 cohorts, from a minimum of 64.7% to a maximum of 95.0%. Notably, PPA never achieved 100% in a cohort.
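Percent positive agreement, as defined in the methods, is the number of centers flagged by both techniques divided by the number flagged by either. The Python sketch below uses invented center identifiers for illustration; the only study figures it uses are the discordant counts reported above (45 and 42 centers over 24 cohorts).

```python
def percent_positive_agreement(flagged_unadjusted, flagged_adjusted):
    """PPA = 100 * |flagged by both| / |flagged by either|.

    Returns None when neither technique flags any center, because the
    ratio is then undefined.
    """
    unadjusted = set(flagged_unadjusted)
    adjusted = set(flagged_adjusted)
    either = unadjusted | adjusted
    if not either:
        return None
    return 100.0 * len(unadjusted & adjusted) / len(either)


# Invented center identifiers for one cohort.
before = {"C01", "C02", "C03", "C04"}
after = {"C02", "C03", "C04", "C05"}
example_ppa = percent_positive_agreement(before, after)  # 3 of 5 centers agree

# Discordant designations reported above, averaged over the 24 cohorts.
discordant_per_cohort = (45 + 42) / 24
```

In the invented cohort, 3 centers are flagged by both techniques and 5 by either, giving a PPA of 60%. The reported discordant counts average about 3.6 centers per cohort.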
The SRTR center-specific outcomes for kidney transplant are used by patients and payers to judge quality of care and to select centers to perform transplants. It is critical, therefore, that these outcomes be accurate and unbiased. The SRTR adjusts center-specific outcomes for baseline differences in patient, donor, and transplant characteristics with data collected by the OPTN. However, their methods do not adjust for the differences in baseline comorbidity usually used in comparing outcomes between different health care providers. In our study, we recalculated adjusted graft survival outcomes used in the criteria recently adopted by CMS to identify underperforming kidney transplant centers, using comorbidity ascertained from CMS billing claims. Our results indicate that comorbidity is an important predictor of graft failure after kidney transplant, and that adjusting for differences in baseline comorbidity considerably alters which centers are identified as underperforming, according to CMS criteria.
The aggregate PPA of 80.8% could be interpreted to suggest that the effect of lack of comorbidity adjustment is relatively small. However, it implies that, on average, 3 to 4 centers would be incorrectly identified as performing or underperforming every 6 months. Because these may not be the same centers every 6 months, the cumulative effects could be greater. In addition, private payers may use more stringent criteria than CMS for determining their preferred providers. Hence, the number of affected centers could be substantially higher.
That adjusting for comorbidity would affect center-specific graft survival calculations should not be surprising. A large proportion of graft failures are due to death with a functioning graft, so factors that affect mortality could be expected to affect graft survival. However, few studies have examined the effects of comorbidity in kidney transplant patients. Jassal et al (9) compared 4 different comorbidity indices for 6324 transplants in the Canadian Organ Replacement Registry. Specifically, they compared the Charlson Comorbidity Index (10) to 3 indices adapted for use in non-transplant ESRD patients, the Khan Index (11), the Davies Index (12), and a modified Charlson Comorbidity Index (13). They found that all 4 indices correlated with mortality after transplant, and that the Charlson Comorbidity Index was superior (14). In a single-center study, Wu et al (15) also found that the Charlson Comorbidity Index was associated with mortality in 715 kidney transplant patients.
We selected comorbid conditions from the Elixhauser (16) Comorbidity Index, a set of 30 conditions which nearly subsumes the Charlson Comorbidity Index. While these indices have been well-validated for predicting mortality in non-transplant populations (17–23), they have not been validated in administrative transplantation registries. In general, comorbidity indices tend to have high specificity and low sensitivity when applied to administrative data (24). In a study of the National Registry of Atrial Fibrillation, the specificity of Medicare claims for cardiovascular comorbidity exceeded 90%, while sensitivity was much lower (25). Thus, using Medicare claims in the present study likely results in under-identification of prognostically important comorbid conditions. Improved ascertainment of comorbid conditions is likely to strengthen our finding that an important component of differences in outcomes between transplant centers is heterogeneous prevalence of baseline comorbidity across centers. Additionally, diagnosis codes in Medicare claims carry no indication of disease severity. More detailed recording of comorbidity severity is also likely to strengthen our conclusions.
By necessity, we analyzed CMS criteria in a subset of transplant patients with Medicare coverage. Patients in this subset had longer duration of pretransplant ESRD, and were more likely to be older and African American or Hispanic than patients without Medicare coverage, and the subset included a disproportionate number of deceased-donor recipients. Comorbidity prevalence among excluded patients was likely lower than among included patients. Therefore, lack of comorbidity adjustment for excluded patients may plausibly be less important, and the PPA between the adjustment techniques may be modestly higher for all transplant patients than for the subset with pretransplant Medicare coverage.
Importantly, this study was not designed to determine the optimal method for adjusting center performance for comorbid conditions, but rather to demonstrate that such adjustment is likely to have an effect on performance results. In this study, more than 40% of patients carried private insurance, and excluding them from comorbidity-adjusted outcome calculations would likely be unacceptable. A strong argument could be made that the only accurate method of equitably adjusting outcomes for all transplant centers would be for the OPTN to prospectively collect comorbidity data at the time of transplant. In the meantime, strong consideration should be given to suspending the comparison of center-specific outcomes that are not adjusted for comorbidity, given the potential to harm programs and patients.
There is evidence that the level of mortality risk that centers are willing to tolerate in accepting transplant candidates varies, and that risk tolerance correlates with the chances that a center is an underperforming center by CMS and SRTR criteria. In a recent study, Schold et al calculated kidney transplant candidate mortality before transplant (26). They found substantial variation from center to center in pretransplant mortality, indirectly suggesting variation in the threshold of risk that centers will accept. Importantly, of the 19% of centers that met CMS criteria for underperformance, 51% were in the highest quartile of pretransplant candidate mortality, and only 7% were in the lowest. These results are consistent with our results: baseline mortality risk strongly influences designation of underperformance.
Transplant centers do not accept all ESRD patients, and comorbidity influences access to transplants. For example, Gaylin et al reported that ESRD patients with more comorbidity, particularly cardiovascular comorbidity, were less likely to be accepted for kidney transplants than patients with less comorbidity (27). Similarly, Winkelmayer et al (28) reported that ESRD patients with comorbidity were less likely to undergo transplants. However, survival is better for patients accepted for transplants than for those on the waiting list (29). Pressure on centers to improve published outcomes, leading them to deny transplants that could improve survival for higher risk patients, is of concern.
Optimal outcomes and equitable access are laudable goals in transplantation. However, efforts to optimize the performance of transplant centers should not be confused with efforts to optimize organ allocation. We suspect that it is not the intent of CMS to use Conditions of Participation to discourage centers from performing transplants in high risk patients who may benefit from them. Similarly, it may not be the intent of private payers (who select transplant centers for their insured patients) to use SRTR data to encourage centers to perform transplants in lower risk, and less costly, patients. Nevertheless, lack of comorbidity adjustment, along with use of these data by CMS and private payers to define center performance, could encourage centers to seek to improve their outcomes by accepting lower risk patients. This is not to say that centers should accept patients who are too high risk to benefit from transplants, but that the decision should be made in the best interests of patients and not to improve outcome statistics.
Finally, we note that centers have no reason to try to improve their published outcomes by selecting patients based on characteristics currently used to adjust those outcomes. For example, donation after cardiac death and extended criteria donor are currently used in the adjustment of deceased-donor kidney transplant outcomes. Thus, a center’s decision to use kidneys from donation after cardiac death or extended criteria donors should not be influenced by concern that these will negatively affect the center’s outcomes. Only characteristics not used in adjusting these outcomes represent a potential problem.
Based on this analysis, we suggest that any assessment of center performance include adjustment for baseline recipient comorbidity. This analysis indicated that cardiovascular comorbidity, neurological disease, and anemia and fluid disorders common to ESRD carry elevated prognostic significance. Conceptually, baseline comorbidity confounds the association between risk of graft failure and transplant center, so usual conventions for identifying important confounders (non-trivial disease prevalence and clinically meaningful risk of graft failure, given disease) should guide the selection of conditions for adjustment. We also suggest that comorbidity data would best be ascertained prospectively, rather than with administrative data. The associations reported in this analysis would likely be magnified in the presence of a data collection mechanism with superior sensitivity to Medicare claims. Prospective ascertainment would facilitate adjustment for comorbidity in non-Medicare patients. Finally, we note that all statistical assessments of center performance should be interpreted with caution. Comparisons of observed with expected events depend on the quality of the estimate of what is expected. This analysis showed that modest gains in model discrimination can have important consequences for individual centers. Moreover, differences between observations and expectations can arise by chance, and may not reflect transplant center practices.
In summary, the results of this analysis suggest that published SRTR center-specific outcomes, currently used by CMS and other payers to decide where transplants should be performed, do not adequately account for differences in baseline comorbidity across centers. Consequently, these center-specific outcomes may disadvantage centers that are willing to perform transplants in higher risk patients, compared with more restrictive centers. In the future, such centers may be encouraged to deny transplants to higher risk patients to improve SRTR-reported outcomes, even when survival may be improved by providing transplants to these higher risk patients. Payers should reassess how center-specific outcomes are calculated, and how decisions are made to determine which centers should perform transplants.
Portions of this work were presented, and published in abstract form, at the American Transplant Congress, Toronto, Canada, June 4, 2008. The authors thank USRDS colleagues Beth Forrest for regulatory assistance, Shane Nygaard for manuscript preparation, and Nan Booth, MSW, MPH, for manuscript editing.
This work was supported by the United States Renal Data System, under Contract No. HHSN267200715002C (National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland). The interpretation and reporting of these data are the responsibility of the authors and in no way should be seen as an official policy or interpretation of the US government. Dr. Israni is a Robert Wood Johnson Foundation Physician Faculty Scholar. The other authors have no conflicts of interest to report.