Using Pennsylvania Medicare claims from 1995 and 1996 we previously reported that anesthesia procedure length appears longer in blacks than whites. In a new study using a different and larger data set, we now ask whether body mass index (BMI), not available in Medicare claims, explains this difference; we also examine the relative contributions of surgical and anesthesia times.
The Obesity and Surgical Outcomes Study of 47 hospitals throughout Illinois, New York, and Texas abstracted chart information including BMI on elderly Medicare patients (779 blacks and 14,596 whites) undergoing hip and knee replacement and repair, colectomy, and thoracotomy between 2002 and 2006. We matched all black Medicare patients to comparable whites and compared procedure lengths.
Mean BMI in the black and white populations was 30.24 kg/m2 and 28.96 kg/m2, respectively (P < 0.0001). After matching on age, sex, procedure, comorbidities, hospital, and BMI, mean white BMI in the comparison group was 30.1 kg/m2 (P = 0.94). The typical matched pair difference (black-white) in anesthesia (induction to recovery room) procedure time was 7.0 min (P = 0.0019), of which 6 minutes reflected the surgical (cut-to-close) time difference (P = 0.0032). Inside matched pairs where difference in procedure times was > 30 min between patients, blacks more commonly had longer procedure times (Odds = 1.39, P = 0.0008).
Controlling for patient characteristics, BMI, and hospital, elderly black Medicare patients experienced slightly but significantly longer procedure length than their closely matched white controls. Procedure length difference was almost completely due to surgery, not anesthesia.
In previous work studying hospitals and patients in Pennsylvania, we reported that operative procedure length as measured through Medicare anesthesia claims appears longer in blacks than whites undergoing the same general and orthopedic surgical procedures.1 After adjusting for the surgical procedure, patient comorbidities and hospital, we previously observed a 5.5-min black-white difference in procedure length (95% CI 3.8, 7.1), P < 0.0001. This difference varied by hospital; some institutions displayed a 16-min difference while others displayed no difference at all.1 These observations had a number of possible explanations, with unobserved confounding being an important consideration.
This report aims to address some weaknesses of the previous study. As Medicare claims did not record body mass index (BMI), we could not directly determine if differences in BMI by race were causing the observed racial differences in procedure length reported in our previous work. Also, our previous work did not closely adjust for secondary procedures. It furthermore used a regression approach that made assumptions about the form of the model used to estimate the racial differences. Lastly, our previous work did not examine the implications of the 5.5-min average disparity when the disparity may not be uniform over patients.
Using a new data source that includes chart abstraction as well as Medicare claims, we now ask whether there remains a difference in procedure time. In the present study we carefully match on BMI (an unobservable in our past analysis); estimated procedure length that takes into consideration types of secondary procedures; comorbidities; source of admission; and the hospital where the procedure was performed. We also examine whether the apparent disparity is associated more with the anesthesiology team or the surgical team by asking if the disparity occurs during the cut-to-close period of the procedure, or before and after that period. Finally, we address the clinical importance of the size of the procedure length disparity observed in our analysis.
This study reports on racial differences in procedure length utilizing a special data set developed to examine the influence of obesity on surgical outcomes. It differs from our earlier work on racial disparity in procedure time1,2 in three important ways: (1) We use multivariate matching to compare racial differences in procedure length controlling for the hospital, rather than using m-estimation (a form of regression used in our previous work). The advantage of this approach is that we do not need to make assumptions about model form; (2) We concentrate on just five categories of surgery (vs. 40 in our previous work), so we can more precisely adjust for differences in procedure type; and (3) We augmented the typical Medicare claims data with chart-derived BMI information, and recorded both anesthesia time (induction to recovery room)2,3 and surgical time (cut-to-close).2,3
As previously described,3–6 the Obesity and Surgical Outcomes Study comprises 47 hospitals throughout Illinois, New York, and Texas in which Medicare claims data were merged with chart abstraction for general and orthopedic procedures. We identified Medicare patients aged 65 through 80 who underwent one of five types of surgery between 2002 and 2006: (1) hip replacement or revision excluding fracture (International Classification of Diseases, Ninth Revision, Clinical Modification [ICD-9-CM] Principal Procedure codes 81.51–81.53); (2) knee replacement or revision (ICD-9-CM Principal Procedure codes 81.54, 81.55); (3) colectomy for cancer (ICD-9-CM Principal Procedure codes 45.7–45.79, 45.8) and (ICD-9-CM Principal Diagnosis codes 153–153.9, 154–154.8, 230.3–6); (4) colectomy not for cancer (ICD-9-CM Principal Procedure codes 45.7–45.79, 45.8) and (ICD-9-CM Principal Diagnosis codes 562.1–562.13); and (5) thoracotomy (ICD-9-CM Principal Procedure codes 32–32.9).
Hospitals were contacted by the Oklahoma Foundation for Medical Quality, and requested to abstract between 300 and 400 prespecified charts in order to collect baseline information including BMI, admission vital signs and laboratory tests, and information on the surgical procedure. All data collected were deidentified and merged with encrypted Medicare claims files and sent to the study investigators for analysis. Approval was obtained from The Children’s Hospital of Philadelphia Institutional Review Board (the Institutional Review Board associated with the Principal Investigator of the study), as well as hospital-specific Institutional Review Boards when requested.
The Obesity and Surgical Outcomes Study included 15,914 elderly surgical patients in Medicare of which 779 (4.9%) were black. For each patient we obtained procedure length through chart abstraction, using Medicare claims when occasional chart data elements were missing.3
Matching was performed using the algorithm MIPMatch.7 We performed two matches, one without BMI and one with. Both matched exactly on hospital and procedure group, so that the two patients in each pair (one black and one white) were matched inside the same hospital and had exactly the same procedure group. MIPMatch allowed us to force black and white matched groups to have nearly the same frequencies of patient comorbidities, ICD-9-CM procedures within procedure groups, and gender (a requirement known as “near fine balance”).8–10 Subject to the requirements of an exact match for procedure group and hospital, plus near-fine balance for comorbidities and procedures, we minimized the total distance within matched pairs, a requirement known as optimal matching.8 The matches achieved an exact match on ICD-9-CM principal procedure in over 95% of all pairs (see Appendix, Supplemental Digital Content 1).
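To make the idea of optimal matching under an exact-match constraint concrete, the following is a minimal sketch in Python. It is not the MIPMatch algorithm and does not enforce near-fine balance; it simply enumerates every assignment within each hospital-by-procedure stratum and keeps the one with the smallest total distance, which is feasible only for toy data. The patient records, the field names, and the `covariate_distance` function are hypothetical and chosen for illustration only.

```python
from itertools import permutations


def covariate_distance(case, control):
    # Hypothetical distance: absolute age gap plus absolute BMI gap.
    # The study's actual distance used many more covariates and scores.
    return abs(case["age"] - control["age"]) + abs(case["bmi"] - control["bmi"])


def optimal_pairs(cases, controls, distance):
    """Brute-force optimal pair matching within exact-match strata.

    Each case is paired with a distinct control from the same stratum
    (here a (hospital, procedure group) tuple) so that the total
    within-pair distance in that stratum is as small as possible.
    """
    pairs = []
    for stratum in sorted({c["stratum"] for c in cases}):
        stratum_cases = [c for c in cases if c["stratum"] == stratum]
        pool = [c for c in controls if c["stratum"] == stratum]
        if len(pool) < len(stratum_cases):
            raise ValueError(f"not enough controls in stratum {stratum}")
        best_total, best_choice = None, None
        # Enumerate every ordered selection of controls (toy-sized data only).
        for choice in permutations(pool, len(stratum_cases)):
            total = sum(distance(a, b) for a, b in zip(stratum_cases, choice))
            if best_total is None or total < best_total:
                best_total, best_choice = total, choice
        pairs.extend((a["id"], b["id"]) for a, b in zip(stratum_cases, best_choice))
    return pairs


# Hypothetical toy data: two black cases and four white potential controls.
cases = [
    {"id": "B1", "stratum": ("H1", "knee"), "age": 70, "bmi": 31.0},
    {"id": "B2", "stratum": ("H1", "knee"), "age": 75, "bmi": 28.0},
]
controls = [
    {"id": "W1", "stratum": ("H1", "knee"), "age": 74, "bmi": 28.5},
    {"id": "W2", "stratum": ("H1", "knee"), "age": 71, "bmi": 30.0},
    {"id": "W3", "stratum": ("H1", "knee"), "age": 66, "bmi": 25.0},
    {"id": "W4", "stratum": ("H2", "hip"), "age": 70, "bmi": 31.0},
]

matched = optimal_pairs(cases, controls, covariate_distance)
```

In this toy example W4 is never considered for either case, even though it is identical to B1 on age and BMI, because it belongs to a different hospital and procedure group. That is the practical meaning of matching exactly on those two variables.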
The distance included age, sex, procedure, comorbidities such as diabetes, heart failure, previous myocardial infarction, and arrhythmias (see Appendix, Supplemental Digital Content 1), a propensity score for black race, a risk score11 for death, and a predicted time score. The second match controlled for all of these variables and also included BMI. We used the propensity score as one of many variables to match on. Specifically, we found whites with propensity scores similar to blacks. It has been shown that when matching on the propensity score, one will also tend to match on the independent variables making up the propensity score.12–14 Unlike the propensity score that is computed with the study data, the risk score must be computed with an independent sample of patients outside the study population, in order to be able to compare outcomes after the matching process.11
The time score, like the risk score, was also based on a regression model to predict a patient’s procedure time given their principal and secondary procedures, but not race. Since time is continuous and the measure of procedure length from claims occasionally has large error (see Reference 3), we used m-estimation.15,16 As with the risk score, we fit the time score model only on patients who were not part of the study population, because when we match on this variable, we want to compare times across black and white matched patients.
A second match was performed that was similar to the first, but added BMI as a matching variable describing each patient. In this way, we asked whether the procedure time difference we previously observed between blacks and whites would persist when we found white patients with very similar BMIs to their matched black patients. If the disparity in procedure length vanished, this would have suggested that differences in procedure length between blacks and whites previously reported were due to BMI differences, and not due to other potential causes of interest to policy makers concerned with disparities.
All data except BMI and procedure length were obtained from Medicare claims. BMI was abstracted from the chart, and procedure length is reported based on our “best estimate” analysis,3 which uses a measure of procedure length that combines the anesthesia bill with the abstracted length to produce a “best” measure of procedure length as described in our previous work.3 Using only anesthesia claims produced very similar results for anesthesia procedure length (not shown), but claims do not include BMI or surgical cut-to-close time.
As has been suggested by Rubin and others,17,18 matching was performed first, without viewing outcomes. After matches were completed, we then analyzed procedure time differences across blacks and whites.
Balance on observed variables after matching was appraised using standard two-sample checks that contrast achieved balance with the magnitude of covariate balance anticipated from completely random assignment.19 For each matching variable we report the “standardized difference” for group comparisons before and after matching, which represents the standardized mean difference between groups, using the standard deviation of the pooled cases and controls.19,20 For example, the standardized difference for age would be calculated as follows, where $\mu_{\text{age,black}}$ and $\mu_{\text{age,white}}$ are the mean ages of the black cases and matched white controls, and $s^2_{\text{age,black}}$ and $s^2_{\text{age,all white}}$ are the variances of the black cases and all white potential controls. The standardized difference is then $(\mu_{\text{age,black}} - \mu_{\text{age,white}}) / \sqrt{(s^2_{\text{age,black}} + s^2_{\text{age,all white}})/2}$. A usual rule of thumb is to try to achieve standardized differences below 0.2, or a fifth of a standard deviation.12,19–21
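The standardized difference just described can be computed in a few lines. The sketch below follows the definition in the text, with the denominator pooling the variance of the black cases and of all white potential controls. The function name and the age values are hypothetical and serve only to illustrate the calculation.

```python
from math import sqrt
from statistics import mean, variance


def standardized_difference(cases, comparison, all_controls):
    """Standardized mean difference for one covariate.

    cases        : covariate values for the black cases
    comparison   : values for the white group being compared (all potential
                   controls before matching, matched controls after matching)
    all_controls : values for all white potential controls; used only in the
                   denominator, so it stays fixed before and after matching
    """
    pooled_sd = sqrt((variance(cases) + variance(all_controls)) / 2)
    return (mean(cases) - mean(comparison)) / pooled_sd


# Hypothetical ages, for illustration only.
age_black = [70, 72, 74, 76]
age_all_white = [60, 62, 64, 66, 70, 72, 73, 75]
age_matched_white = [70, 72, 73, 75]

before = standardized_difference(age_black, age_all_white, age_all_white)
after = standardized_difference(age_black, age_matched_white, age_all_white)
```

Keeping the denominator fixed means that a smaller standardized difference after matching reflects a smaller gap in means, not a change in the yardstick. In this toy example the value falls from above 1 before matching to below the 0.2 rule of thumb after matching.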
We also compared covariate balance attained by matching with the covariate balance anticipated from complete randomization using two-sample randomization tests, specifically the Wilcoxon rank sum test for continuous covariates and the Fisher exact test for binary covariates.
When testing the hypothesis of no difference in outcomes between the matched black and white patients, the widely used Wilcoxon signed-rank statistic22 was calculated, together with its corresponding confidence interval and point estimate, the so-called Hodges-Lehmann estimate.22 Also reported is the median. For binary outcomes, the McNemar statistic23 was used. The Kruskal-Wallis test was used to examine differences in procedure time across procedure groups.22 Findings were considered significant if P < 0.05 (two-sided). We utilized the software package R for all statistical tests.* (R is available as Free Software under the terms of the Free Software Foundation’s GNU General Public License in source code form.)
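For readers unfamiliar with the Hodges-Lehmann estimate, the following sketch shows the standard one-sample definition applied to matched pair differences: the median of all pairwise averages (Walsh averages) of the differences, including each difference averaged with itself. The analyses in this report were run in R; this Python version is only an illustration, and the pair differences below are hypothetical.

```python
from itertools import combinations_with_replacement
from statistics import mean, median


def hodges_lehmann(pair_differences):
    """Hodges-Lehmann point estimate for matched pair differences.

    Computed as the median of the Walsh averages (d_i + d_j) / 2 over all
    i <= j. This enumerates n(n+1)/2 averages, which is fine for
    illustration but slow for very large numbers of pairs.
    """
    walsh_averages = [
        (a + b) / 2
        for a, b in combinations_with_replacement(pair_differences, 2)
    ]
    return median(walsh_averages)


# Hypothetical black-minus-white differences in procedure time (minutes).
differences = [-5, 2, 7, 12, 40]

hl_estimate = hodges_lehmann(differences)
mean_difference = mean(differences)
```

With these hypothetical values the single 40-min pair pulls the mean well above the Hodges-Lehmann estimate, which illustrates why a robust estimate of the typical pair difference is useful when procedure times occasionally contain large values.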
Results on the quality of the two matches are displayed in table 1. We first present the quality of Match 1, the match that included age, sex, hospital, procedure, comorbidities, a risk score, a propensity score to be black, and a time score, but did not include BMI. We then present Match 2, a similar match that also includes BMI. As can be seen, the quality of the match did not vary with or without matching on BMI. Furthermore, the quality of the matches was uniformly excellent, as demonstrated by the standardized differences after matching, which were considerably less than 0.10 standard deviations for all variables and statistically insignificant.
In table 2, for both matches (Match 1 without matching on BMI and Match 2 including BMI), we examine the differences in length of procedure as defined by (1) A best estimate of anesthesia procedure time based on the chart and Medicare claim (induction to recovery room);1–3 (2) Surgical procedure time as determined from the chart (this is the cut-to-close time);1–3 and finally, (3) Anesthesia Induction/Emergence time which we define as the difference between Anesthesia Procedure Time (1) and Surgical Time (2). For each procedure time reported we provide the mean, median and the Hodges-Lehmann statistic which provides a measure of the typical time for each group, as well as a 95% confidence interval.
The time differentials between black and white Medicare patients were similar across matches, and all differences in time between black and white matched pairs were highly significant. For example, using the match that did not include BMI, and our best estimate of the anesthesia procedure time based on claim and chart, we observed that procedures for black patients took about 6.5 min (95% CI 2 to 10.5) longer than those for matched white patients, controlling for the same hospital, procedure, and comorbidities. When we include BMI in the matching algorithm (Match 2) we observe a similar finding of a 7-min difference (95% CI 2.5 to 11.5 min). We further examined whether procedure groups (knee, hip, colectomy for cancer, colectomy not for cancer, and thoracotomy) displayed different patterns of disparity. The black-minus-white pair differences in time were not significantly different among the five procedure groups (Kruskal-Wallis test P = 0.9319 in non-BMI matched pairs; 0.4784 in BMI matched pairs). We next asked what the disparity would have been if we had exact matched only on principal procedure, and on no other variables. The Hodges-Lehmann estimate for the difference between blacks and whites in anesthesia time was 12.5 min (95% CI 7 to 18), P < 0.0001; the surgical time difference was 10.5 min (95% CI 5.5 to 15.5), P < 0.0001; and the Induction/Emergence time difference was 1.5 min (95% CI 0 to 3.5), P = 0.0863. Finally, the Obesity and Surgical Outcomes Study was not designed to make comparisons among hospitals, and the number of blacks in individual hospitals is too small to yield much statistical power for such a comparison.
We did take the black-minus-white matched pair differences in operative time and compare them among hospitals using the Kruskal-Wallis test, and by this test the black-white difference in operative time did not differ significantly among hospitals (P-value = .53 in the match with BMI, P-value = .72 in the match without BMI); however, it is difficult to know what to make of failing to find a difference when the power is low.
When we examined racial differences in surgical time (cut-to-close time), and the anesthesia induction/emergence time (the difference between anesthesia procedure time and surgical time) we see that racial differences in surgical time almost equal the differences in anesthesia procedure time (measuring induction to recovery room). For example, in table 2, using Match 2 with BMI, the Hodges-Lehmann estimate for anesthesia procedure time difference was 7 min, in contrast to 6 min (95% CI 2 to 9.5 minutes) for surgical time. The Hodges-Lehmann estimate of the black-white difference in anesthesia induction/emergence time, calculated from individual pairs as the difference between anesthesia time and surgical time, was 1 min (95% CI −0.5 to 3 min).
One may reasonably ask whether a typical difference in procedure time of 7 min between black and white patients, though statistically significant, is clinically important. To better understand the implications of a time differential, we asked whether blacks were more likely to have longer procedure times than whites over various ranges of procedure time differences. For example, it may be the case that there is always a gap between black and white patients, and that the typical 7-min gap is distributed rather evenly across patients. On the other hand, it may be the case that generally there is little difference between black and white procedure time, but in situations where there are large differences, blacks have longer times. This would be a more concerning pattern from a clinical perspective, suggesting that the average 7-min difference potentially signals a more important clinical problem.
To study this question we first examined the unpaired distributions of procedure length in black and white patients. Figure 1 consists of two quantile-quantile plots24 (one for each match) of black and white procedure times, omitting two extreme patients from each plot for display purposes. Points on the line of identity would suggest black and white quantiles were similar. We found that procedure lengths for black patients were longer than those for white patients as procedure lengths increased, suggesting that the typical black-white difference of 7 min is not distributed evenly across patients.
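The logic of a quantile-quantile comparison can be sketched as follows: compute the same quantiles in each group and pair them up, so that each pair is one point on the plot. Points with equal coordinates lie on the line of identity. The function name and the procedure times below are hypothetical and were constructed so that the two groups agree for short procedures and diverge for long ones; they are not study data.

```python
from statistics import quantiles


def qq_points(white_times, black_times, n=4):
    """Return (white quantile, black quantile) pairs for a Q-Q comparison.

    n is the number of equal-probability groups, so n=4 yields the three
    quartile cut points. Uses the default 'exclusive' method of
    statistics.quantiles.
    """
    white_q = quantiles(white_times, n=n)
    black_q = quantiles(black_times, n=n)
    return list(zip(white_q, black_q))


# Hypothetical procedure times in minutes.
white_times = list(range(60, 260, 10))                        # 60, 70, ..., 250
black_times = [t if t < 150 else t + 20 for t in white_times]  # longer only when long

points = qq_points(white_times, black_times)
# Vertical distance of each point from the line of identity.
gaps = [black_q - white_q for white_q, black_q in points]
```

In this constructed example the lower quartile sits on the line of identity while the upper quartile lies above it, which is the qualitative pattern described for figure 1.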
We then rank ordered all black-white matched pairs by the absolute value of the difference in procedure length inside each pair. Some pairs had small differences, others large. For categories of absolute difference (between 0 and 10 min, 10 and 30 min and greater than 30 min), we report in table 3 the odds that the member of the pair with the longer time was a black patient (as compared to a white patient). For the match that did not include BMI, the overall odds was 1.22 (95% CI 1.06, 1.41), P = 0.0067. When time differences between members of a pair were greater than 30 min, the odds that the patient with the longer procedure was black was 1.29 (1.07, 1.56), P = 0.0070. Similarly, when BMI was included in the match the overall odds that the black patient had the longer procedure time was 1.23 (1.07, 1.42), P = 0.0047. When time differences between members of a pair were greater than 30 min, the odds that the patient with the longer procedure was black was 1.39 (1.15, 1.69), P = 0.0008. In short, the 7-min typical gap between black and white matched pairs translated into increased odds that, when a large (i.e., > 30 min) difference in procedure length occurred inside a pair, the black patient was the one with the longer procedure.
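The odds reported in table 3 can be illustrated with a short sketch: within a category of absolute pair difference, count the pairs in which the black patient had the longer time and divide by the count in which the white patient did. The handling of category boundaries (open on the left, closed on the right) and of tied pairs (dropped) are assumptions made here for illustration, as are the function name and the pair differences themselves.

```python
from math import inf


def odds_black_longer(pair_differences, low, high):
    """Odds that the longer procedure in a pair belonged to the black patient.

    pair_differences : black-minus-white procedure time per pair (minutes)
    low, high        : keep pairs with low < |difference| <= high
                       (boundary handling is an assumption of this sketch)
    Pairs with a difference of exactly zero have no longer member and are
    never counted.
    """
    in_range = [d for d in pair_differences if low < abs(d) <= high]
    black_longer = sum(1 for d in in_range if d > 0)
    white_longer = sum(1 for d in in_range if d < 0)
    if white_longer == 0:
        return inf
    return black_longer / white_longer


# Hypothetical pair differences in minutes, for illustration only.
pair_differences = [45, -35, 50, 60, -5, 4, 12, -15, 33, -40, 8, -9]

odds_small = odds_black_longer(pair_differences, 0, 10)
odds_medium = odds_black_longer(pair_differences, 10, 30)
odds_large = odds_black_longer(pair_differences, 30, inf)
```

An odds of 1 means that, within the category, the longer procedure was equally likely to belong to either member of the pair. In the hypothetical data the odds depart from 1 only in the largest category, mirroring the pattern described above.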
In our previous research using Medicare claims1 we observed that the average difference in procedure time between black and white patients was 5 to 7 min, and for some hospitals the difference was considerably greater. In that study we used Medicare claims, and consequently could not adjust for the influence of BMI. However, obesity may increase the length of a procedure,1,25–27 and can be a challenge for both the anesthesia care team and the surgeon.28–32 It would have been reassuring, from a disparities perspective, if BMI differences between blacks and whites could explain the difference in procedure time. However, they did not.
Our present study on a different population of Medicare patients, using matching instead of regression, and including chart review to obtain BMI, reports a disparity similar in magnitude to our earlier report. While the typical disparity of 7 min does not seem large, the processes that lead us to be able to observe these significant differences may be very different for blacks and whites. Any conclusions depend on a careful comparison of black and white patients, the procedures they had, and potential confounding factors not adjusted for in our previous work, such as BMI. Furthermore, the 7-min gap translated into a 29–39% increase in the odds that when differences of greater than 30 min occurred inside a matched pair, it was the black patient with the longer procedure.
Medicare claims do not include BMI, so if black-white differences in BMI explained differences in procedure times, then Medicare data could not be used to study disparities in procedure times. Table 1 shows (i) the black-white difference in mean BMI was not large prior to any matching, (ii) matching for Medicare comorbidities, surgical procedures and hospital removed about half of the small initial difference in BMI without using BMI. Our later results show that completely adjusting for BMI as measured by chart abstraction did not remove the black-white disparity in procedure times. We conclude that Medicare claims can be used to study racial disparities despite the absence of BMI in Medicare claims.
As can be seen from table 1, both populations were remarkably similar in composition. Importantly, our matching produced almost exactly similar predicted procedure times. The predicted times were virtually the same between blacks and whites, yet observed differences after matching were about 7 min.
Studying BMI and chart time information provided a window into what were unobservable variables from our previous study. In the present study we have shown BMI did not explain why black procedure length is greater than white procedure length. One interesting clue as to why (or when) the disparity occurred did emerge from table 2. We observed that 6 min of the 7-min gap seen in the match that included BMI was found in the cut-to-close surgical time interval. The typical black-white difference in anesthesia induction/emergence time in this example was only 1 min based on the Hodges-Lehmann point estimate. In other words, almost all the time differential between blacks and whites is found during surgical procedure time. The contribution to the racial disparity from the anesthesia team was small and not statistically significant.
Finally, our study sheds light on the importance of the typical 7-min gap between black and white patients. From a clinical perspective, a 7-min gap may be less concerning if distributed evenly across all pairs, but this was not the case. Instead, from table 3 we see that when there were relatively small differences in procedure time between blacks and whites, there was no significant increase in the odds of blacks or whites experiencing the longer procedure. However, for matched pairs where there was greater than a half hour difference in procedure length between patients, the odds that the black patient had the longer time than the matched white control were significantly elevated suggesting that the longer procedure length was associated more often with the black patient than the white patient. Our results suggest that usually there is little difference in procedure time, (as seen in fig. 1 and table 3) but when there is a disparity, the problem is a considerable one. The reported typical difference of 7 min for all blacks must imply that for the typical black patient experiencing the disparity, the procedure time difference from whites is much larger than 7 min, since most blacks did not experience a disparity.
A limitation true for any observational study is the possibility that unobserved factors may have accounted for the finding of interest. As such, in our original Surgical Outcomes Study paper, the finding of procedure length differences by race1 may have been influenced by some unobservables. The present report asks whether the racial differences in procedure length would disappear if the adjustment method were improved and some unobservables became observable. The present study improves upon the previous study in four ways: (1) The study examines procedure length in three states not studied before – Texas, Illinois, and New York. It therefore represents an independent sample on which validation of the previous finding could be made, and indeed, we found the disparity to be similar; (2) We now adjust for some variables that were unobservables in the previous study. Most important is the addition of BMI to the adjustment, which was collected through chart review – something not possible in the past study. We also match on time score; (3) A third important improvement in the present study is the use of multivariate matching rather than regression, thereby not requiring assumptions regarding the form of a model used for adjustment. We observed excellent matches, as reported in table 1. Using this method we could also match on a propensity score, a procedure time score, BMI and other variables simultaneously, in part because we had a very large pool of whites in which to find matches to the black population. Finally, (4) through chart review we could examine whether the procedure length differences were occurring between cut and close, or between induction to cut and close to recovery room. This analysis was impossible in the previous study because procedure length was based only on the anesthesia claim, not chart review.
Still another factor that may influence our results concerns variables associated with race that may also influence the disparity in procedure time, such as income and education. In previous work1 we did observe an income effect, in that we observed less of a disparity in higher income blacks. In this study we did not match on income, as income is highly correlated with race33 and it was not our intent to disentangle that connection. If the mechanism for procedure time disparity was income, education or race, the disparity would be equally interesting in this Medicare population, as all had insurance and went to the same hospital.
The implications of identifying a clinically relevant disparity in procedure length are complex. If the disparity is real, after adjusting for patient characteristics that may prolong procedure length, then we are forced to ask why this difference is occurring. It has been shown that when procedures are performed by resident surgeons, they are longer than when performed by attendings.34,35 If blacks had a higher risk of receiving care from residents or inexperienced surgeons, this may possibly account for some of the time differential.36 Because Medicare data does not provide a separate bill for the resident surgeon, we cannot directly know whether the prolonged case was due to teaching. Furthermore, we do not know whether specific key portions of the surgical procedure were performed by the attending or the resident. Who is holding the scalpel and who is holding the clamp is impossible to know short of videotaping all procedures. However, if the etiology is differential surgical experience, making departments aware that the process of surgical selection is leading to this pattern may aid in producing a more equitable system. Determining the cause or causes of this disparity is beyond the scope of this report.
In conclusion, the racial differences in procedure time in the Medicare population were significant even after adjusting for BMI. Furthermore, the observed time differences appear to occur almost entirely during the surgical cut-to-close period, not the induction-to-emergence period. When differences inside matched pairs exceeded 30 min, the black patient was significantly more likely to be the one with the longer procedure time. This remaining significant difference in procedure time between black and white patients requires our attention.
Funding Source: This work was funded by National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, Maryland (Grant #R01-DK073671), Agency for Healthcare Research and Quality, Rockville, Maryland (Grant #R01-HS018355).
*R Development Core Team: R: A Language and Environment for Statistical Computing. Vienna, Austria, R Foundation for Statistical Computing, 2012, 2011. http://www.R-project.org. Last accessed February 6, 2013.