Congenital heart surgery outcomes analysis requires reliable methods of estimating the risk of adverse outcomes. Contemporary methods focus primarily on mortality or rely on expert opinion to estimate morbidity associated with different procedures. We created an objective, empirically based index that reflects statistically estimated risk of morbidity by procedure.
Morbidity risk was estimated using data from 62,851 operations in the Society of Thoracic Surgeons Congenital Heart Surgery Database (2002-2008). Model-based estimates with 95% Bayesian credible intervals were calculated for each procedure’s average risk of major complications and average postoperative length of stay. These 2 measures were combined into a composite morbidity score. A total of 140 procedures were assigned scores ranging from 0.1 to 5.0 and sorted into 5 relatively homogeneous categories.
Model-estimated risk of major complications ranged from 1.0% for simple procedures to 38.2% for truncus arteriosus with interrupted aortic arch repair. Procedure-specific estimates of average postoperative length of stay ranged from 2.9 days for simple procedures to 42.6 days for a combined atrial switch and Rastelli operation. Spearman rank correlation between raw rates of major complication and average postoperative length of stay was 0.82 in procedures with n greater than 200. Rate of major complications ranged from 3.2% in category 1 to 30.0% in category 5. Aggregate average postoperative length of stay ranged from 6.3 days in category 1 to 34.0 days in category 5.
Complication rates and postoperative length of stay provide related but not redundant information about morbidity. The Morbidity Scores and Categories provide an objective assessment of risk associated with operations for congenital heart disease, which should facilitate comparison of outcomes across cohorts with differing case mixes.
Contemporary efforts to describe and compare congenital heart surgery outcomes across institutions have evolved to include (1) use of clinical registry data, rather than administrative data from the hospital bill to evaluate outcomes; (2) use of empiric rather than opinion-based models to adjust for differences in case complexity across institutions; and (3) recognition that focusing solely on in-hospital mortality overlooks 96% of patients who survive to hospital discharge and the important morbidities that they may experience.1
In 2009, an empirically based tool for analyzing mortality associated with congenital heart surgery was introduced. The Society of Thoracic Surgeons-European Association for Cardiothoracic Surgery (STS-EACTS) Congenital Heart Surgery Mortality Score and Categories are based on analysis of 148 different types of operations performed in 77,294 patients.2 Procedures are assigned to 1 of 5 categories on the basis of a similar risk of in-hospital death. Category 1 has the lowest risk of death, and category 5 has the highest risk of death. In addition, each procedure receives a numeric score ranging from 0.1 to 5.0 that expresses mortality risk on a more continuous scale. The STS-EACTS Mortality Categories are intended to facilitate analysis of outcomes by grouping procedures with similar risk of in-hospital mortality.
Although congenital heart surgery outcomes analyses have traditionally focused on mortality, comprehensive assessment requires attention to other end points. Nonfatal events, such as stroke and renal failure, are major determinants of hospital cost and patients’ health status after surgery. In addition, postprocedure length of hospital stay provides useful direct information about resource use and indirect proxy information about a patient’s condition.3,4 Although such measures are captured in clinical registries, tools for analyzing these end points are lacking.
The goal of the present study was to develop a new system for classifying congenital heart surgery procedures on the basis of their potential for morbidity using empirical data from the STS Congenital Heart Surgery Database (STSCHSD). There were 4 specific objectives:
The Morbidity metric was developed primarily for the purpose of grouping types of procedures to better describe case mix, as was the STS-EACTS Mortality metric. The intent was not to assess or predict outcomes for an individual patient or surgeon, for which other types of analyses may be used.
The STSCHSD has been described.5 The Duke Clinical Research Institute serves as the data analysis center for STS databases and has an agreement and institutional review board approval to analyze the aggregate deidentified data for research purposes. For this study, operations were included if they took place between January 1, 2002, and December 31, 2008, and were 1 of the 148 types of cardiovascular procedures for which the STS-EACTS Mortality Score is defined.2 Operations performed at centers with no more than 10% missing data for complications, mortality, or postoperative length of stay (PLOS) were eligible for inclusion in the analysis. From eligible centers, individual operations with missing data for complications, mortality, or PLOS were excluded. Of 63,297 potentially eligible operations, 446 individual operations were excluded on the basis of missing data regarding complications (n = 273), PLOS (n = 151), or mortality (n = 22).
Additional inclusion and exclusion criteria were identical to those used for developing the STS-EACTS Mortality Score.2 Only the first operation of each hospital admission was analyzed. The final study population consisted of 62,851 operations classified into 148 procedure types at 68 centers. Results are presented for the subset of 140 procedure types having at least 10 eligible cases (62,819 total operations; 99.9%).
Several operations in the analysis represent combinations of 2 or more procedures. These are analyzed as combined procedures because the complexity of the combination is regarded as being different from the complexity of the component procedures when performed in isolation. For each of these combined procedures, unique procedure codes were subsequently assigned in STSCHSD version 3.0. Because all data in this analysis predate version 3.0, classification of multiple-procedure operations in this study follows guidelines set forth previously in development of the STS-EACTS Mortality Score and Categories.2
Morbidity was quantified for each procedure on the basis of the proportion of patients experiencing major complications and by the average PLOS (Table 1). Major complication was defined as the occurrence of any 1 or more of the 6 complications listed in Table 2. These complications represent definitive outcomes that can be ascertained reliably and that are likely to have significant and durable impact on patient health. The unadjusted rate of major complications is defined as the percent of operations that were associated with the occurrence of 1 or more of the major complications listed in Table 2. PLOS was defined as the number of days from the date of operation to the date of discharge and was determined for all patients, including those who died in-hospital.
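The unadjusted major-complication rate defined above is simply the share of operations with at least 1 flagged complication. A minimal sketch of this calculation, using hypothetical flag data (the complication names shown are illustrative, not the full list of 6 from Table 2):

```python
def major_complication_rate(operations):
    """operations: list of dicts mapping complication name -> bool.
    Returns the percent of operations with at least 1 major complication."""
    hits = sum(any(op.values()) for op in operations)
    return 100.0 * hits / len(operations)

# Hypothetical records for 4 operations (names illustrative only).
ops = [
    {"renal_failure": False, "unplanned_reoperation": True},
    {"renal_failure": False, "unplanned_reoperation": False},
    {"renal_failure": True,  "unplanned_reoperation": True},
    {"renal_failure": False, "unplanned_reoperation": False},
]
rate = major_complication_rate(ops)   # 2 of 4 operations -> 50.0
```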
Statistics calculated for each procedure type included the number of eligible operations, the percent of patients experiencing major complications, the 95% binomial confidence interval for the probability of major complications, and the average and interquartile range (25th and 75th percentiles) of PLOS. Model-based estimates of each procedure’s average risk of major complications and average PLOS were calculated by hierarchical modeling and presented along with 95% Bayesian credible intervals (CrIs). Details of these calculations are provided in Appendix 1.
To facilitate ranking and grouping of procedures, average risk of major complications and average PLOS were combined into a single composite morbidity measure. To account for different measurement scales, the 2 individual measures were rescaled to have the same standard deviation (Appendix 3). They were then summed together. The resulting composite morbidity measure was the basis of the proposed Morbidity Scores and Categories. Each procedure was assigned a numeric score ranging from 0.1 to 5.0 (STS Congenital Heart Surgery Morbidity Score). The range was chosen to be the same as the existing STS-EACTS Mortality Score. Scores were assigned by shifting and rescaling the procedure-specific composite morbidity estimates to lie in the interval from 0.1 to 5.0 and then rounding to 1 decimal place.
Procedures were sorted by increasing estimated morbidity and partitioned into 5 relatively homogeneous categories (STS Congenital Heart Surgery Morbidity Categories). This number of categories was chosen to match the number of STS-EACTS Mortality Categories.2 A computer program was used to search for category cutpoints that were optimal for minimizing within-category variance and maximizing between-category variance of the composite morbidity measure. The relationship between the number of categories and the degree of within-category homogeneity was assessed (Figure E1).
Sensitivity analyses were performed to assess whether the ranking of procedures depended heavily on the choice of statistical methodology. The Spearman rank correlation coefficient was used to quantify the extent to which rankings differed. Differences were also assessed graphically by plotting estimates of the same quantity calculated by 2 different statistical methods.
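The Spearman rank correlation used in these sensitivity analyses is the Pearson correlation applied to rank vectors. A minimal sketch, assuming no tied values (the simple case; tied data require midranks):

```python
def spearman(x, y):
    """Spearman rank correlation for sequences without ties:
    the Pearson correlation of the two rank vectors."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank + 1
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx = my = (n + 1) / 2  # mean rank
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) *
           sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

rho = spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])   # -> 0.8
```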
Finally, we estimated the statistical precision (reliability) of the estimated rates of major complication, average PLOS, and composite morbidity. Reliability of a set of estimates is conventionally defined as the proportion of between-unit variation that is explained by true between-unit differences (ie, signal) as opposed to random statistical fluctuations (ie, noise). A mathematically equivalent definition is the squared correlation between a measurement and the true value. In our case, reliability was defined as the squared Pearson correlation between each procedure’s estimated and true amount of morbidity. Reliability could not be calculated directly (because the “true” morbidity values are unknown) but was estimated by hierarchical modeling, as described in Appendix 1.
Sample sizes per procedure ranged from 1 to 4868. The 140 procedures with at least 10 cases are listed in Table 1 along with their sample sizes, raw and model-based morbidity estimates, and Morbidity Scores and Categories.
Model-estimated risk of major complications ranged from 1.0% for atrial septal defect repair to 38.2% for truncus arteriosus with interrupted aortic arch repair. Procedure-specific estimates of average PLOS ranged from 2.9 days for implantable cardioverter defibrillator procedure to 34.6 days for a stage 1 Norwood procedure and 42.6 days for a combined atrial switch and Rastelli procedure for congenitally corrected transposition. The Spearman rank correlation between raw rates of major complication and average PLOS was 0.63 (Figure 1) in procedures with at least 10 cases and 0.82 in procedures with at least 200 cases. This degree of correlation suggests that complication rates and PLOS provide related, but not redundant, information about morbidity.
Procedure-specific overall morbidity was defined as 0.141 × percentage rate of major complications + 0.162 × average PLOS in days. The numbers 0.141 and 0.162 were calculated as the reciprocals of the standard deviations of the percentage rate of major complications and average PLOS, respectively. The STS Morbidity Score was obtained by rescaling this overall morbidity measure to lie in the interval 0.1 to 5.0. Thus, by design it ranged from 0.1 to 5.0. Procedures with the least morbidity (STS Morbidity Score = 0.1) include atrial septal defect repair and implantable cardioverter defibrillator procedures. The procedure with the greatest morbidity (STS Morbidity Score = 5.0) was repair of truncus arteriosus with interrupted aortic arch.
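The mapping from the two morbidity components to the 0.1-to-5.0 score range can be sketched in a few lines. The weights 0.141 and 0.162 are the reciprocal standard deviations reported above; the input values below are made up for illustration, and the exact shift-and-rescale convention is an assumption based on the description in the Materials and Methods:

```python
def morbidity_scores(complication_pct, avg_plos_days):
    """Combine procedure-level complication rates (%) and average PLOS (days)
    into a composite, then shift and rescale to the 0.1-5.0 score range.

    Weights are the reciprocals of the across-procedure standard deviations
    reported in the article (0.141 and 0.162)."""
    composite = [0.141 * c + 0.162 * p
                 for c, p in zip(complication_pct, avg_plos_days)]
    lo, hi = min(composite), max(composite)
    # Least-morbid procedure scores 0.1, most-morbid scores 5.0;
    # round to 1 decimal place as in the published scores.
    return [round(0.1 + (m - lo) * (5.0 - 0.1) / (hi - lo), 1)
            for m in composite]

# Hypothetical inputs: a simple, an intermediate, and a complex procedure.
scores = morbidity_scores([1.0, 10.0, 38.2], [2.9, 12.0, 42.6])
```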
STS Morbidity Categories were obtained by grouping procedures into 5 unequally sized categories (1 = least morbidity, 5 = most morbidity) chosen to be maximally homogeneous with respect to overall morbidity. The numbers of procedures assigned to categories 1, 2, 3, 4, and 5 were 36, 43, 36, 21, and 4, respectively. The rate of major complication ranged from 3.2% in category 1 to 30.0% in category 5. The aggregate average PLOS ranged across categories from 6.3 days in category 1 to 34.0 days in category 5.
Several analyses were performed to address potential methodological concerns with the composite measures used in this analysis. First, we addressed potential issues related to “Major Complication,” which is a composite designating the occurrence of any 1 or more of 6 individual complications. The observed rate of discharge mortality for patients who experienced at least 1 major complication was 23.5% (Table 2) in comparison with 2.0% among patients who experienced none of the major complications. When end points in a composite occur with differing frequencies, the more frequent end points may sometimes dominate.6 As shown in Table 2, the aggregate rate of major complications ranged from 0.8% for “postoperative neurologic deficit persisting at discharge” to 4.7% for “unplanned reoperation.” To verify that each individual complication contributed statistical information but did not dominate the composite, we calculated the Spearman rank correlation coefficient between procedure-specific rates of each individual complication and rates of any major complication. These correlations ranged from 0.37 for heart block to 0.79 for unplanned reoperation. Thus, although unplanned reoperation explained much of the variation in the major complication end point, no single item dominated. All 6 complications contributed statistical information.
Second, we assessed the impact of modifying the list of major complications to include mortality. Although mortality was ultimately excluded, we thought it was important to know whether results would be similar or different had mortality been included. To address this, we calculated 2 versions of the major complication end point (1 including and 1 excluding mortality) and compared them. As shown in Figure 2, the 2 major complication end points were highly correlated but not perfectly related. The rank correlation coefficient between them was 0.97.
Third, although morbidity was calculated as an equally weighted combination of complication rate + average PLOS, strong consideration was given to an alternative composite consisting of the rate of major complications and the average time on ventilator. Rank correlation between these 2 composite morbidity measures was 0.93, suggesting that the 2 methods tend to give similar, but not completely identical, results. The version using PLOS was preferred in part because PLOS was collected with high (>99.9%) completeness, whereas ventilation time was more than 15% missing. Moreover, during the time period of this study, the STS definition of time on ventilator only included the time until the first extubation and did not include the additional time on ventilator for patients who were subsequently reintubated.
Fourth, we assessed the reliability (ie, statistical precision; see “Materials and Methods”) of the various measures that were used for ranking procedures in this study. For major complications, average PLOS, and the composite morbidity measure, the estimated reliability values were 0.80 (95% CrI, 0.71-0.87), 0.88 (95% CrI, 0.82-0.92), and 0.90 (95% CrI, 0.85-0.94), respectively. Thus, reliability was greatest for composite morbidity, which was the basis of the proposed Morbidity Score and Categories. The estimated reliability of composite morbidity increased to 0.95 (95% CrI, 0.92-0.97) when considering only procedures with at least 30 cases (N = 115 procedures) and to 0.99 (95% CrI, 0.98-0.99) when considering only procedures with at least 200 cases (N = 67 procedures).
Finally, we assessed the degree of association between the proposed Morbidity Score and the existing STS-EACTS Mortality Score. A weak association would suggest poor content validity because, conceptually, we know that morbidity and mortality are closely related. On the other hand, a perfect association would suggest that the morbidity score is redundant with mortality and thus is unneeded. To address these issues, the proposed Morbidity Score and the STS-EACTS Mortality Score were plotted and compared. As shown in Figure 3, they are closely related (rank correlation = 0.79), but far from being redundant.
Descriptive characteristics of the 5 Morbidity categories are shown in Table 3. The association between Morbidity Categories and Mortality Categories is summarized in Table 4. The Morbidity and Mortality Categories were identical for 74 procedures, differed by 1 or fewer positions for 135 procedures, and differed by 2 or fewer positions for 139 procedures. One procedure (pulmonary artery debanding) was in category 4 for mortality but category 1 for morbidity.
Measuring morbidity is a challenging but important element of outcomes reporting and quality assessment.7,8 Morbidity is a major determinant of health status after surgery and of hospital cost.3,4,8 The importance of developing a morbidity metric was articulated in 2004 by Philippe Kolh,9 who described quantitation of morbidity in cardiac surgery as follows: “Being more frequent than mortality, it could carry more information and be measured in terms of postoperative complications and length of hospital stay…. Furthermore, because of the heterogeneity of morbidity events, future scoring systems should probably generate separate predictions for mortality and major morbidity events.” In this report, we introduce an empirically derived tool that estimates the relative risk of morbidity associated with congenital heart surgery procedures on the basis of elements of both complications and PLOS.
Formal risk modeling using logistic regression is practical for common “adult cardiac procedures,” such as coronary artery bypass grafting and valve replacement. No operation for congenital heart disease is performed in numbers comparable to coronary artery bypass grafting. The diverse spectrum of distinct procedures is reflected by 140 procedure types in this study. Bayesian modeling is a particularly appropriate tool in this setting where denominators may be small. Thus, the product of this analysis is not a series of procedure-specific risk models, but rather a metric of procedure-based estimates of morbidity that can be used to describe case mix.
At the outset, we appreciated the importance of including both a complications element and a resource utilization element in a morbidity metric. We felt obligated, however, to demonstrate that use of either alone would be inadequate; that is, a model that assumed a direct 1-to-1 relationship between major complications and PLOS would not fit the data as well and would therefore yield an incomplete and less informative morbidity metric.
Resource utilization variables used in previous analyses were considered. Analysis based on inclusion of ventilation time is described earlier in this article. Previous analyses from individual institutions have included length of intensive care unit (ICU) stay.10,11 It is less useful at a multi-institutional level because of lack of a uniform definition of ICU, and because some institutions keep postoperative patients in an ICU environment until discharge. Cost is another measure of resource utilization that may be associated with morbidity. Cost data are not included in STS registries; furthermore, true cost data can be difficult to estimate.
Individual elements of the complication end point were considered on the basis of their potential impact on patients’ health status, including durable, long-lasting effects. We acknowledge that validated data describing relationships between some individual complications and late health status are not readily available. Some complications that are not included, such as Postoperative Cardiac Arrest, have been evaluated in other studies and shown to potentially be associated with mortality.12 Despite the fact that all complication codes in the STSCHSD have corresponding definitions since 2006, the coding of complications such as Postoperative Cardiac Arrest may still be subject to a degree of interpretation, and thus potentially variable ascertainment, in contrast to complications included in our list, such as postoperative mechanical circulatory support. Still other complications, such as sternal dehiscence and mediastinitis, are not included in this metric because they result in unplanned reoperations, and thus are accounted for in the composite. A variety of other complications that are not counted in the major complications end point are likely to be reflected in increased PLOS. Although some have argued that mortality is the “ultimate morbidity,” the decision to exclude in-hospital mortality from the morbidity metric was deliberate, based on several principles. First, the Morbidity Scores and Categories are designed to be used in conjunction with the STS-EACTS Mortality Scores and Categories. Second, analyses of potential associations between morbidity and mortality require the use of separate metrics for each.9,13
Our analysis confirms that morbidity and mortality indices provide related but not redundant information. Outcomes assessment should include measures of both, as suggested by Kolh.9 The decision to include patients who died before discharge may seem obvious, but it was debated. An alternative strategy that eliminates from analysis those who died before discharge would have resulted in an incomplete and potentially misleading picture of morbidity, and would have compromised any possible efforts to explore relationships between morbidity and mortality. The concept of “Failure to Rescue,” that is, probability of death following the occurrence of a complication or adverse event, is emerging as a potentially important tool for measuring performance and directing quality improvement initiatives.14,15 Quantitative estimation of morbidity only among hospital survivors would overlook the importance of this concept.
Widely used systems for stratifying risk or complexity of congenital heart surgery procedures have focused entirely on in-hospital mortality16 or included a morbidity element that was based on expert opinion rather than objective data.17 The Morbidity Score and Categories presented in the current study are complementary to the empirically based STS-EACTS Mortality Score and Categories2 and were derived using data in the largest congenital heart surgery registry.
Despite the advantages of an empirically based tool for analyzing morbidity, this study has important limitations. The analysis focuses on estimation of morbidity at the procedure level. We did not address methods of incorporating these procedural variables into statistical models for performing inter-institutional outcomes comparisons. Nor does our methodology address the appropriateness and timing of individual procedures in relation to overall disease management or include consideration of patient-specific risk factors. Second, despite a large database size, it is possible that patients and data in the STSCHSD are not entirely representative of other populations. Third, several individual procedures had small sample sizes, and the true morbidity associated with these procedures may have been estimated with error. We attempted to minimize this error by using a composite measure that combines statistical information from several related end points into a single end point. Fourth, both the occurrence and the impact of morbidity extend beyond the duration of the surgical hospital admission. The nature of the STSCHSD precludes inclusion of complications recognized or therapeutic interventions occurring after discharge from the “surgical admission.”18 A “long-term database” will ultimately be needed to achieve a more comprehensive estimation of morbidity associated with surgery for congenital heart disease. This study represents an important first step and acknowledges the need for a quantitative morbidity metric and the mandate that it be empirically derived.
The STS Morbidity Score and Categories is a tool for analyzing morbidity associated with operations for congenital heart disease and for grouping procedures with similar empirically estimated risk of morbidity. Together with the STS-EACTS Mortality Score and Categories, this tool enhances our ability to accurately characterize case mix. It should add a new dimension and precision to outcome assessments and may provide important information to guide quality-improvement initiatives.
A bivariate hierarchical model with normally distributed random effects was used to estimate the distribution of procedure-specific probabilities of major complication and average PLOS. For the i-th patient undergoing the j-th procedure, let $y_{ji}$ denote the occurrence of major complication (0 = no, 1 = yes) and $x_{ji}$ denote the patient’s PLOS. The model was as follows:

$$y_{ji} \sim \mathrm{Bernoulli}(\pi_j), \quad x_{ji} \sim \mathrm{Normal}(\mu_j, \sigma_j^2), \quad (\mathrm{logit}(\pi_j), \mu_j)^{\top} \sim \mathrm{Normal}(\mu, \Sigma),$$

where $\pi_j$ denotes the unknown theoretic probability of major complication for the j-th procedure; $\mu_j$ and $\sigma_j^2$ denote the unknown mean and variance of PLOS for the j-th procedure; and $\mu$, $\Sigma$ denote unknown parameters of the assumed bivariate normal random effects distribution.
Model parameters were estimated in a Bayesian statistical framework by specifying a prior probability distribution for the unknown parameters $\mu$, $\Sigma$, and $\sigma_j^2$. Because our prior knowledge was limited, we specified a vague proper prior distribution that consisted of independent normal distributions for the elements of $\mu$, independent inverse Gamma distributions for the $\sigma_j^2$, and an inverse Wishart distribution for $\Sigma$. Posterior means and CrIs were calculated using Markov chain Monte Carlo (MCMC) simulations as implemented in WinBUGS version 1.4 software (Medical Research Council Biostatistics Unit, Cambridge, UK, and the Imperial College School of Medicine at St Mary’s, London, UK).19 Posterior summaries were based on 4000 sets of simulated parameter values generated after a long burn-in period to ensure convergence.
The overall composite morbidity of the j-th procedure was defined as follows:

$$\theta_j = 0.141 \times (100\,\pi_j) + 0.162 \times \mu_j,$$

where $100\,\pi_j$ is the percentage rate of major complications and $\mu_j$ is the average PLOS in days.
The parameter $\theta_j$ was estimated by its posterior mean $\hat{\theta}_j = \frac{1}{4000} \sum_{l=1}^{4000} \theta_j^{(l)}$, where $\theta_j^{(l)}$ denotes the simulated value of $\theta_j$ at the l-th iteration of the MCMC procedure. A 95% Bayesian CrI for $\theta_j$ was obtained by calculating the 100th lowest and 100th highest values of $\theta_j^{(l)}$ across the 4000 simulated values.
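The posterior summaries described above are simple functions of the 4000 simulated draws. A minimal sketch, with synthetic Gaussian draws standing in for actual MCMC output:

```python
import random

def posterior_summary(draws):
    """Posterior mean and 95% credible interval from simulated draws.

    With 4000 draws, the interval endpoints are the 100th lowest and
    100th highest values, i.e. the 2.5th and 97.5th percentiles."""
    s = sorted(draws)
    n = len(s)
    k = round(0.025 * n)          # 100 when n = 4000
    mean = sum(s) / n
    return mean, s[k - 1], s[n - k]

# Synthetic stand-in for 4000 MCMC draws of one procedure's morbidity.
random.seed(0)
draws = [random.gauss(10.0, 1.0) for _ in range(4000)]
mean, lo, hi = posterior_summary(draws)
```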
To create internally homogeneous categories, procedures were first sorted in order of increasing estimated morbidity and then grouped by choosing cutpoints that were optimal according to a least squares variance criterion, as described later. We first sorted procedures so that $\hat{\theta}_1 \leq \hat{\theta}_2 \leq \cdots \leq \hat{\theta}_{148}$. Let $K$ denote the number of categories and let $c^K = (c_1 < c_2 < \cdots < c_{K-1})$ denote a set of category cutpoints that partition the procedures into $K$ groups. The symbol $c_k$ denotes a number between 1 and 148 and represents the index of the highest-morbidity procedure in the k-th category. Also, define $c_0 = 0$ and $c_K = 148$. For any particular choice of $K$ and $c^K$, within-category homogeneity was measured by the weighted sum-of-squares criterion:
$$\mathrm{WSS}(c^K) = \sum_{k=1}^{K} \sum_{j = c_{k-1}+1}^{c_k} n_j\,(\theta_j - \bar{\theta}_k)^2,$$

where $n_j$ is the number of patients in the denominator for the j-th procedure and $\bar{\theta}_k$ is the average morbidity of all procedures in the k-th category weighted by their respective sample sizes. If the $\theta_j$ were known instead of unknown, then the “optimal” cutpoints could (in theory) be determined by enumerating all possible choices for the $c_k$ and choosing the one that minimizes the WSS. Because the $\theta_j$ are unknown, we instead chose cutpoints that minimize the estimated value of $\mathrm{WSS}(c^K)$. Specifically, we chose cutpoints that minimize the posterior mean
$$\frac{1}{4000} \sum_{l=1}^{4000} \mathrm{WSS}(c^K; \theta^{(l)}),$$

where $\theta^{(l)} = (\theta_1^{(l)}, \ldots, \theta_{148}^{(l)})$ is the value of $\theta = (\theta_1, \ldots, \theta_{148})$ on the l-th iteration of the MCMC procedure. An unpublished dynamic programming algorithm was used to determine the set of cutpoints that made this quantity a minimum. The WSS criterion gets smaller as $K$, the number of categories, increases. The value $K = 5$ was selected for consistency with the published STS-EACTS mortality categories.
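The cutpoint search can be illustrated on a toy example. The article used an unpublished dynamic programming algorithm; the sketch below instead enumerates all possible cutpoint sets for a handful of procedures and picks the one minimizing the weighted within-category sum of squares, which is feasible only for small problems:

```python
from itertools import combinations

def best_cutpoints(theta, n, K):
    """Exhaustive search for cutpoints minimizing the weighted
    within-category sum of squares (WSS).

    theta: morbidity estimates sorted in increasing order
    n:     per-procedure sample sizes (the WSS weights)
    K:     number of categories"""
    J = len(theta)

    def wss(cuts):
        bounds = [0, *cuts, J]
        total = 0.0
        for k in range(K):
            idx = range(bounds[k], bounds[k + 1])
            w = sum(n[j] for j in idx)
            mean = sum(n[j] * theta[j] for j in idx) / w  # weighted category mean
            total += sum(n[j] * (theta[j] - mean) ** 2 for j in idx)
        return total

    return min(combinations(range(1, J), K - 1), key=wss)

# Toy example: 6 procedures falling into 3 visually obvious clusters.
theta = [1.0, 1.1, 5.0, 5.2, 9.0, 9.3]
n = [50, 40, 30, 30, 20, 10]
cuts = best_cutpoints(theta, n, K=3)   # -> (2, 4)
```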
Reliability is conventionally defined as the proportion of variation in a measure that is due to true between-unit differences (ie, signal) as opposed to random statistical fluctuations (ie, noise). Equivalently, it is the squared correlation between a measurement and the true value. Accordingly, reliability was defined as the square of the Pearson correlation coefficient between the set of procedure-specific estimates $\hat{\theta}_1, \ldots, \hat{\theta}_{148}$ and the corresponding unknown true values $\theta_1, \ldots, \theta_{148}$, that is:

$$\rho^2 = \mathrm{Corr}(\hat{\theta}, \theta)^2.$$
The quantity $\rho^2$ was estimated by its posterior mean, namely

$$\hat{\rho}^2 = \frac{1}{4000} \sum_{l=1}^{4000} \mathrm{Corr}(\hat{\theta}, \theta^{(l)})^2,$$

with $\theta_j^{(l)}$ denoting the value of $\theta_j$ at the l-th MCMC iteration and $\hat{\theta}_j$ denoting the posterior mean of $\theta_j$. A 95% CrI for $\rho^2$ was obtained by calculating the 100th smallest and 100th largest values of $\mathrm{Corr}(\hat{\theta}, \theta^{(l)})^2$ across the 4000 MCMC iterations. Analogous calculations were used to estimate reliability for subsets of procedures with at least 10, 30, or 200 cases. An identical approach was also used for estimating the reliability of the procedure-specific probability parameters $\pi_1, \ldots, \pi_{148}$ and mean parameters $\mu_1, \ldots, \mu_{148}$.
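As a concrete illustration of the reliability definition, the squared Pearson correlation between noisy estimates and truth can be computed directly when truth is simulated. The values below are made up; in the article the truth is unknown and reliability is instead estimated from the posterior draws:

```python
import random

def squared_pearson(x, y):
    """Squared Pearson correlation coefficient between two sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

random.seed(1)
truth = [random.uniform(0, 10) for _ in range(140)]   # simulated "true" morbidity
noisy = [t + random.gauss(0, 1.0) for t in truth]     # estimates with noise
r2 = squared_pearson(truth, noisy)   # high but below 1: mostly signal, some noise
```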
Several procedures listed in Table 1 are actually combinations of 2 or more procedures. These combinations were previously identified during development of the STS-EACTS Mortality Score and Categories.2 They occur frequently in the STS and EACTS databases, and the complexity of the combination is regarded as being different from the complexity of the component procedures when performed in isolation. For all other operations involving combinations of procedures, the operation was classified according to the most technically complex procedure, as determined by the difficulty component of the 2007 update of the Aristotle Basic Complexity score. The Aristotle Basic Complexity score contains some ties and is not defined for 3 of the procedures listed in Table 1. To deal with undefined or tied Aristotle scores, 6 of the study authors independently ranked the difficulty of each procedure. Undefined or tied Aristotle scores were adjudicated by assigning the operation to the procedure with the highest average ranking determined by the 6 graders. The difficulty rankings were published together with the STS-EACTS Mortality Score and Categories.2 The identical methodology and rankings were used to classify multiple-procedure operations during development of the STS Morbidity Score and Categories.
Procedure-specific complication rates are percentages measured on a scale from 0 to 100. Procedure-specific average PLOS is measured in days ranging from 0 to infinity. These are different measurement scales. We rescaled these so that the new scales would have approximately the same standard deviation. This guarantees that approximately half of the variation of the composite measure will be attributable to complications and half to PLOS. If we did not rescale them, then the amount of variation contributed by each item would be dependent on the scale we used for measuring it. For example, we would get different results depending on whether complication rates were expressed as percentages or proportions, or whether PLOS was measured in days, weeks, or months. Rescaling makes it possible to have a composite measure in which dominance of a single element is avoided.
For complications, first, we calculated each procedure’s complication rate. Next, we calculated the standard deviation of the set of procedure-specific complication rates. Finally, to obtain a rescaled complication rate, we divided each of the original 140 complication rates by their common standard deviation. The same process was used for standardizing average PLOS.
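The rescaling step described above can be sketched in a few lines: after dividing each measure by its own across-procedure standard deviation, both rescaled measures have standard deviation 1 and contribute comparably to the composite. The input values here are illustrative only:

```python
def rescale(values):
    """Divide each value by the standard deviation of the whole set,
    so that the rescaled set has standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [v / sd for v in values]

# Hypothetical procedure-level inputs (4 procedures).
comp_pct = [1.0, 5.0, 12.0, 38.2]    # complication rates (%)
plos_days = [2.9, 6.0, 15.0, 42.6]   # average PLOS (days)
composite = [c + p for c, p in zip(rescale(comp_pct), rescale(plos_days))]
```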
Supplemental material is available online.
Disclosures: Authors have nothing to disclose with regard to commercial support.