The concept of the MCID is a key issue in assessing patient-reported outcomes (PROs) because of its relevance to clinical practice.67
With an elective procedure such as arthroplasty, this concept is particularly important. Clinical trials and cohort studies of HRQOL report a change in population means (ie, pre- and postintervention scores) or compare the changes between different interventions (ie, they answer the question, “How small a change in the outcome could this study detect?”). These population-level differences are commonly extrapolated to the individual level. However, patients are not interested in population-level differences. Rather, they wish to know the likelihood that they will experience a meaningful improvement in return for the risk they take with an intervention (ie, “Is this change meaningful to me?”). This concept illustrates the difference between clinical and statistical significance (ie, the effect of an intervention may be both clinically and statistically significant, or have only clinical or only statistical significance). MCID estimates are calculated either with probabilistic arguments that draw on results from statistical theory or with so-called anchor-based measures, usually patient or physician global scales. Anchor-based methods pose a global question to the patient (or physician), asking about the overall improvement in pain (or function) experienced between two visits.
The next step is calculation of the amount of change on a pain scale (eg, visual analog scale, WOMAC pain scale) that corresponds to the minimal change on the global scale (usually the “somewhat better” response). Thus, the amount of change on a global scale serves as an anchor for calculation of MCID estimates. The “anchor” in defining the MCID is either an objective outcome or a patient response to a global question that is related to the outcome of interest. For example, Quintana et al68
used a five-point patient-based anchor in defining MCID in patients with TKA by asking the patients about the improvement in their knee 6 months after the intervention, with the possible responses “a great deal better,” “somewhat better,” “equal,” “somewhat worse,” and “a great deal worse.” Changes corresponding to “somewhat better” were used to establish the MCID for improvement.
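The anchor-based calculation described above can be illustrated with a minimal sketch. The records below are made-up values, not data from any cited study. Taking the mean change among patients who answered “somewhat better” is one common convention and is assumed here; the source does not state which summary statistic was used.

```python
from statistics import mean

# Hypothetical (made-up) records: change in a 0-100 pain score between two
# visits, paired with the patient's answer to the global anchor question.
records = [
    (30, "a great deal better"),
    (12, "somewhat better"),
    (18, "somewhat better"),
    (15, "somewhat better"),
    (2, "equal"),
    (-4, "equal"),
    (-10, "somewhat worse"),
]


def anchor_based_mcid(records, anchor_level="somewhat better"):
    """Mean score change among patients who chose the anchor level.

    Assumption: the MCID is summarized as the mean change in the
    minimal-improvement group; other summaries are also in use.
    """
    changes = [change for change, response in records if response == anchor_level]
    if not changes:
        raise ValueError(f"no patients responded {anchor_level!r}")
    return mean(changes)


mcid = anchor_based_mcid(records)
print(f"Illustrative MCID estimate: {mcid} points")
```

With real data, the estimate would depend on the wording and number of levels of the anchor question, which is why the choice of anchor matters.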
MCID estimates have been described for the WOMAC and SF-36. For patients with primary THA, the MCID was 26 points for WOMAC stiffness and 29 points for WOMAC pain.55
For the same cohort, MCID estimates for the eight SF-36 subscales ranged from 11 points for the SF-36 physical role subscale to 20 points for the SF-36 physical function subscale. For primary TKA, the MCID estimates were 15 points for WOMAC stiffness and 23 points for WOMAC pain.54
For the same cohort, MCID estimates for the eight SF-36 subscales ranged from 12 points for the SF-36 physical function subscale to 17 points for the SF-36 bodily pain subscale.
Another advantage of calculating MCID estimates is that they can be used as a clinical trial outcome. TKA and THA are associated with large gains in HRQOL. However, when HRQOL outcomes are used to compare surgical approaches (eg, minimally invasive versus conventional), specific surgeries (eg, patellar resurfacing versus no resurfacing, cruciate-retaining versus cruciate-sacrificing), or medical interventions in arthroplasty patients, the differences may have smaller effect sizes. In such instances, an MCID estimate is a key input for the design of adequately powered studies.
For example, two studies that compared patellar resurfacing with no resurfacing reported no difference in KSS between the two groups, but neither study defined the MCID or demonstrated adequate power to detect a clinically meaningful difference.69,70
If these studies had collected a patient-reported global measure of improvement, these data could have been used both to calculate MCIDs and for power calculations. Nonetheless, with 44 patients per treatment group, the study by Diduch et al70
had 80% power to detect a difference of 61% of a standard deviation (SD) (a moderately large and almost certainly clinically meaningful effect size); hence, the detectable effect from this design, although not specified, was likely of a reasonable size. If the sample size had been 64 patients per group, there would have been 80% power to detect the generally accepted clinically significant benchmark of 50% of the SD.10
The absence of a statistically significant result therefore suggests that no clinically meaningful difference was missed (although type II errors do occur).
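The detectable effect sizes quoted above can be approximated with a short calculation. The sketch below assumes a two-sided, two-sample comparison with equal group sizes and a significance level of 0.05, and it uses the normal approximation; the function name is illustrative only. The approximation gives values slightly below those from an exact t-based calculation, so it returns roughly 0.60 SD for 44 patients per group rather than the 61% cited.

```python
from math import sqrt
from statistics import NormalDist


def detectable_effect_size(n_per_group, alpha=0.05, power=0.80):
    """Smallest standardized difference (in SD units) detectable with the
    given power by a two-sided, two-sample comparison of means.

    Assumptions: equal group sizes, equal variances, normal approximation.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)  # quantile corresponding to the desired power
    return (z_alpha + z_power) * sqrt(2 / n_per_group)


d44 = detectable_effect_size(44)
d64 = detectable_effect_size(64)
print(f"n = 44 per group: about {d44:.2f} SD")
print(f"n = 64 per group: about {d64:.2f} SD")
```

Replacing the benchmark of 0.5 SD with an instrument-specific MCID (divided by the SD of the score) would turn the same formula into a sample-size check for a planned trial.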
One of the important issues regarding MCID calculation is that the results depend on the anchor used (ie, 4-, 5-, or 7-point scale)71
and, possibly, on patient expectations. The expected change corresponding to “somewhat improved” may differ between a surgical and a medical intervention. For example, the MCID on the WOMAC in a TKA population was 23 for the pain subscale and 20 for the function subscale.54
In a similar study of patients with OA undergoing nonsteroidal anti-inflammatory drug therapy, MCID estimates were 20 for WOMAC pain and 9 for WOMAC function.72
The differing MCID estimates for the WOMAC function subscale (20 versus 9) may be attributable to differences in the patient population, baseline pain level, patient expectations, or the anchor used.
MCID estimates may be different in patients with revision versus primary arthroplasty for multiple reasons. Patients undergoing revision arthroplasty may have different pain severity and a higher likelihood of persistent pain and functional limitation postrevision. The largest improvements in HRQOL are seen in primary arthroplasty, and revision may be viewed more as a procedure to maintain the HRQOL, with less capability for symptomatic relief. Some revision surgeries (eg, intervening earlier in the course of osteolysis around an implant associated with prosthetic wear) may be performed not only to address symptoms but to prevent worse problems that may be more complex to manage in the future. In these cases, MCID or even HRQOL scores may not be the most relevant outcomes.
The use of MCID estimates facilitates the interpretation of normative data and baseline status in evaluating the health status of various populations of patients undergoing THA and TKA. There is a critical need to understand THA and TKA in populations with differing aggregate health status. For example, a THA may be performed in a golfer who is having problems getting around the course, or in a patient from an underserved community who is on the verge of requiring a wheelchair. The baseline health status of these two persons may be quite different, and the final outcome may be better in the golfer. The change in clinical status, however, may be greater in the patient on the verge of requiring a wheelchair; and the improvement in independence and mobility, as well as the cost savings for the health care system, may make the procedure more cost-effective in that patient. The risks of operating on the severely disabled patient may be greater, but so is the possible benefit.
A somewhat related concept is responder analysis, ie, reporting the proportion of patients who achieved previously described clinical end points. For example, for the HHS, the definitions of excellent (90–100), good (80–89), fair (70–79), and poor (<70) outcomes have been described. In a study by Kim and Kim,65
the proportion of patients in these categories was reported. In a study by Diduch et al70
that followed 114 TKAs in young, active patients long-term, the authors reported that the mean KSS for function was 89 and that 94% of knees had good or excellent function. While recognizing the caveat that this particular categorization in HHS is somewhat arbitrary, we recommend that authors consider reporting the proportion of patients in the poor or fair category at baseline who shifted to a better or worse category at follow-up. In any event, responder analysis has the advantage of appearing similar to reports on other clinical variables such as treatment response or disease progression.
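The recommended category-shift report can be sketched as follows. The HHS cut points are those given above; the paired scores are made-up illustrative values, and the function names are hypothetical.

```python
from collections import Counter

ORDER = ["poor", "fair", "good", "excellent"]  # worst to best


def hhs_category(score):
    """Map an HHS score to the categories described in the text."""
    if score >= 90:
        return "excellent"
    if score >= 80:
        return "good"
    if score >= 70:
        return "fair"
    return "poor"


def shift_summary(baseline, follow_up):
    """Proportion of patients who were poor or fair at baseline and moved to
    a better, the same, or a worse category at follow-up."""
    counts = Counter()
    for before, after in zip(baseline, follow_up):
        rank_before = ORDER.index(hhs_category(before))
        rank_after = ORDER.index(hhs_category(after))
        if rank_before > ORDER.index("fair"):
            continue  # only patients who started in the poor or fair category
        if rank_after > rank_before:
            counts["better"] += 1
        elif rank_after < rank_before:
            counts["worse"] += 1
        else:
            counts["same"] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no patients in the poor or fair category at baseline")
    return {key: counts[key] / total for key in ("better", "same", "worse")}


# Hypothetical paired scores (baseline, follow-up) for eight patients.
baseline = [55, 62, 68, 74, 78, 66, 72, 85]
follow_up = [92, 85, 71, 88, 76, 60, 65, 95]

summary = shift_summary(baseline, follow_up)
for label, proportion in summary.items():
    print(f"{label}: {proportion:.0%}")
```

The same pattern applies to an MCID-based definition of response: replace the category comparison with a check of whether each patient's change score meets or exceeds the MCID.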
We recommend that more studies be done to derive MCID estimates for commonly used HRQOL scales in arthroplasty. With knowledge of MCID on these scales, the clinical significance of results for trials of surgical and nonsurgical interventions in arthroplasty patients can be interpreted in addition to simple statistical significance. At present, MCID estimates are known only for the WOMAC and SF-36, so studies comparing the proportion of patients achieving MCID with one treatment versus another may prefer these instruments.