No RCT was identified that compared TKR to an alternative treatment (e.g., drug therapy, physical therapy, arthroscopy, or debridement). It is unlikely that there ever will be such an RCT, because TKR is a commonly accepted procedure, and ethically it would be difficult to randomize patients to not receive TKR. Accordingly, the highest quality evidence available to assess the effectiveness of TKR is that given by observational studies. lists the studies that met the inclusion criteria for this review.
Effectiveness of Total Knee Replacement
No studies published since the AHRQ review have compared TKR to an alternative treatment. Several studies have compared preoperative measurement scores (WOMAC, KSCRS, HSSK, SF-36) to postoperative measurement scores in patients undergoing various TKR procedures. The objective of these studies was not to report the effectiveness of TKR; nonetheless, they were included because they did report a measure of effectiveness.
The table below compares the quality of these studies. All used a standardized outcome measure to compare preoperative and postoperative outcomes. Most reported specific inclusion and exclusion criteria. The least frequently reported item was a justification of the sample size recruited. It is important to note that these studies range from observational studies to RCTs, and the studies that did not report a justification for the sample size were not limited to one type of study design. For length of follow-up, less than 1 year was considered inadequate. To measure overall quality, 0 to 2 checkmarks were considered poor quality, 3 to 4 were considered moderate quality, and 5 were considered high quality. Most of the studies were moderate quality. Only 2 were high quality (i.e., addressing all quality indicators), both of which were RCTs. Overall, the quality of the 19 studies is moderate.
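The grading rule described above can be written out as a short sketch. The function names are hypothetical and are used here only for illustration; the cut-offs (0 to 2 poor, 3 to 4 moderate, 5 high, and follow-up of at least 1 year) are the ones stated in the text.

```python
def quality_grade(checkmarks: int) -> str:
    """Map the number of quality indicators met (0 to 5) to an overall grade."""
    if not 0 <= checkmarks <= 5:
        raise ValueError("expected between 0 and 5 checkmarks")
    if checkmarks <= 2:
        return "poor"
    if checkmarks <= 4:
        return "moderate"
    return "high"  # all 5 quality indicators addressed


def adequate_follow_up(months: float) -> bool:
    """Follow-up of less than 1 year (12 months) was considered inadequate."""
    return months >= 12


print(quality_grade(4))        # moderate
print(adequate_follow_up(6))   # False
```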
Quality of Evidence for Preoperative and Postoperative Studies
Various demographic characteristics were reported within each of the studies; however, the data reported were inconsistent across the studies. To allow outcomes to be compared across these studies, effect sizes were calculated. Effect size was defined as the change between preoperative and postoperative scores, expressed as a number of standard deviations.
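A minimal sketch of this kind of effect size calculation follows. The report does not state which standard deviation was used as the denominator, so the choice here (the preoperative standard deviation, or the root mean square of the preoperative and postoperative standard deviations when both are supplied) is an assumption. The function names and the example scores are hypothetical and are not taken from any of the included studies.

```python
import math


def effect_size(mean_pre, mean_post, sd_pre, sd_post=None):
    """Change between pre- and postoperative means, in standard deviation units.

    Assumption: the denominator is the preoperative SD, or the root mean
    square of both SDs if a postoperative SD is supplied. The absolute value
    is taken because the scales differ in direction (on some, higher is worse).
    """
    if sd_post is None:
        sd = sd_pre
    else:
        sd = math.sqrt((sd_pre ** 2 + sd_post ** 2) / 2)
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return abs(mean_post - mean_pre) / sd


def is_large(es, threshold=0.8):
    """An effect size greater than 0.8 is treated as a large effect."""
    return es > threshold


# Hypothetical scores, for illustration only
es = effect_size(mean_pre=40.0, mean_post=85.0, sd_pre=15.0)
print(round(es, 2), is_large(es))  # 3.0 True
```

Because studies without reported standard deviations cannot supply a denominator, they drop out of any comparison of this kind.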
The table below shows the characteristics of the studies identified. In this table, the studies have been stratified according to length of follow-up (less than 6 months, 1 year, 2–5 years, and more than 5 years). Of the studies that reported weight, some reported weight in kilograms, others reported BMI alone, and others reported both weight and BMI. The most heterogeneous item across the studies was the type of measurement score used (WOMAC, KSCRS, HSSK, SF-36). The overall Knee Society Clinical Rating System score was the most frequently reported outcome. At the bottom of the table, the pooled means are reported for age, proportion of females in the studies, proportion of patients with osteoarthritis, and weight and BMI.
Characteristics of Studies Reporting Preoperative and Postoperative Standardized Scores
Mean Effect Size
Eight studies (22
) reported preoperative and postoperative pain scores. Interestingly, the studies that reported pain measurements also had shorter follow-up times than the studies that reported function or overall measurements. The KSCRS, WOMAC, SF-36, Oxford hip and knee scale, and the New Jersey Osteoarthritis Hospital (NJOH) score were reported in the 8 studies. Only 3 studies (22
) reported standard deviations around the mean preoperative and postoperative scores, which means that effect sizes could be calculated only for these studies. Even though a variety of pain measurements were reported, the effect size was greater than 1.0 in all 3 studies, indicating a large decrease in pain after total knee surgery. The mean effect sizes for the studies are displayed under Mean Effect Sizes for Preoperative and Postoperative Standardized Scores, where 0.8 is highlighted because any effect size greater than 0.8 indicates a large effect between the preoperative and postoperative scores.
Function was a more frequently reported outcome than pain in the studies that reported preoperative and postoperative scores for TKR. Thirteen studies (21
) reported function as an outcome measure. Various measures of function were reported, including the WOMAC, KSCRS, SF-36, and NJOH score. For the 7 studies (22
) that included standard deviations for the preoperative and postoperative function scores, the mean effect size for function ranged from 0.47 to 3.86. The highest and lowest mean effect sizes were reported using the KSCRS as the measurement scale. Six of the 7 studies measuring function reported effect sizes greater than 0.8, which indicates a large effect. The study that reported an effect size less than 0.8 was done by Joshi et al. (28
) and investigated TKR in patients aged older than 80 years using the KSCRS. Their study had an effect size of 0.47 for function; however, the overall effect size was 2.69. Unfortunately, Joshi et al. did not report pain outcomes separately, but the large overall effect size and the smaller function effect size would suggest that there was a substantial improvement in pain.
Mean Effect Sizes for Preoperative and Postoperative Standardized Scores
Thirteen studies (21
) were identified that reported preoperative and postoperative overall measurement scores (KSCRS, WOMAC, HSSK, Oxford hip and knee score). All 7 studies that reported standard deviations for the preoperative and postoperative scores had mean effect sizes much larger than 0.8. The mean effect sizes ranged from 1.94 to 4.66.
It is important to interpret these findings cautiously, because none of these studies were designed to measure the effectiveness of TKR specifically.
Revision Rates and Failure of Prosthesis
Nine of the 19 studies that reported preoperative and postoperative outcome scores also reported revision rates. The table below outlines the details of those revisions. The rates of revision ranged between 0% and 13% in studies that reported more than 5 years of follow-up data. The data shown in the table are for revisions only. Patients may experience complications related to their knee replacement without undergoing revision surgery to address the complication.
Rates of Revisions in the Studies That Reported Preoperative and Postoperative Outcome Scores
The OJRR has only recently started collecting data (since 2001); therefore, it does not have long-term data to assess the revision rates or the duration of effect of the prostheses. The Swedish Knee Arthroplasty Register is the most established knee replacement registry worldwide. It has been collecting data on knee replacements since the mid-1970s (http://www.ort.lu.se/knee/indexeng.html). The registry reports that the rates of revision have improved over time: the TKR surgeries that were done between 1978 and 1983 were associated with the highest rates of revision, regardless of time since surgery. That is, even at 4 years after surgery, the Swedish Registry reported that the rates of revision were higher for the TKR procedures done between 1978 and 1983 than for those done in any other period. (40
Factors Related to Outcomes for Total Knee Replacement
Six studies (27
) were identified that investigated factors related to outcomes for TKR using regression analyses. There was variation in the variables included in the analyses. The generally accepted rule regarding the sample size for a regression analysis is 10 subjects per variable. For instance, a regression analysis incorporating 10 variables should have a minimum sample size of 100. Two studies (43
) did not meet this requirement.
The table below shows the features of the analyses. The proportion of variance explained by the models was low. Parent et al. (43
) reported that they were able to explain 66% of the variance; however, they had 18 variables in their model and only 65 patients. The other models explained between 12% and 28% of the variance, which is very low. All of the regression analyses investigated factors that predicted function, and 2 also investigated factors that predicted pain. Both of these studies used the WOMAC pain score as their dependent variable. Three of the analyses used the WOMAC function score as the measure of function, 1 used extension and flexion, and another used locomotor ability.
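The sample size rule of 10 subjects per variable can be checked with a few lines of arithmetic. The sketch below uses hypothetical function names; the figures for Parent et al. (18 variables, 65 patients) are those given in the text.

```python
def min_sample_size(n_variables: int, subjects_per_variable: int = 10) -> int:
    """Minimum sample size under the subjects-per-variable rule of thumb."""
    return n_variables * subjects_per_variable


def meets_rule(n_patients: int, n_variables: int,
               subjects_per_variable: int = 10) -> bool:
    """True if the sample is at least as large as the rule requires."""
    return n_patients >= min_sample_size(n_variables, subjects_per_variable)


print(min_sample_size(10))   # 100, the example given in the text
print(min_sample_size(18))   # 180, required for an 18-variable model
print(meets_rule(65, 18))    # False: 65 patients with 18 variables
```

With 65 patients, the rule would support a model of at most 6 variables, which is a reason to treat the 66% of variance explained with caution.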
Multivariate Analyses of Factors To Predict Pain and Function in Patients Undergoing Total Knee Replacement
The variables that were included in the studies are listed in the table below. The grey boxes indicate that the variable was not analyzed in the study, the boxes with Xs indicate that the variable was included in the study but was not a predictor of the outcome, and the boxes with checkmarks indicate that the variable was included in the study and was a significant predictor of outcome (and thus included in the final model). The 2 studies (43
) that did not include a sufficiently large sample size in their regression analyses were excluded from the table.
Multivariate Analyses Identifying Predictors of Function and Pain After Total Knee Replacement
Overall, the regression analyses accounted for less than one-third of the variance. This suggests that there are variables that predict pain and function outcomes after TKR that the analyses missed.
Obesity as a Predictor of Outcome After Total Knee Replacement
Five studies (46
) were identified that investigated obesity as a predictor of adverse outcomes after TKR. The table below outlines the features of these studies. BMI was used in the studies to define obesity.
Characteristics of Studies Investigating Obesity as a Predictor of Adverse Outcomes After Total Knee Replacement
Stickles et al. (50
) compared outcomes of patients undergoing TKR according to their BMI, which was divided into 5 categories: less than 25, 25 to 30, 30 to 35, 35 to 40, and more than 40. The study did not report any significant differences in outcomes according to BMI.
The case-control study by Spicer et al. (49
) compared patients undergoing TKR who had BMIs over 30 to a control group of patients undergoing TKR who had BMIs less than or equal to 30. After a mean follow-up of 75.9 months (range, 48–144 months), the differences between the groups on changes in KSCRS scores were not statistically significant. The patients with a BMI over 30 had a revision rate of 4.9% compared to 3.1% in the patients with a BMI less than or equal to 30 (P = .25). Spicer et al. also found that the 10-year survival rate was not statistically significantly different between the groups.
Similar to the studies by Spicer et al. (49
) and Stickles et al. (50), the study by Deshmukh et al. (48
) found that body weight did not adversely affect the outcome of TKR. They conducted a regression analysis to model BMI using the following variables: age, sex, extent of arthritis, comorbidities, and severity of disease based on baseline KSCRS and Nottingham Health Profile scores. They found that none of the variables dominated the model to explain the variance.
Two separate studies by Foran et al. (46
) reported that obesity did adversely affect the outcome of TKR. Both were case-control studies comparing the outcomes of obese patients (BMI > 30) to the outcomes of non-obese patients (BMI < 30) undergoing TKR. The table below compares the characteristics of these 2 studies. Unfortunately, Foran et al. did not report power calculations in either study. In the Foran et al. (47
) study with 68 patients in each group, the mean postoperative KSCRS score in the obese group was 90 compared to 94 in the non-obese group. Foran et al. indicated that there was a significant difference between these scores (P = .04); however, this difference was not clinically significant, because a KSCRS score of more than 85 is considered excellent.
Comparison of Foran et al.’s Studies Measuring Obesity as a Predictor of Adverse Outcomes After Total Knee Replacement
The other study by Foran et al. (46
) had only 27 patients in each group and reported inconsistencies in the number of failures in each group. They reported 3 failures in the non-obese group and 10 failures in the obese group; however, they reported more revisions in the non-obese group (18 revisions) than in the obese group (10 revisions). The data used for this study were from 1982 to 1986. There have been substantial improvements in knee replacement technologies and procedures in the past 20 years, which makes the results of this study less relevant.
Thus, 5 studies were identified that specifically examined the effect of weight on the outcome of TKR. Three studies, each with well over 100 patients and follow-up periods greater than 1 year, reported that weight did not affect the outcome of TKR. Two studies, by the same group of authors, found that weight was a significant predictor of TKR outcomes; however, both studies had several limitations.
Timing of Total Knee Replacement Surgery and Outcomes
Three studies (51
) (2 non-RCTs with contemporaneous controls and 1 retrospective review) were identified that investigated the role of degree of severity on the outcomes of TKR. The original comparative study was published in 1999, (52) and an update of the study was published in 2002. (54
Firstly, it is important to recognize that all 3 studies defined severity of osteoarthritis differently. The study by Gidwani et al. (51
) defined severity according to Ahlback’s grading scale for osteoarthritis. The review by Meding et al. (53
) used radiological images. The study by Fortin et al. (52
) used the WOMAC function score to separate patients into high and low functioning groups (divided at the median score). Thus, it is difficult to compare the outcomes of the 3 studies because they used different criteria to assess severity.
Gidwani et al. (51
) tested the hypothesis that TKR in “patients with relatively early stages of osteoarthritis will lead to a poor outcome.” This prospective case series included 130 patients with osteoarthritis undergoing TKR. X-rays were used to determine the severity of osteoarthritis before surgery using the Ahlback Radiologic Classification of Osteoarthritis of the Knee scale. The scale ranges from 0 to 5, where 0 is a healthy knee, and 5 indicates “gross bone loss and subluxation.” Most knees were rated 2 or 3 (80% of patients). The knees were divided into 2 categories based on severity. All knees with scores of 0 to 2 were placed in group A, and the knees that scored 3 to 5 were placed in group B.
Oxford Knee Scores were taken preoperatively and postoperatively in all patients. It is important to note that the median preoperative scores were similar for patients with “severe” disease and those with “less severe” disease. The median preoperative Oxford score for group A was 38; for group B, it was 33. At 1 year the median postoperative Oxford score for group A was 74; for group B, it was 75.
Gidwani et al. also reported the mean change in Oxford scores in 1-year postoperative scores compared with preoperative scores for knees based on Ahlback grades. Knees categorized as grade 1 improved 40%; grade 2, 34%; grade 3, 38%; and grade 4, 32%. It is important to recall that most of the knees were categorized as grade 2 or 3, and only 6 knees were included in the grade 1 subgroup. Nonetheless, patients improved substantially regardless of their initial severity grade. Gidwani et al. proposed that the results of their study supported the conclusion that TKR could be effectively performed earlier in patients with osteoarthritis, thereby preventing some of the pain and discomfort associated with more severe grades of osteoarthritis.
Meding et al.’s retrospective study (53
) of 1888 patients comprising 2759 consecutive TKR surgeries reported similar findings to the study by Gidwani and colleagues. As mentioned previously, Meding et al. determined severity based on radiological images. If there was evidence of bone touching bone in 1 or more compartments of the knee, then the patient was categorized as “severe”; otherwise, the patient was categorized as “mild.” Meding et al. reported that there were no significant correlations between the severity of osteoarthritis (mild versus severe) and Knee Society overall knee scores, function scores, or pain scores at 1 year, 3 years, 5 years, or 7 years postoperatively.
Fortin et al.’s study (52
) included 222 patients undergoing total joint replacement (n = 106 knee replacement patients). Patients were recruited from a hospital in Montreal and a hospital in Boston, and ranged from those with earlier intervention in Boston to those with later intervention in Montreal. Canadians had poorer functioning than did Americans. Patients were divided into high or low function groups based on their WOMAC function scores (Table 12). There were more knees in the high function group, and there were more male patients in this group. Fortin et al. reported 2 measures for comorbid conditions. Both indicated that the groups were evenly matched for comorbid conditions. The Comorbidity Illness Rating Scale is more sensitive to disabling conditions than the Charlson Index, and the Charlson Index is more sensitive to life-threatening conditions.
Table 12: Comparing High and Low Function Groups in Fortin et al. (52)
Fortin et al. found that patients who had low functioning before surgery had bigger differences between their preoperative and postoperative function scores compared with those who had higher functioning preoperatively. Nonetheless, the patients with low functioning preoperatively did not reach the same level of postoperative functioning as did patients with high functioning. For the low function group, the mean preoperative WOMAC function score was 44.2, and the mean postoperative WOMAC function score at 6 months was 23.0 (in this study higher WOMAC scores mean poorer functioning), a difference of 21.2. For the high function group, the mean preoperative WOMAC function score was 24.3, and the mean postoperative score at 6 months was 9.5, a difference of 14.8. A similar trend was observed for the WOMAC pain scores: the difference in pain was greater in the low function group, but the reduction in pain was not large enough for these patients to “catch up” with the high function group. Fortin et al. only reported results up until 6 months postoperatively.
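The arithmetic behind this comparison can be laid out as a short sketch. The scores are the WOMAC function means reported above (higher scores mean poorer functioning); the variable and function names are hypothetical.

```python
# Mean WOMAC function scores reported by Fortin et al. (higher = poorer function)
groups = {
    "low function": {"pre": 44.2, "post_6mo": 23.0},
    "high function": {"pre": 24.3, "post_6mo": 9.5},
}


def improvement(pre: float, post: float) -> float:
    """Absolute improvement: the drop in WOMAC score from pre to post."""
    return round(pre - post, 1)


low = groups["low function"]
high = groups["high function"]

print(improvement(low["pre"], low["post_6mo"]))    # 21.2
print(improvement(high["pre"], high["post_6mo"]))  # 14.8

# Gap between the groups before surgery and at 6 months
print(round(low["pre"] - high["pre"], 1))            # 19.9
print(round(low["post_6mo"] - high["post_6mo"], 1))  # 13.5
```

The low function group improved by more points, yet its 6-month score (23.0) remained close to the high function group's preoperative score (24.3), which is the sense in which it did not catch up.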
Subsequently, Fortin et al. (54
) reported results for the aforementioned study with 2 years of follow-up data. Of the original patients who had had total hip or knee replacement surgery, 165 (74%) were still in the study 2 years later. Of the patients who had had TKR, 81 (76%) remained. Fortin et al. reported that the 57 total hip and knee replacement patients who were missing had baseline characteristics similar to those of the remaining patients for age, sex, education, comorbid conditions, baseline function and pain scores, type of joint replacement, and centre where the surgery took place.
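The retention figures can be reproduced from the counts given in the text, as in the brief sketch below. The variable and function names are hypothetical, and the denominators (222 patients overall, 106 with knee replacement) are taken from the earlier description of the study.

```python
enrolled_all, remaining_all = 222, 165    # total hip and knee replacement patients
enrolled_knee, remaining_knee = 106, 81   # knee replacement patients only


def retention_pct(remaining: int, enrolled: int) -> int:
    """Percentage of enrolled patients still in the study, rounded to a whole number."""
    return round(100 * remaining / enrolled)


print(retention_pct(remaining_all, enrolled_all))    # 74
print(retention_pct(remaining_knee, enrolled_knee))  # 76
print(enrolled_all - remaining_all)                  # 57 patients missing at 2 years
```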
Figures 4 and 5 chart the change in WOMAC scores for function and pain, respectively, from baseline to 6 months to 24 months postoperatively. Fortin et al. did not report any results of statistical comparisons between the low and high function groups. When the Medical Advisory Secretariat compared the mean scores of the 2 groups using t-tests, the difference was significant at all time periods (baseline, 6 months, and 24 months) for both pain and function (P < .001).
Figure 4: Change in WOMAC Function Score in Low and High Function Groups in Fortin et al. (52;54)
Figure 5: Change in WOMAC Pain Score in Low and High Function Groups in Fortin et al. (52;54)
This study indicates that the high functioning patients did better overall than did the low functioning patients in terms of pain relief and improved functioning. Patients who are allowed to deteriorate by waiting too long for their surgery do improve after total hip or knee joint replacement surgery, but at 2 years postoperatively they have significantly poorer pain relief and functional improvement than patients who receive surgery in a more timely manner.