To compare patients’ global assessments of change in knee, hip, and back symptoms with actual changes over time in pain, function, and radiographic severity.
Participants (n=894, 80% female, mean age=66 years) completed two assessments (mean of 4 years apart) as part of a study on the genetics of generalized osteoarthritis. At both assessments, participants completed the Western Ontario and McMaster Universities OA Index (WOMAC), and radiographic severity was assessed for knees, hips, and low back. At the second assessment, participants described changes in knee, hip, and low back symptoms as Worse, Better, Same, or Never Had Symptoms. Analysis of covariance models examined mean changes in WOMAC scores and radiographic severity according to categories of the global assessment measures. Statistical significance was examined for linear trend.
Mean WOMAC total, pain, and function scores decreased (indicating improvement) among participants who indicated joint symptoms were Better; showed little change among those who reported symptoms were the Same/Never Had Symptoms; and increased among those who reported symptoms were Worse. For all analyses except the comparison of WOMAC pain change according to global assessment of low back symptom change, there was a statistically significant linear trend (p<0.05). Patterns were similar for changes in radiographic severity, but the tests of linear trend were not statistically significant.
Results support the concordance of these global assessments of joint symptom change with actual changes in self-reported symptoms. These global assessments may be useful for assessing change over time when baseline data are unavailable.
Primary outcomes for most clinical trials of osteoarthritis (OA) therapies, as well as longitudinal observational studies, involve examination of changes in some multiple-item self-report questionnaire (such as the Western Ontario and McMaster Universities OA Index) (1). While these types of outcome assessments are well established and clearly important, there is increasing awareness of the value provided by patients’ global assessments of symptom change over time or in response to treatment. The Osteoarthritis Research Society International (OARSI), the Outcome Measures in Rheumatology Clinical Trials group (OMERACT), and others have specifically recommended including patient global assessment as a key outcome in OA trials (2–5).
Some studies of hip and knee OA have incorporated patient global assessments of change in response to treatment (6–8). However, few studies have examined the validity of global assessment of change measures compared with other standard outcome measures used in OA research. Additional evaluation of such measures is warranted, since global assessments of change are useful not only in clinical trials but also in epidemiological studies and clinical decision making. Global assessments of change are particularly useful for assessing symptom change over time when no baseline data are available.
We previously reported on the construct validity of a global assessment of change measure for hand OA in a large study of the Genetics of Generalized OA (GOGO) (9). This study supported the validity of the global assessment of hand symptom change based on significant associations with changes in grip strength, pinch strength, and AUStralian CANadian Osteoarthritis Hand Index (AUSCAN) (10, 11) scores. The current study extends this work, examining the construct validity of global assessment of change items for hip, knee, and back symptoms. Specifically, we examined associations of these global assessments of change with changes in WOMAC scores and radiographic severity.
All participants were enrolled in the GOGO study (12). The primary objective of GOGO was to identify OA susceptibility genes through genotyping and linkage analysis of OA affected sibling pairs. GOGO involved a consortium of seven clinical research sites in the United States and United Kingdom (13). Participants included in these analyses were from three GOGO sites: Duke University and the University of North Carolina at Chapel Hill in the US, and the University of Nottingham in the UK. These three sites were included because they collected the global assessment of change questions, as well as WOMAC scores at baseline and follow-up.
The overall sample size for the GOGO study was based on providing adequate power to detect genes of moderate to large effect (12). All families recruited for the GOGO study had at least two siblings with bilateral hand osteoarthritis, defined as bony enlargement of at least one distal interphalangeal (DIP) joint and bony enlargement of 2 or more other interphalangeal joints or carpometacarpal (CMC) joints. All radiographs were read by a single reader (JBR). Participants were excluded if they had self-reported or x-ray evidence of rheumatoid arthritis, systemic lupus erythematosus, psoriasis, or gout of the hands, hips, or knees. Among 1,569 GOGO participants who met criteria for these analyses at baseline, 924 (59%) completed questionnaires and at least one radiograph (knee, hip, or spine) at follow-up. The mean time between baseline and follow-up assessments was 4.0 years (standard deviation (SD) = 0.9). Those who did and did not participate in follow-up assessments did not differ significantly according to baseline WOMAC total, pain, or function scores, body mass index (BMI), or gender. Those who did not participate in follow-up assessments were older than those who did participate (means = 68 years and 66 years, respectively, p<0.01). For these analyses we also excluded individuals who had had knee or hip replacement surgery between baseline and follow-up assessments and those who did not answer any of the global assessment of change questions, resulting in a final sample size of n = 894.
At the follow-up visit, all participants were asked (separately for each joint), “Since your last GOGO visit, has there been any change in pain, aching, or stiffness in your: LEFT KNEE; RIGHT KNEE; LEFT HIP; RIGHT HIP; or LOW BACK?” Four responses were possible: 1. Worse, 2. About the same, 3. Better, and 4. No symptoms at the first GOGO visit and still no symptoms. These questions were patterned after a global assessment of change scale for assessing shortness of breath (14).
Because our analyses involved comparing these global assessments of change with WOMAC scores, which are not joint specific, and because we did not find any substantial differences in analyses of right vs. left knees or right vs. left hips, we created composite global assessment categories and scores that combined information for both knees and both hips. Specifically, we grouped people into the following categories: 1. One or Two Knees Better (No Knee Worse); 2. Both Knees Same or Never Had Symptoms; 3. One Knee Worse; 4. Two Knees Worse. The same categories were created for hips. For low back global assessments (a measure without laterality), we grouped people into three categories: 1. Better; 2. About the same or Never Had Symptoms; 3. Worse. In addition to these joint-specific categories, we created a variable indicating the total number of joint sites each participant reported as being Worse (0–5; summing 2 hips, 2 knees, and low back).
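The grouping rules above can be sketched in a few lines of Python. This is an illustration only, not the study's code: the function and constant names are hypothetical, and the handling of a joint pair with one side Better and one side Worse is an assumption inferred from the "(No Knee Worse)" qualifier. Missing responses are not handled.

```python
from typing import Mapping

# Response codes follow the numbering of the questionnaire item in the text.
# The constant and function names are illustrative, not from the study.
WORSE, SAME, BETTER, NEVER_HAD = 1, 2, 3, 4


def paired_joint_category(left: int, right: int) -> str:
    """Collapse left/right responses for knees (or hips) into four categories."""
    n_worse = (left == WORSE) + (right == WORSE)
    if n_worse == 2:
        return "Two Worse"
    if n_worse == 1:
        # Assumption: one side Better plus one side Worse lands here,
        # because the Better category requires that no side is Worse.
        return "One Worse"
    if BETTER in (left, right):
        return "One or Two Better (None Worse)"
    return "Both Same or Never Had Symptoms"


def low_back_category(response: int) -> str:
    """Collapse the single low back response into three categories."""
    if response == WORSE:
        return "Worse"
    if response == BETTER:
        return "Better"
    return "Same or Never Had Symptoms"


def count_worse_sites(responses: Mapping[str, int]) -> int:
    """Number of joint sites (0-5) reported as Worse."""
    return sum(1 for r in responses.values() if r == WORSE)


# Hypothetical participant, for illustration only.
example = {
    "left_knee": WORSE,
    "right_knee": BETTER,
    "left_hip": SAME,
    "right_hip": NEVER_HAD,
    "low_back": WORSE,
}
print(paired_joint_category(example["left_knee"], example["right_knee"]))
print(paired_joint_category(example["left_hip"], example["right_hip"]))
print(low_back_category(example["low_back"]))
print(count_worse_sites(example))
```

Ordering the checks from "two worse" down to "same" keeps the four categories mutually exclusive, which matters when the categories are later treated as ordered levels in a trend test.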
We examined associations of global assessment of change questions with changes in WOMAC scores (between baseline and follow-up GOGO visits). The WOMAC is a validated and reliable scale designed to assess pain, stiffness, and function in lower extremity OA (range: 0–96) (1, 15). In addition to examining the total WOMAC scores, we separately examined the WOMAC pain (range: 0–20), and physical function (range: 0–68) subscales. To examine changes in WOMAC scores, we subtracted baseline scores from follow-up scores, so that positive change scores indicated worsening pain and function and negative scores indicated less pain and better function.
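As a minimal sketch of the change-score calculation described above, the following Python computes follow-up minus baseline for the WOMAC total, pain, and function scores. The class and function names are hypothetical. The stiffness range of 0–8 is not stated in the text; it is inferred from the 0–96 total minus the pain (0–20) and function (0–68) ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Womac:
    pain: int       # 0-20
    stiffness: int  # 0-8, inferred from the 0-96 total minus pain and function
    function: int   # 0-68

    def __post_init__(self) -> None:
        # Reject subscale values outside the ranges given in the text.
        limits = {"pain": 20, "stiffness": 8, "function": 68}
        for name, upper in limits.items():
            value = getattr(self, name)
            if not 0 <= value <= upper:
                raise ValueError(f"{name} must be in 0-{upper}, got {value}")

    @property
    def total(self) -> int:
        return self.pain + self.stiffness + self.function


def womac_change(baseline: Womac, follow_up: Womac) -> dict:
    """Follow-up minus baseline: positive values mean worsening."""
    return {
        "total": follow_up.total - baseline.total,
        "pain": follow_up.pain - baseline.pain,
        "function": follow_up.function - baseline.function,
    }


# Hypothetical scores, for illustration only.
change = womac_change(Womac(5, 2, 16), Womac(6, 3, 18))
print(change)
```

Keeping the sign convention in one function avoids the most common mistake with change scores, which is subtracting in the opposite direction for some outcomes.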
Knees: A fixed-flexion posteroanterior knee radiograph was taken with the SynaFlexer™ x-ray positioning frame (Synarc, San Francisco, CA) (16) with a 10° caudally angulated x-ray beam. Hips: An anteroposterior view of the pelvis was performed with the participant supine and feet internally rotated 10 degrees. Spine: A lateral view of the lumbar spine (L1–L5) was performed with the participant recumbent, with left side down. Knee, hip, and back radiographs were read by a single reader (JBR), with both baseline and follow-up radiographs being read at one time but with the reader blinded to time point. Participants were excluded from knee and hip radiographs in cases of joint replacement or amputation. Hip radiographs were performed in women of child-bearing potential after confirming a negative pregnancy test. Lumbar spine x-rays at both baseline and follow-up were performed at the US sites only (Duke and University of North Carolina).
We examined associations of global assessment of change questions with changes in radiographic severity between baseline and follow-up GOGO visits. For knees, radiographic severity was defined as the sum of scores for osteophytes (0=none to 3=severe) and joint space narrowing (JSN; 0=none to 3=severe) at the medial and lateral aspects of the tibia and femur, and at the medial and lateral patellofemoral joint; possible scores of 0–48 for two knees. For hips, radiographic severity was defined as the sum of scores for osteophytes (0=none to 3=severe) at the lateral and medial aspects of the femur and acetabulum, as well as medial, superior, and axial JSN (0=none to 3=severe), possible scores of 0–42 for two hips. Sums of osteophytes and JSN were significantly correlated with Kellgren Lawrence grades (17) for hips and knees (Pearson r’s = 0.76–0.92, p < 0.0001). Lumbar spine radiographs were scored for the presence and severity of vertebral osteophytes and disc narrowing (for analyses also referred to as JSN). Radiographic severity for the lower back was defined as the sum of osteophytes (0=none to 3=severe) and JSN (0=none to 3=severe) for each of the lumbar spine levels (L1–L5), for possible scores of 0–30. For all three joint groups we also separately examined associations of global assessment of change scores with changes in osteophyte and JSN scores.
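The summed severity scores above follow directly from the number of graded features. The sketch below, with hypothetical function names and made-up grades, sums osteophyte and JSN grades and checks the stated maxima for hips (0–42) and lumbar spine (0–30). The knee is omitted because the text gives only its overall range (0–48 for two knees) rather than a per-feature breakdown that could be reproduced with certainty.

```python
from typing import Sequence

MAX_GRADE = 3  # each feature is graded 0 (none) to 3 (severe)


def severity_score(osteophytes: Sequence[int], jsn: Sequence[int]) -> int:
    """Sum of osteophyte and JSN grades for one joint group at one visit."""
    grades = list(osteophytes) + list(jsn)
    for grade in grades:
        if grade not in range(MAX_GRADE + 1):
            raise ValueError(f"grade out of range: {grade}")
    return sum(grades)


def max_score(n_osteophyte_sites: int, n_jsn_sites: int, n_units: int) -> int:
    """Highest possible score given sites per unit and number of units."""
    return MAX_GRADE * (n_osteophyte_sites + n_jsn_sites) * n_units


# Hips: 4 osteophyte sites and 3 JSN sites per hip, two hips.
hip_max = max_score(4, 3, 2)
# Lumbar spine: 1 osteophyte grade and 1 JSN grade per level, five levels.
lumbar_max = max_score(1, 1, 5)

# Hypothetical lumbar grades for levels L1-L5, for illustration only.
baseline = severity_score([0, 1, 1, 2, 0], [1, 1, 0, 2, 1])
follow_up = severity_score([0, 1, 2, 2, 1], [1, 1, 1, 2, 1])
print(hip_max, lumbar_max, follow_up - baseline)
```

As with the WOMAC, the change in radiographic severity is follow-up minus baseline, so a positive value indicates radiographic worsening.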
We included the following characteristics in multivariable models: baseline participant age, gender, baseline BMI (calculated from measured height and weight), baseline radiographic severity (as defined above), and time between baseline and follow-up assessments.
We fit ANCOVA models to examine differences in mean changes in WOMAC total, pain subscale, and function subscale scores, as well as changes in radiographic severity, according to the combined global assessment of change categories for hips and knees (4 categories each for hips and knees: One or Two Knees/Hips Better; Both Knees/Hips Same or Never Had Symptoms; One Knee/Hip Worse; Two Knees/Hips Worse), as well as global assessment of change categories for low back (Better, Same/Never Had Symptoms, or Worse). Similarly, we used ANCOVA analyses to examine mean changes in WOMAC scores according to the number of joint sites reported as Worse (0–5). Covariates included age, gender, BMI (calculated from measured height and weight), baseline radiographic severity (as defined above), and time between baseline and follow-up assessments. We were primarily interested in whether WOMAC and radiographic severity change scores varied linearly across the global assessment of change categories (rather than whether there was an overall difference among categories) and therefore report the p-value for linear trend in each of these analyses. All analyses were conducted using SAS version 9.1 (SAS Institute, Cary, NC), and statistical significance was evaluated at the p<0.05 level.
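One common way to express a linear-trend test in an ANCOVA is as a contrast among covariate-adjusted category means. The text does not state how the trend was specified in SAS, so the equally spaced contrast coefficients below are an assumption shown for illustration. Here $\Delta Y_{ij}$ is the change score (follow-up minus baseline) for participant $i$ in global assessment category $j$, $\mathbf{x}_{ij}$ is the covariate vector (age, gender, BMI, baseline radiographic severity, and time between assessments), and $\mu_j$ is the adjusted mean for category $j$.

```latex
\begin{aligned}
\Delta Y_{ij} &= \mu_j + \boldsymbol{\gamma}^{\top}\mathbf{x}_{ij} + \varepsilon_{ij},
  \qquad j = 1,\dots,J \\
L &= \sum_{j=1}^{J} c_j\,\mu_j, \qquad \sum_{j=1}^{J} c_j = 0 \\
H_0 &: L = 0 \\
(c_1,\dots,c_4) &= (-3,\,-1,\,1,\,3) \quad \text{(four knee or hip categories)} \\
(c_1,c_2,c_3) &= (-1,\,0,\,1) \quad \text{(three low back categories)}
\end{aligned}
```

Rejecting $H_0$ indicates that adjusted mean change scores rise or fall steadily across the ordered categories, which is a narrower question than whether any two categories differ.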
The sample (n = 894) was relatively equally distributed across three GOGO study sites: Duke University (33%), University of North Carolina at Chapel Hill (33%), and University of Nottingham (34%). Participants were 80% female, the mean age was 66.0 years (SD = 8.4), and the mean body mass index was 28.4 kg/m² (SD = 5.6). Between baseline and follow-up, there were small increases in mean scores for WOMAC total (25.3, SD = 20.1 vs. 27.5, SD = 20.8), pain (5.2, SD = 4.4 vs. 5.6, SD = 4.7), and function (16.4, SD = 13.8 vs. 18.0, SD = 14.2).
Among those who had knee radiographs at baseline (n = 852), 43% had OA in at least one knee. The mean radiographic severity scores across both knees at baseline and follow-up were 3.2 (SD = 3.7) and 3.8 (SD=3.7), respectively; mean osteophyte scores were 2.0 (SD = 2.3) and 2.3 (SD = 2.1), respectively; mean JSN scores were 1.6 (SD = 2.1) and 2.0 (SD = 2.2), respectively. Among those who had hip radiographs at baseline (n=862), 37% had OA in at least one hip. The mean radiographic severity scores across both hips at baseline and follow-up were 3.2 (SD = 2.6) and 3.5 (SD=2.8), respectively; mean osteophyte scores were 2.6 (SD = 1.5) and 2.9 (SD = 1.7), respectively; mean JSN scores were 0.5 (SD = 1.3) and 0.6 (SD = 1.5), respectively. Among those who had lumbar spine radiographs at baseline (n = 502, performed at Duke and University of North Carolina sites only), 56% had OA. The mean radiographic severity scores for the lumbar spine at baseline and follow-up were 2.9 (SD = 2.7) and 3.6 (SD = 2.9), respectively; mean osteophyte scores were 1.1 (SD = 1.3) and 1.5 (SD = 1.4), respectively, and mean JSN scores were 1.8 (SD = 1.5) and 2.2 (SD = 1.6), respectively.
Participants’ responses to global assessment of symptom change items for knees, hips, and low back are summarized in Table 1. Very few participants reported that symptoms got better for any of the joint groups. About 47% of participants reported that symptoms were worse in at least one knee, about 43% reported that symptoms were worse in at least one hip, and 43% reported that symptoms were worse at the low back. About 70% reported that at least one joint (hip, knee, or low back) got worse.
Changes in WOMAC total, pain, and function subscale scores showed expected patterns according to global assessment of change categories (Table 2, Figure 1). Specifically, WOMAC scores increased (indicating worsening symptoms) among those who indicated symptoms were worse than at baseline in at least one hip or knee, as well as at the low back. Furthermore, increases in WOMAC scores were greater among those who indicated that both hips or both knees had worsened, compared with just one hip or one knee getting worse. There were very small changes in WOMAC scores among those who reported that their knees had gotten better. However, there were larger decreases in WOMAC scores among those who reported that hip or low back symptoms were better. Tests of linear trend were statistically significant for all WOMAC scores for knees and hips, as well as for WOMAC total and function scores for the low back.
Changes in knee radiographic severity scores also generally showed expected patterns according to global assessment of change categories (Table 2). While mean radiographic severity scores increased for all global assessment of change categories, there were larger changes (i.e., greater worsening of radiographic severity) among those who indicated that symptoms had gotten worse. These patterns were observed for the overall radiographic severity score, as well as for changes in JSN and osteophytes separately. However, linear trend tests were not statistically significant for any of these analyses.
Changes in WOMAC total, pain, and function subscale scores showed expected patterns according to the number of joints participants reported were Worse (Table 3, Figure 1). Participants who reported that no joints had gotten worse had small decreases in WOMAC scores (indicating slight improvement), and participants who reported that 1 or more joints were Worse had increased WOMAC scores (indicating worsening symptoms). WOMAC scores generally increased more with the number of joints being reported as Worse, except that the mean increases in WOMAC total and pain scores were smaller for those with 5 vs. 4 worse joints. Tests of linear trend were statistically significant for all WOMAC scores.
This study examined associations of patient global assessments of hip, knee, and low back symptom change (better, same/never had symptoms, or worse) with changes over time (mean of 4 years) in WOMAC scores and radiographic severity. Changes in WOMAC scores showed the expected patterns according to these global assessments. Specifically, mean WOMAC total, pain, and function scores decreased among participants who indicated joint symptoms were better at follow-up, showed little change among those who reported symptoms were the same, and increased among those who indicated symptoms were worse. For all analyses except the comparison of changes in WOMAC pain scores according to global assessment of low back symptom change, statistical tests indicated a significant linear trend. Notably, for both hips and knees, WOMAC change scores were higher (indicating a greater degree of worsening in symptoms) among participants who reported that both hips/knees had gotten worse, compared with just one hip/knee. Since the WOMAC is most often used in the context of hip or knee OA, it is noteworthy that there was also good agreement between the global assessment of low back symptom change and changes over time in WOMAC scores. A prior study also showed that in cross-sectional analyses, WOMAC scores were strongly associated with low back pain (18). Furthermore, changes in WOMAC total, pain, and function scores showed the expected patterns when compared according to the total number of joints (hip, knee, and low back) participants reported as being Worse. Linear trends were statistically significant in these analyses also. This lends further support for the ability of the global assessment measure to capture overall changes in lower extremity symptoms and function.
Changes in total radiographic severity scores, osteophyte scores, and JSN scores also generally showed the expected patterns according to global assessment of change categories, though linear trend tests were not statistically significant. There are several possible reasons for this weaker association, including the general lack of strong correlation between radiographic severity and joint symptoms (19) and the fact that factors other than bone changes visible by radiograph can affect symptom severity (20). In addition, there were fairly small changes overall in radiographic severity over time.
There are several important strengths and potential applications of this global assessment of change measure. First, this item asks individuals to recall change in a very specific aspect of health, and studies show that condition-specific measures are more sensitive to change than measures of global health status (21, 22). Second, this item is administered at a single time but assesses change in joint symptoms over a specified time interval. Therefore it may be useful for assessing symptom change when baseline data are not available. While the categories in this measure are broad, they are useful for simply discriminating between individuals whose symptoms have remained stable, worsened, or improved over a specified period of time. This type of information can be particularly useful in large, cross-sectional epidemiological studies. Third, because this is a very brief measure, it can be readily used in clinical settings to monitor patients’ responses to new treatments.
There are some general limitations to recall-based outcome measures. First, recall of prior symptom severity may be difficult for some individuals, particularly for lengthier recall periods. Second, there may be recall bias associated with these measures. Some research suggests that assessments of change and prior symptoms may be influenced by the severity of present symptoms (23–26). A third possible limitation is “response shift,” where people undergo adaptation to their chronic condition and redefine its severity or impact. These phenomena need consideration for longitudinal studies spread over several years (27). There are some other limitations to these analyses. First, due to the nature of the study, very few participants experienced improvements in symptoms. Additional research is needed to test the usefulness of these global assessment of change questions in contexts where improvement is expected, such as clinical trials of medications or other treatments. Second, the radiographic severity score used in these analyses (the sum of the individual radiographic feature scores for JSN and osteophyte) has not been commonly used or validated. However, we found that this score was strongly correlated with Kellgren-Lawrence grade, a widely used and validated scoring system.
In summary, this study showed that simple global assessments of symptom change for the knee, hip, and back were significantly associated with actual changes in WOMAC total, pain, and function scores for knees and hips and with WOMAC total and function scores for the low back, and also showed expected patterns with respect to changes in radiographic severity. This provides support for the utility of these measures and extends our previous work supporting this same global assessment for hand symptom change (9). Additional research should examine the validity of these measures in other samples, including both epidemiologic studies and clinical trials.
This study was supported by a grant from GlaxoSmithKline. This study was performed, in part, at the Duke General Clinical Research Unit, funded by NIH MO1-RR-30, National Center for Research Resources. The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Department of Veterans Affairs.