The WOMAC is the most widely used self-report measure to evaluate physical functioning in hip or knee osteoarthritis; however, its ability to discriminate between pain and physical functioning (i.e. discriminant validity) has repeatedly been questioned. Few or no data are available on the discriminant validity of alternative questionnaires that measure the same construct, for instance the Hip and Knee Osteoarthritis Outcome Score (HOOS and KOOS, respectively) and the Lower Extremity Function Scale (LEFS). Therefore, we translated the LEFS into Dutch and studied its psychometric properties (i.e. validity, reliability and responsiveness). In addition, we assessed the discriminant validity of the LEFS, HOOS and KOOS.
After translation with a forward/backward protocol, 401 individuals with hip or knee osteoarthritis completed the LEFS, HOOS/KOOS, SF-36, Hospital Anxiety and Depression Scale and Checklist Individual Strength questionnaires. To assess reliability and responsiveness, samples of 106 and 108 patients completed a comparable set of questionnaires within 3 weeks and 3 months, respectively. Feasibility, validity, reliability and responsiveness were evaluated. Discriminant validity of the LEFS, HOOS and KOOS was examined by contrasting the scales’ correlations with the physical functioning subscale of the SF-36 with the scales’ correlations with the bodily pain subscale of the SF-36.
The Dutch version of the LEFS was feasible, had good internal consistency (0.96), good reliability (ICC=0.86), good construct and discriminant validity, and showed no floor or ceiling effects. The minimal detectable change (MDC90) was ten points. Area under the receiver operating characteristic curve (AUC) analyses revealed good (AUC=0.76) and fair (AUC=0.63) responsiveness for the LEFS in improved and worsened patients, respectively. Discriminant validity for pain was apparent for the LEFS (p<0.01), but not for the HOOS and KOOS (p=0.21 and p=0.20, respectively).
Considering the LEFS’ good psychometric qualities and ability to discriminate between pain and functioning, we recommend the LEFS as the outcome measure of choice to assess self-reported physical functioning in individuals with hip or knee osteoarthritis.
Numerous self-report measures of physical function are available for the evaluation of patients with hip or knee osteoarthritis. Among those, the Western Ontario and McMaster University Osteoarthritis Index (WOMAC), which requires a licence for use, is the most widely used. It is recommended by the Osteoarthritis Research Society for use in clinical trials in patients with hip or knee osteoarthritis to measure pain and disability. However, consensus statements consistently advocate that pain and physical function must be measured independently [3,5]. A solid body of evidence demonstrates that the WOMAC-PF (Physical Function subscale) is unable to discriminate between pain and function [6-9].
Recently, three new licence-free self-report measures to determine functioning in patients with osteoarthritis have become available: the Hip Osteoarthritis Outcome Score (HOOS), the Knee Osteoarthritis Outcome Score (KOOS) and the Lower Extremity Function Scale (LEFS). One of those new measures, the LEFS, showed promise as a competitive alternative to the WOMAC-PF, as the LEFS can differentiate between pain and functioning and can detect changes in functional status in the period immediately following surgery. Moreover, the LEFS has excellent test-retest reliability, internal consistency and construct validity [12,13,15]. To date, it remains to be seen whether the physical function scales of the HOOS and KOOS can discriminate between pain and physical function [10,11,16].
Since the LEFS is currently not available in Dutch, the primary purpose of this study was to evaluate the psychometric qualities of the Dutch LEFS in people with hip or knee osteoarthritis. Our secondary objective was to assess the discriminant validity for pain of the physical function subscale of the HOOS and KOOS and the LEFS.
First the English version of the LEFS was translated into Dutch according to a standardized procedure described by Beaton et al., and secondly it was tested for psychometric quality by use of prospective data.
The translation procedure consisted of four steps. First, two persons independently translated the English version of the LEFS into Dutch (forward translation) (T1 & T2); one translator (TJH) had a medical background and was familiar with the concepts of the questionnaire and the other (VvS) was a certified translator without a medical background. Both were native speakers. Based on a consensus meeting one final version (T-12) was formed. Second, two bilingual persons (T3 & T4) translated the T-12 questionnaire back into English (BT1 & BT2), to guarantee a consistent translation of the questionnaire. Both translators (PA & DKJ) were unfamiliar with the original questionnaire and its concepts, and had no medical background. DKJ is also a certified translator. Third, an expert meeting was organised in which all translators, two health professionals (CKS, ML), a methodologist (CHMvE) and two language experts participated. During this meeting all versions of the questionnaire (T1, T2, T-12, BT1, BT2) were combined and consensus on semantic, idiomatic, experiential and conceptual equivalence was reached, resulting in a pre-final version of the questionnaire. The developers of the original questionnaire approved all previous steps and the final version. Finally, the pre-final version was presented to a group of 33 patients (20 women and 13 men; age (SD): 63 (13) years) to explore the clarity of the questionnaire. All patients were asked whether they understood the items and whether they could interpret the questionnaire correctly. In addition, the time needed to complete the questionnaire was recorded. The findings were discussed among the translators, resulting in only minor changes to the final Dutch version of the LEFS. Mean completion time was 3.5 (SD=1.5) minutes. For the final version of the Dutch LEFS see Appendix 1.
Individuals (≥18 years) diagnosed with hip or knee osteoarthritis by an orthopaedic surgeon in the Sint Maartenskliniek hospital Nijmegen (inclusion period June to October 2009) were eligible. People reporting concurrent rheumatoid arthritis, fibromyalgia or psoriatic arthritis were excluded. Written materials were sent by mail: an information letter, an informed consent form, the questionnaires and a return envelope. At baseline, all patients completed four questionnaires: the LEFS, the HOOS or KOOS (depending on index joint), the SF-36 and the Hospital Anxiety and Depression Scale (HADS). A reminder was sent to those patients who did not respond within three weeks, to ensure a high response rate. One hundred and twenty participants were sent a follow-up questionnaire to evaluate test-retest reliability (within 3 weeks) and another 120 participants were sent a follow-up questionnaire to evaluate responsiveness (after 3 months), as 100 participants per group were deemed sufficient. The 240 patients were allocated to either the reliability or the responsiveness study by use of random numbers. Both follow-up mailings consisted of three questionnaires (LEFS, HOOS or KOOS, and the SF-36) and a global perceived effect question. For test-retest reliability, we considered a time interval of 3 weeks to be appropriate for the current population. For responsiveness, we deemed a period of up to 3 months long enough to allow for improvement and brief enough to minimize the risk of a response shift [18,19].
The study was approved by the Institutional Review Board of the University Medical Centre Nijmegen (ID: 2009/20).
The LEFS is a 20-item condition-specific questionnaire designed to be applicable to individuals with musculoskeletal conditions of the lower extremity. Each item of the LEFS is scored on a 5-point scale ranging from 0 to 4 points. When scoring the LEFS, up to 4 missing item responses are permitted; for more detailed information see Stratford et al. (2005). Accordingly, LEFS scores range from 0 to 80 points, with higher scores representing higher levels of functioning.
The HOOS and the KOOS include five subscales: Pain, other Symptoms, Function in Daily living (ADL), Function in Sport and Recreation (Sport/Rec), and hip/knee-related quality of life (QoL). Standardized response options are given (5-point Likert scale) and each question is scored from 0 to 4 points. Subsequently, a normalized score (100 indicating no symptoms and 0 indicating extreme symptoms) is calculated for each subscale. The Dutch HOOS and KOOS have good internal consistency, construct validity, no floor and ceiling effects and have been found to be reliable [10,11]. Both the HOOS and KOOS questionnaires include the WOMAC osteoarthritis-index in its complete and original format (with permission, http://www.koos.nu).
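The normalization described above can be illustrated with a minimal sketch. It assumes that an item score of 0 means no problems and 4 means extreme problems, and that the subscale score is 100 minus the mean item score rescaled to 0-100; the function name is hypothetical and the handling of missing items is deliberately left out.

```python
def normalized_subscale_score(item_scores):
    """Rescale 0-4 item scores to 0-100 (100 = no symptoms).

    Assumption: 0 = no problems, 4 = extreme problems on every item.
    Missing items are not handled in this sketch.
    """
    if not item_scores:
        raise ValueError("at least one item score is required")
    if any(score < 0 or score > 4 for score in item_scores):
        raise ValueError("item scores must lie between 0 and 4")
    mean_score = sum(item_scores) / len(item_scores)
    return 100.0 - mean_score * 100.0 / 4.0


# Hypothetical responses to a five-item subscale.
print(normalized_subscale_score([0, 1, 2, 1, 0]))
```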
The SF-36 is a generic health status questionnaire which contains 36 items . It measures eight major attributes (bodily pain; physical function; social function; role limitations because of physical problems; role limitations because of emotional problems; mental health; vitality; general health perceptions). It is widely used, reliable, validated into Dutch and is easy to complete. Higher scores indicate better health .
The Hospital Anxiety and Depression Scale (HADS) is a 14-item scale designed to detect anxiety and depression, independent of somatic symptoms . It consists of two 7-item subscales measuring depression and anxiety on a 4-point response scale (from 0, no symptoms, to 3, maximum symptoms), with possible scores for each subscale ranging from 0 to 21. HADS is a valid and reliable screening instrument for detecting mood disorder in people with osteoarthritis [24,25]. Higher scores indicate higher levels of disorder.
Fatigue was measured with the 8-item “Subjective Fatigue” subscale of the Checklist Individual Strength (CIS). Each question is answered on a 7-point scale, ranging from the statement ‘totally right’ to the statement ‘totally wrong’. Each question is scored from 1 to 7 points, giving a total score range of 8-56 points. The CIS is a sensitive instrument with good discriminating power and reliability.
The external criterion for distinguishing between improved and unimproved subjects was a 7-point global perceived effect (GPE) scale. The categories of improvement included the following: completely recovered, much improved, slightly improved, not changed, slightly worse, much worse, and vastly worsened.
Descriptive statistics were used to describe the study population and the number of missing values. Data symmetry was tested by use of visual inspection of the data distribution plotted by histograms. Psychometric qualities of the LEFS were expressed by floor- and ceiling effects, internal consistency, test-retest reliability, minimally detectable change, construct validity, discriminant validity and responsiveness.
Floor and ceiling effects were determined by calculating the number of individuals that obtained the lowest (0) or highest (80) scores possible and were considered present if more than 15% of the participants achieved the highest or lowest score .
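As an illustration of this criterion, the sketch below counts the share of respondents at the minimum and maximum total score and flags an effect when that share exceeds 15%. The function name and the example scores are hypothetical.

```python
def floor_ceiling_effects(total_scores, min_score=0, max_score=80, threshold=0.15):
    """Return the proportion of respondents at the lowest and highest possible
    score, and whether each proportion exceeds the threshold (default 15%)."""
    n = len(total_scores)
    if n == 0:
        raise ValueError("no scores supplied")
    floor_prop = sum(1 for s in total_scores if s == min_score) / n
    ceiling_prop = sum(1 for s in total_scores if s == max_score) / n
    return {
        "floor_proportion": floor_prop,
        "ceiling_proportion": ceiling_prop,
        "floor_effect": floor_prop > threshold,
        "ceiling_effect": ceiling_prop > threshold,
    }


# Hypothetical LEFS totals: one respondent at the maximum, nobody at the minimum.
example_scores = [80] + [40] * 400
print(floor_ceiling_effects(example_scores))
```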
Internal consistency – an indicator for the homogeneity of a questionnaire - was assessed with Cronbach’s alpha and 95% confidence intervals (95% CI’s). Internal consistency is considered good when Cronbach’s alpha lies between 0.7 and 0.9 . Dimensionality was assessed by performing principal component factor analysis with loading coefficient absolute value suppression at 0.40 on the LEFS, KOOS-PF and HOOS-PF to determine if the individual items loaded on a single factor. Factor extraction had three requirements: scree plot point of inflection at the second Eigenvalue, Eigenvalue cut-off >1.0, and ≥10% variance .
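For readers unfamiliar with the statistic, Cronbach's alpha can be sketched from its textbook definition: alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score). The code below is a minimal illustration on made-up data, not the software used in this study, and it does not compute the confidence interval.

```python
import statistics


def cronbach_alpha(responses):
    """responses: list of rows, one row per respondent, one column per item.

    Uses sample variances (n - 1 denominator). Complete data are assumed.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [
        statistics.variance([row[j] for row in responses]) for j in range(k)
    ]
    total_variance = statistics.variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical two-item example with three respondents.
print(cronbach_alpha([[1, 2], [2, 4], [3, 3]]))
```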
Reliability concerns the degree to which the results of measurement are consistent across repeated measurements. Test-retest reliability of the Dutch LEFS was determined by means of Intraclass Correlation Coefficients (ICCs) (two-way random effects model, absolute agreement) and Bland and Altman plots. The ICC(2,1) equals the variance between patients divided by the sum of the variance between patients, the variance between measurements and the error variance. The value of the ICC ranges from 0 to 1, where 1 represents perfect reliability of the measurement. Subsequently, to quantify the measurement error of the LEFS scores we determined the standard error of measurement (SEM = SD × √(1 − ICC)). The SEM is a representation of measurement error expressed in the same units as the original measurement. We quantified the minimal detectable change at the 90% and 95% confidence level (MDC90 and MDC95) by multiplying the point estimate of the SEM by the square root of 2 (to account for the error associated with repeated measurements) and by the z score of 1.65 or 1.96 (90% or 95% confidence level, respectively): MDC90 = SEM × 1.65 × √2 and MDC95 = SEM × 1.96 × √2.
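The chain from ICC to SEM to MDC can be sketched as follows. The ICC(2,1) function uses the standard two-way ANOVA mean squares (single measurement, absolute agreement); which SD enters the SEM is not stated in the text, so the `sd` argument is an assumption left to the caller, and all function names are hypothetical.

```python
import math


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores: list of rows, one row per patient, one column per occasion.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # between-patient mean square
    msc = ss_cols / (k - 1)              # between-occasion mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sem, z):
    """MDC = SEM * z * sqrt(2); z = 1.65 for MDC90, 1.96 for MDC95."""
    return sem * z * math.sqrt(2.0)


# With a SEM of 4.4 points, the formulas give roughly 10 and 12 points.
print(round(minimal_detectable_change(4.4, 1.65), 1))
print(round(minimal_detectable_change(4.4, 1.96), 1))
```

With a SEM of 4.4 points, these formulas give values that round to 10 (MDC90) and 12 (MDC95) points.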
Construct validity reflects the extent to which a particular measure consistently relates to other measures with theoretically derived hypotheses for the constructs that are being measured . To evaluate the construct validity of the LEFS, we formulated a set of 16 hypotheses (eight for knee osteoarthritis and eight for hip osteoarthritis) about the expected magnitude and direction of relationships between the LEFS and other instruments. If 75% or more of the arbitrarily set number of 16 hypotheses were confirmed we defined the construct validity of the LEFS as good [32,33].
Discriminant validity was examined for the LEFS and the physical function subscale of the HOOS and KOOS, by contrasting its correlation with the PF subscale of the SF-36 with its correlation with the bodily pain subscale of the SF-36. Meng et al’s test for dependent data was used to evaluate the differences between those correlations .
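A minimal sketch of the comparison of two dependent correlations, following the Z statistic of Meng, Rosenthal and Rubin, is given below. Here `r1` and `r2` are the correlations of one function scale with SF-36 physical functioning and SF-36 bodily pain, `rx` is the correlation between those two SF-36 subscales, and `n` is the sample size. The example values are hypothetical, since the inter-subscale correlation is not reported in this text.

```python
import math


def meng_z_test(r1, r2, rx, n):
    """Compare two dependent correlations that share one variable.

    Returns the Z statistic and a two-sided p-value.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)   # Fisher z-transforms
    mean_r_sq = (r1 ** 2 + r2 ** 2) / 2.0
    f = min((1.0 - rx) / (2.0 * (1.0 - mean_r_sq)), 1.0)  # f is capped at 1
    h = (1.0 - f * mean_r_sq) / (1.0 - mean_r_sq)
    z = (z1 - z2) * math.sqrt((n - 3) / (2.0 * (1.0 - rx) * h))
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


# Hypothetical inputs, for illustration only.
z_stat, p_value = meng_z_test(r1=0.70, r2=0.40, rx=0.50, n=100)
print(z_stat, p_value)
```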
We studied the responsiveness of the LEFS and the WOMAC-PF (extracted from the HOOS-PF and KOOS-PF) in a combined hip and knee group, as only a very small number of patients reported clinically important change, which did not allow us to study the responsiveness of the HOOS and KOOS separately. A variety of responsiveness statistics is available. However, because it is not yet known which of these statistics is best for assessing responsiveness, we utilized three different analyses. First, we determined the Responsiveness Ratio of Guyatt (GRI: average change of recovered patients (GPE=1-2)/SD of average change of stable patients (GPE=3-5)). If the responsiveness ratio is larger than 1, the mean change score in clinically improved patients exceeds the measurement error and the instrument may be considered to be responsive, to an extent that is proportional to the magnitude of the responsiveness ratio [36,37]. Second, we determined the Standardized Response Mean (SRM: average score change/SD of score change). By use of modified Jackknife testing, we assessed differences in SRM statistically. Third, we calculated receiver operating characteristic (ROC) curves for the improved subjects and for the worsened subjects using the change scores of the questionnaires and the patients’ ratings of change. The patients’ rating of change was dichotomized to identify those subjects who experienced a clinically meaningful change in symptoms. Important change was defined as ‘Much Improvement (GPE=1-2)’ or ‘Much Decline (GPE=6-7)’. Subsequently, we computed the area under the curve (AUC). An AUC of 1.0 indicates perfect discrimination, whereas an AUC of 0.50 indicates performance no better than chance.
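The three responsiveness statistics can be sketched from their definitions. The AUC is computed here through its equivalence with the Mann-Whitney statistic (the probability that a randomly chosen changed patient has a larger change score than a randomly chosen unchanged patient, with ties counted as one half); this is an assumed but standard shortcut, and the Jackknife comparison of SRMs is not reproduced. Function names and data are hypothetical.

```python
import statistics


def guyatt_responsiveness_ratio(improved_changes, stable_changes):
    """Mean change in improved patients divided by the SD of change in stable patients."""
    return statistics.mean(improved_changes) / statistics.stdev(stable_changes)


def standardized_response_mean(changes):
    """Mean change score divided by the SD of the change scores."""
    return statistics.mean(changes) / statistics.stdev(changes)


def area_under_roc(changed, unchanged):
    """AUC as the proportion of (changed, unchanged) pairs in which the
    changed patient has the larger change score; ties count as 0.5."""
    wins = 0.0
    for c in changed:
        for u in unchanged:
            if c > u:
                wins += 1.0
            elif c == u:
                wins += 0.5
    return wins / (len(changed) * len(unchanged))


# Hypothetical change scores (follow-up minus baseline).
improved = [10, 12]
stable = [-2, 0, 2]
print(guyatt_responsiveness_ratio(improved, stable))
print(standardized_response_mean(improved + stable))
print(area_under_roc(improved, stable))
```

For worsened patients the same AUC function applies after reversing the sign of the change scores, so that a larger value again indicates more of the change of interest.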
Four hundred and one individuals returned the baseline questionnaire in the study (response rate 82%). After the baseline questionnaire, 121 participants received a follow-up mailing to evaluate test-retest reliability (106 responded (88%)) and 125 participants received a follow-up mailing to evaluate the responsiveness (112 responded (90%)). Patient characteristics at baseline and follow-up are presented in Table 1.
The majority of patients (86%) had less than three missing values. The proportion of missing values in the LEFS questionnaire (4%) was slightly less than the proportion of missing values in the KOOS (5%) and the HOOS (8%) questionnaires. The item ‘getting in or out of bath’ had the highest number of missing values in each of the questionnaires; 5% in the HOOS, 7% in the LEFS and 10% in the KOOS.
None of the 401 participants reported the lowest possible score whereas one patient (0.26%) reported the highest functional level implying that the Dutch LEFS has no floor or ceiling effects. In addition, the distribution of the LEFS was symmetrical.
The internal consistency for the total group of patients (n=401) reached a Cronbach’s alpha of 0.96 (lower limit (LL) 95%-CI: 0.95) for the 20 items. For the hip and knee osteoarthritis group Cronbach’s alpha reached 0.97 (LL 95%-CI: 0.96) and 0.95 (LL 95%-CI: 0.94), respectively. Within-scale principal component factor analysis revealed that all items included in the LEFS, KOOS-PF and HOOS-PF loaded on a single major factor (Table 2).
Within three weeks after the baseline questionnaire, five individuals improved (5%) (GPE=1-2), three worsened (3%) (GPE=6-7) and the majority (92%) remained stable (GPE=3-5). Two-way random effects ANOVA demonstrated that the ICC of the Dutch LEFS questionnaire for the total group (n=106) was 0.86. For the knee group (n=81) and the hip group (n=25) the ICC was 0.87 and 0.78, respectively. The standard error of measurement was 4.4 points. The MDC90 and MDC95 of the LEFS questionnaire was 10 points and 12 points, respectively.
The Bland-Altman plot (Figure 1) shows that the mean difference between the two applications of the LEFS was 1.87 points (95%-CI 0.22 to 3.52). The limits of agreement (mean ± 1.96 SD) ranged from -11.56 to 15.30 points.
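The limits of agreement follow directly from the paired differences, as the sketch below shows. The direction of the difference (second minus first administration) is an assumption, and the example scores are made up.

```python
import statistics


def bland_altman_limits(first, second):
    """Return the mean difference and the 95% limits of agreement
    (mean difference +/- 1.96 * SD of the differences).

    Assumption: difference = second administration minus first.
    """
    diffs = [b - a for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    return mean_diff, mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


# Hypothetical test and retest totals for three patients.
print(bland_altman_limits([10, 20, 30], [12, 20, 34]))
```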
Thirteen of the 16 predefined hypotheses to determine the construct validity were confirmed (81%) (Tables 3 and 4). The following three hypotheses could not be confirmed. In the hip group we found a correlation of 0.55 between LEFS and CIS scores, which was higher than the predefined cut-off of 0.5. In the knee group we found that the duration of complaints did not influence the LEFS scores and that education level (primary, secondary or higher education) did influence the LEFS scores.
Meng et al’s test demonstrated that the correlation of the LEFS with the SF-36 bodily pain subscale differed significantly from its correlation with the SF-36 physical functioning subscale (Table 4), indicating that the LEFS has discriminant validity for pain (p<0.01). We found no significant differences between the correlations with the SF-36 pain and physical functioning subscales for the HOOS-PF (r (95%-CI)=0.64 (0.51 - 0.74) and 0.71 (0.60 - 0.79), respectively; p=0.21) or for the KOOS-PF (0.69 (0.62 - 0.75) and 0.73 (0.67 - 0.79), respectively; p=0.20), indicating that neither questionnaire discriminates between pain and physical functioning.
Seven people (7%) reported relevant improvements in function (GPE=1-2), nine people reported relevant worsening (8%) (GPE=6-7) and the majority remained stable (85%) (GPE=3-5). Responsiveness Ratio of the LEFS was 1.49, close to the outcomes of WOMAC-PF (1.20) and SF36-PF (1.22) (Table5). Modified Jackknife testing demonstrated no statistical differences between the SRM for the LEFS (0.13) compared with the SRM of the WOMAC (SRM=0.02, p=0.45) and SF-36 (SRM=0.00, p=0.36). ROC curve analysis revealed that for improved patients the AUC was 0.76 (95% CI: 0.49 - 1.00) for the LEFS, 0.71 (95% CI: 0.45 - 0.98) for the WOMAC-PF (extracted from the HOOS-PF and KOOS-PF) and 0.68 (0.44 - 0.93) for the SF36-PF. For worsened patients the AUC was 0.63 (95% CI: 0.42 - 0.83) for the LEFS, 0.56 (0.34 - 0.78) for the WOMAC-PF and 0.56 (0.35 - 0.78) for the SF36-PF.
The primary objective of this study was to create a reliable and valid Dutch version of the LEFS by translation and adaptation. No difficulties were encountered in the translation phase of the study; the structure of the original LEFS was not altered and all items were maintained. Moreover, participants reported no problems in the administration of the questionnaire. Considering the results of this validation study, we deemed the Dutch version of the LEFS to be an internally consistent, unidimensional, highly reliable and valid questionnaire to determine lower extremity functioning in patients with hip or knee osteoarthritis. Finally, the LEFS revealed good responsiveness by detecting improvement in patient GPE; however, this finding should be interpreted with caution, given the small proportion of patients who actually reported clinically relevant functional improvement. For our secondary objective, we were unable to demonstrate that the HOOS-PF and KOOS-PF subscales are able to discriminate between pain and physical function.
Construct validity of the Dutch version of the LEFS was good, as most of the pre-formulated hypotheses were met. Three of the 16 hypotheses could, however, not be confirmed. First, in the hip group, the correlation between lower extremity functioning (LEFS) and fatigue (CIS) was over 0.5; however, similar correlations were found for the HOOS-PF (r=0.55) and SF-36 PF (r=0.50). As comparative measures also demonstrate such a relation, fatigue might have a stronger relation with functioning than previously thought [10,15]. An important difference with previous studies is that we investigated fatigue with a fatigue-specific questionnaire, in contrast to others that used the vitality scale of the SF-36 [10,15]. Second, participants with knee symptoms for less than five years did not report significantly fewer symptoms than patients with symptoms for over 5 years. This finding was also observed for the KOOS-PF (p=0.90) and the SF-36 PF (p=0.75). These findings could, however, be biased by a phenomenon called response shift, which could have resulted in an underreporting of functional disabilities in the group with the longest duration of complaints. Third and finally, in the knee group we found that participants’ education level (primary, secondary or higher education) did influence the LEFS scores. It would be undesirable if LEFS scores were influenced by education level, as this would indicate that the LEFS is difficult to interpret. Further scrutiny of this finding indicates that patients with knee symptoms who had received higher education reported fewer symptoms than patients with no or only primary education (p=0.02), also when adjusted for age, sex, BMI, co-morbidities, duration of complaints and being employed. Yet again, this finding was also observed for the KOOS-PF (p=0.04), but not for the SF-36 PF subscale (p=0.08).
Our findings are in contrast to a previous study that addressed the relation between the LEFS scores (Italian version) and education levels. This discrepancy can possibly be explained by the different format of the Italian version: an interview format instead of a self-reported questionnaire. It would be of interest to further elucidate this relation in other studies.
Although the responsiveness of the Dutch LEFS was good and superior to that of the WOMAC-PF and SF36-PF, it was somewhat low compared to the Italian validation study by Cacchio et al. (2010) (AUC=0.86). On the other hand, the other psychometric properties of the Dutch LEFS (i.e. Cronbach’s alpha [12,15], reproducibility [12,13,15] and validity [12,13]) were comparable to the findings of previous validation studies. Our results regarding the responsiveness of the LEFS, WOMAC-PF and SF36-PF should be interpreted with caution. Given the small number of patients reporting clinically relevant change, which may have affected, for example, the magnitude of the SRM, the point estimates might be spurious. Future (intervention) studies should further investigate the responsiveness of the Dutch LEFS.
The lack of discriminant validity for the WOMAC-PF has been demonstrated on numerous occasions [6-9,41,42]. Therefore, the greater discriminant validity of the LEFS compared to the WOMAC-PF [13,14] was one of the foremost reasons to translate and adapt the LEFS to the Dutch language. In our study we compared the LEFS questionnaire to the HOOS-PF and KOOS-PF subscales. As the physical function subscales of the HOOS and KOOS are very similar to the WOMAC-PF, these subscales are also at great risk of lacking discriminant validity. Our results indicate that the LEFS, but not the KOOS-PF and the HOOS-PF, could discriminate physical functioning from pain; that is, the KOOS-PF and HOOS-PF did not show a statistically higher correlation with the PF subscale than with the bodily pain subscale of the SF-36, whereas the LEFS did. As far as we know, we are the first to also demonstrate the lack of discriminant validity in the (Dutch version of the) HOOS and KOOS subscales, as in those particular validation studies only SF-36 subscales other than the bodily pain subscale were examined [10,11,16].
A limitation of our study is that we recruited only individuals with hip and knee osteoarthritis. Originally the LEFS was developed as a measure that could be used for all kinds of conditions of the lower extremity. The exclusion of other conditions hampers the generalizability of our findings to other complaints of the lower extremity. We did, however, evaluate the ability of the LEFS to differentiate between patients with and without additional lower extremity pain co-morbidities, which demonstrated a linear association between the number of lower extremity joint pain co-morbidities and LEFS scores. The latter analysis suggests that the Dutch version of the LEFS is also able to detect functional disabilities in patients with symptoms other than just hip and knee osteoarthritis. Another limitation of this study is that we did not assess the association between the Dutch version of the LEFS and a set of performance measures to determine the convergent validity. Future studies should investigate this association. A third limitation is that the Cronbach’s alpha value surpassed the cut-off value of 0.90, indicating item redundancy. However, due to the size of our study sample and the relatively high number of items, this figure might have been inflated. Finally, we studied the construct validity of the LEFS by testing hypotheses according to prespecified cut-off values; however, cut-off values are often too rigid because of their dichotomous (true/false) nature. Future studies should consider using the lower or upper bound of the 95% confidence interval of an association.
We found that the Dutch version of the LEFS has no floor and ceiling effects, good internal consistency, reliability, construct validity and responsiveness. Moreover, the Dutch LEFS demonstrated discriminant validity for pain, as it was able to discriminate between pain and physical functioning, whereas both the HOOS-PF and KOOS-PF did not. Therefore, we recommend the use of the Dutch LEFS as an outcome measure for physical functioning in patients with hip and/or knee osteoarthritis.
Beste meneer/mevrouw,Heeft u of zou u vandaag enige moeite hebben met de volgende bezigheden?Vult u alstublieft alle items in, ook wanneer u de bezigheden niet meer doet.Score: _____/80 punten.
The authors declare that they have no competing interests.
Authors TJH, RAdB, AAdB, CHMvdE; 1) have all contributed to conception and design of this study; 2) have been involved in drafting the manuscript and revising it critically for important intellectual content; and 3) have given final approval of this version to be published.
All authors would like to acknowledge Patsy Anderson, Debby Kenyon-Jackson, and Vera van Schagen for their contributions in translating and re-translating the Dutch version of the LEFS. We would also like to thank Clarinda Kersten-Smit and Monique Limborgh for their contributions during the expert meeting and prof. dr. Paul Stratford for allowing us to translate the questionnaire. Finally, we want to thank all who participated in this study.
The grant supporter of this study was the Department of Rheumatology of the Sint Maartenskliniek hospital, Nijmegen, The Netherlands.