The results of this validation study of the Dutch KOOS questionnaire showed good internal consistency for all study groups. Reliability was also good in the mild and moderate OA groups and in the revision TKA group. It was not assessed in patients with severe OA or in patients with a TKA. The construct validity was moderate for the patient groups with mild and moderate OA and for TKA patients, and lower for the severe OA and revision TKA patients. Ceiling effects were present in the mild OA group and in the severe OA group. Floor effects were seen in the severe OA group and the revision TKA group.
In this validation study, Cronbach's alphas were above 0.70 for almost all subscales in our patient groups. This indicates good internal consistency, which is in line with the study of Roos et al. [6]. However, for the subscale Symptoms in the severe OA population we found a Cronbach's alpha of 0.56, indicating moderate internal consistency. Deleting one or more questions did not result in a higher internal consistency. Kessler et al. and Xie et al. also found a lower Cronbach's alpha (< 0.70) for this subscale in patients with OA of the knee [23
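As an illustration of the internal-consistency analysis described above, the following is a minimal sketch of Cronbach's alpha and of the "alpha if item deleted" check. It is not the software used in this study; the function names and the small data set in the test are hypothetical, and the sketch assumes complete data (no missing item responses).

```python
from statistics import variance  # sample variance (n - 1 denominator)


def cronbach_alpha(responses):
    """Cronbach's alpha for one subscale.

    responses: list of respondents, each a list of k item scores
    (assumption: complete data, at least 2 items and 2 respondents).
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Variance of each item across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Variance of the summed subscale score across respondents.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def alpha_if_item_deleted(responses):
    """Alpha recomputed with each item left out in turn (needs >= 3 items)."""
    k = len(responses[0])
    alphas = []
    for dropped in range(k):
        reduced = [
            [score for j, score in enumerate(row) if j != dropped]
            for row in responses
        ]
        alphas.append(cronbach_alpha(reduced))
    return alphas
```

If none of the values returned by `alpha_if_item_deleted` exceeds the alpha of the full subscale, removing a single item does not improve internal consistency, which corresponds to the situation reported for the subscale Symptoms.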
In our study, factor analysis was performed on the whole study population and we found that all items of the Dutch version of the KOOS questionnaire loaded on one factor. Our results are in contrast with the conclusion of Roos et al. that the KOOS items loaded on five factors [6]. However, our findings are in line with Thumboo et al. and Faucher et al., who claimed that the subscales Pain and Physical function of the WOMAC loaded on the same factor [25]. In the present study, the factor loading of question S4 (can you straighten your leg fully) was lower than 0.40, which suggests that this item might be excluded from the questionnaire. Although our preliminary results indicate that the Dutch version of the KOOS questionnaire contains a single factor, we retained the original subscales of the Swedish version of the KOOS questionnaire in our analyses. Based on our findings, we recommend additional factor analyses on other data sets before changing the number of subscales of the Dutch version of the KOOS questionnaire.
In the present study the test-retest reliability was good for the patient groups with mild OA (ICC 0.74–0.88), moderate OA (ICC 0.87–0.94) and patients after a revision TKA (ICC 0.73–0.89). A lower ICC (0.45) was found for the subscale Sport/recreation in patients after a revision TKA. After deleting all outliers, the ICC was still lower than 0.70 (ICC 0.62). It is plausible that questions about sport and recreation are less relevant for these older patients.
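To make the test-retest statistic concrete, the following is a minimal sketch of an intraclass correlation coefficient. The ICC model is an assumption on our part for illustration: a two-way random-effects, absolute-agreement, single-measurement ICC (often written ICC(2,1)), computed from the usual ANOVA mean squares. The function name and the test values are hypothetical and do not reproduce the study data.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: list of n subjects, each a list of k measurements
    (e.g. [test, retest]); assumes complete data, n >= 2 and k >= 2.
    """
    n = len(scores)
    k = len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    occasion_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for subjects, occasions, and residual error.
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_occasions = n * sum((m - grand_mean) ** 2 for m in occasion_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_subjects - ss_occasions

    ms_subjects = ss_subjects / (n - 1)
    ms_occasions = ss_occasions / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (
        ms_subjects
        + (k - 1) * ms_error
        + k * (ms_occasions - ms_error) / n
    )
```

Because this form measures absolute agreement, a systematic shift between test and retest lowers the ICC even when the ranking of patients is unchanged. The between-subject mean square sits in both numerator and denominator, so a homogeneous group (small differences between patients) tends to yield a lower ICC for the same amount of measurement error.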
The construct validity of the KOOS questionnaire was determined by comparing the KOOS subscales with the subscales of the SF-36 and the VAS for pain. Correlations between subscales that measure the same construct were compared. In our study we found the highest correlations between the KOOS subscales and the SF-36 subscales that are intended to measure the same constructs. Within the TKA patient group we found some higher correlation coefficients than in the study of Roos et al. (ADL vs PF r = 0.83 vs 0.48 and Pain vs PF r = 0.66 vs 0.19) [8]. The correlations we found within the severe OA patient group (ranging from r = 0.12 to 0.57) were lower than those found by Xie et al., who reported correlations between r = 0.37 and 0.65 for the English version and between r = 0.24 and 0.64 for the Chinese version of the KOOS [24]. Kessler et al. compared the subscales of the KOOS with the SF-12 for the same population and found a low correlation between the subscale Symptoms and the SF-12 (r = 0.05); the other subscales showed correlations of 0.60 or higher [23
When only correlation coefficients are reported, it is not clear whether the construct validity of a questionnaire is sufficient. Therefore Terwee et al. developed quality criteria for the design, methods and outcomes of studies that compare the measurement properties of health status questionnaires [22]. These authors recommended assessing construct validity by testing predefined hypotheses (e.g., about expected correlations between measures or expected differences in scores between 'known' groups). Without specific hypotheses there is a risk of bias, because in retrospect it is tempting to generate alternative explanations for low correlations instead of concluding that the questionnaire may not be valid. Terwee et al. give a positive rating for construct validity if hypotheses are specified in advance and at least 75% of the results are in correspondence with these hypotheses [22]. Our choice that convergent correlations should have a correlation coefficient of ≥ 0.60 and divergent correlations of ≤ 0.30 is arbitrary. However, there is no consensus in the literature on how to deal with this issue. Of our predefined hypotheses, 60% or more could be confirmed in both the mild and moderate OA groups and in patients after a TKA (moderate construct validity). Less than 45% of our hypotheses could be confirmed for patients with severe OA and after a revision TKA (lower construct validity).
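The hypothesis-testing approach described above can be sketched as follows. The cut-offs of 0.60 (convergent) and 0.30 (divergent) and the 75% criterion are taken from the text; the function names are hypothetical, and comparing the absolute value of r (so that a reversed scale such as a pain VAS is treated symmetrically) is an assumption made here for illustration only.

```python
CONVERGENT_MIN = 0.60   # convergent hypothesis confirmed if |r| >= 0.60
DIVERGENT_MAX = 0.30    # divergent hypothesis confirmed if |r| <= 0.30
POSITIVE_RATING = 0.75  # share of confirmed hypotheses for a positive rating


def hypothesis_confirmed(kind, r):
    """Return True if an observed correlation r supports the hypothesis."""
    magnitude = abs(r)  # assumption: the sign of r is ignored
    if kind == "convergent":
        return magnitude >= CONVERGENT_MIN
    if kind == "divergent":
        return magnitude <= DIVERGENT_MAX
    raise ValueError(f"unknown hypothesis kind: {kind!r}")


def share_confirmed(hypotheses):
    """hypotheses: list of (kind, observed_r) pairs, specified in advance."""
    outcomes = [hypothesis_confirmed(kind, r) for kind, r in hypotheses]
    return sum(outcomes) / len(outcomes)


def positive_rating(hypotheses):
    """True if at least 75% of the predefined hypotheses are confirmed."""
    return share_confirmed(hypotheses) >= POSITIVE_RATING
```

With five invented hypotheses of which three are confirmed, `share_confirmed` returns 0.6, which would fall short of the 75% criterion.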
The formulation of the hypotheses was based on the assumption that there is a clear distinction between the subscales of the KOOS questionnaire. However, with factor analysis we found that all items of the Dutch version of the KOOS questionnaire seem to load on one factor. This may explain the overlap between the correlations of the different constructs of the KOOS questionnaire with the SF-36. This is most obvious for the subscales Pain and ADL of the KOOS in relation to the subscales BP and PF of the SF-36. Previous studies showed that the WOMAC subscales Pain and Physical function loaded on the same factor [25]. Apparently it is difficult for patients to distinguish between questions about pain and questions about physical functioning in ADL. In our opinion this can be ascribed to the formulation of the questions; the term difficulty (translated into Dutch: 'moeite') may not be clear to some patients. The meaning of this term should be clarified or reformulated. This was also suggested by Stratford et al. and Terwee et al. [29
Because it is known that clinimetric properties vary between study populations [14], it is recommended to validate a questionnaire in the target population. This study showed that the clinimetric properties of the Dutch version of the KOOS questionnaire differed between the five patient groups, which confirms this recommendation. Additionally, in future validation studies it may be of interest to evaluate the validity of the Dutch KOOS questionnaire by comparing its subscales with the Dutch Oxford 12-item knee questionnaire. This latter questionnaire was considered to be valid and reliable in patients with OA of the knee [31]; however, it had not yet been validated when we started the present study.
We observed ceiling effects only in the mild OA patient group for the subscales Pain, Symptoms and ADL of the KOOS questionnaire. It is plausible that these patients have few knee complaints and no or only minor clinical signs of OA, which can explain the presence of ceiling effects in this group of patients. Floor effects were only found in the subscale Sport/recreation in the patients with severe OA and in patients after revision of the TKA. Roos et al. stated that questions about sport and recreation are also relevant for older patients [8]. However, this does not seem to apply to patients after revision of the TKA. Because of the severity of the disease and/or higher age, it is plausible that these patients do not participate in sport and recreational activities. Dividing the revision population into those younger than 65 years and those older than 65 years resulted in floor effects of over 50% in the older patients. Questions about sport may be more relevant to younger patients than to older patients. Because the KOOS questionnaire was originally developed for younger patients, this finding is not surprising.
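The floor and ceiling analysis can be illustrated with a short sketch. It assumes the usual KOOS scoring range of 0 (extreme problems) to 100 (no problems) and a 15% cut-off for calling an effect present; that cut-off is an assumption for illustration, since the threshold is not restated in this section. The function name and the scores in the test are hypothetical.

```python
def floor_ceiling(scores, lowest=0.0, highest=100.0, threshold=0.15):
    """Share of patients at the worst and best possible subscale score.

    scores: subscale scores of one patient group (non-empty list).
    threshold: assumed cut-off above which an effect is called present.
    """
    n = len(scores)
    floor_share = sum(1 for s in scores if s == lowest) / n
    ceiling_share = sum(1 for s in scores if s == highest) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }
```

Applying the function separately to age strata, for example patients younger and older than 65 years, mirrors the subgroup comparison described above for the subscale Sport/recreation.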
Our study is not without limitations. First, the selection of patients in the present study only allows statements on the reliability and validity of the KOOS questionnaire in patients with different stages of OA and its treatment. The questionnaire was not studied in patients after a meniscectomy or an ACL reconstruction. The results of the present study therefore cannot be generalized to patients with an acute knee trauma.
Second, a measurement tool can also be used to monitor the efficacy of an intervention or the disease process of the patient. For this goal the tool needs to be sensitive enough to detect clinically relevant changes over a certain period of time (responsiveness). ICCs are strongly influenced by the heterogeneity of the study population.
The interpretation of the SEM, i.e. whether it should be regarded as a large or a small measurement error, depends on which changes are minimally important on the KOOS subscales. The smallest detectable change (defined as 1.96 × √2 × SEM) has to be smaller than the minimal important change [20]. Future studies should establish which changes in scores on the KOOS subscales constitute a minimal important change. In addition, the responsiveness of the KOOS questionnaire needs to be evaluated in a future study.
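The relation between the SEM, the smallest detectable change and the minimal important change can be written out as a small sketch. The formula 1.96 × √2 × SEM is the one given above; the function names and the numerical values in the usage note are hypothetical, because the minimal important change for the KOOS subscales has yet to be established.

```python
import math


def smallest_detectable_change(sem):
    """Smallest detectable change for an individual: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2) * sem


def can_detect_important_change(sem, minimal_important_change):
    """True if the smallest detectable change is below the minimal
    important change, i.e. an important change exceeds measurement error."""
    return smallest_detectable_change(sem) < minimal_important_change
```

For example, an SEM of 5 points would give a smallest detectable change of about 13.9 points, so under that hypothetical SEM a subscale could only distinguish minimally important changes from measurement error if they were larger than that.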