Demographic data and clinical characteristics of the participants are shown in table 1. For the analyses of the intra‐rater reliability, two patients were excluded because the disease was not stable between the two time points of assessment. Mean time for assessments with the ICF Core Set for RA was 34.2 (SD 9.1, range 20–75) minutes.
Missing values exceeding 5%, which resulted from the response options “not specified” and “not applicable,” were present in 9/95 categories (9%) within raters and in 23/95 categories (24%) between raters.
Tables 2–5 list the ICF categories in the different components Body Functions, Body Structures, Activities and Participation, and Environmental Factors and present intra‐rater and inter‐rater reliability with percentages for complete agreement and kappa statistics.
Table 2. Body functions: categories from ICF Core Set for RA with intra‐rater and inter‐rater reliability
Table 3. Body structures: categories from ICF Core Set for RA with intra‐rater and inter‐rater reliability
Table 4. Activities and participation: categories from ICF Core Set for RA with intra‐rater and inter‐rater reliability
Table 5. Environmental factors: categories from ICF Core Set for RA with intra‐rater and inter‐rater reliability
Mean intra‐rater agreement for all ICF categories was 59%, which increased to 72% after collapsing of qualifiers. Agreement ranged from 29% (e340) to 96% (b510) before collapsing and from 44% (e450) to 96% (b510) after collapsing (tables 2–5).
Mean inter‐rater agreement for all ICF categories was 47%, which increased to 61% after collapsing of qualifiers. Agreement ranged from 0% (e450) to 80% (d560) before collapsing and from 8% (d415) to 88% (d560) after collapsing (tables 2–5).
The mean intra‐rater agreement per component was 61% for Body Functions, 62% for Body Structures, 60% for Activities and Participation, and 52% for Environmental Factors. The mean inter‐rater agreement was 55% for Body Functions, 46% for Body Structures, 51% for Activities and Participation, and 31% for Environmental Factors. Between raters, 52% of the ICF categories showed at least 50% agreement (78% after collapsing); within raters, 77% did (99% after collapsing).
Weighted kappa statistics showed reliability of 0.4 or higher in 82/95 ICF categories (86%) within raters, but only in 41/95 ICF categories (43%) between raters (table 6).
Table 6. Frequency of observer agreement within and between raters for categories in the ICF Core Set for RA
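For readers who wish to reproduce statistics of this kind, the following is a minimal sketch of how complete agreement and a weighted kappa could be computed for one ICF category. It is not the authors' analysis code. The function names are hypothetical, the linear disagreement weights are an assumption (the weighting scheme is not stated here), and missing ratings are assumed to be coded as `None` and dropped pairwise.

```python
import math


def _valid_pairs(r1, r2):
    """Keep only the rating pairs in which neither rating is missing (None)."""
    return [(a, b) for a, b in zip(r1, r2) if a is not None and b is not None]


def percent_agreement(r1, r2):
    """Percentage of rating pairs with identical qualifiers."""
    pairs = _valid_pairs(r1, r2)
    if not pairs:
        return math.nan
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def weighted_kappa(r1, r2, levels, power=1):
    """Weighted kappa for ordinal ratings.

    levels: ordered list of possible qualifiers, e.g. [0, 1, 2, 3, 4].
    power:  1 gives linear weights (assumed here), 2 gives quadratic weights.
    """
    pairs = _valid_pairs(r1, r2)
    n, k = len(pairs), len(levels)
    if n == 0 or k < 2:
        return math.nan
    index = {level: i for i, level in enumerate(levels)}

    # Observed joint proportions of the two sets of ratings.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in pairs:
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Disagreement weight: 0 on the diagonal, 1 for the most distant levels.
        return (abs(i - j) / (k - 1)) ** power

    d_obs = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    d_exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if d_exp == 0:
        return math.nan  # both sets of ratings use one and the same level
    return 1.0 - d_obs / d_exp


# Illustrative (made-up) ratings of one category for five patients.
first = [0, 1, 2, 3, 4]
second = [0, 1, 2, 3, 3]
agreement = percent_agreement(first, second)
kappa = weighted_kappa(first, second, levels=[0, 1, 2, 3, 4])
```

The same functions apply to intra‐rater data (two time points, one rater) and inter‐rater data (two raters, one time point); only the pairing of the ratings differs.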
Rasch analyses suggested that reducing the number of qualifiers from five to three (and from nine to three for environmental factors) improved both inter‐rater and intra‐rater agreement. On the basis of these results, the response levels 1–2 and the response levels 3–4 of the ICF categories belonging to Body Functions, Body Structures, and Activities and Participation were each collapsed into a single level. In the component Environmental Factors, the response levels from −4 to −1 and the response levels from 1 to 4 were each collapsed into a single level.
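The collapsing rule described above can be sketched as a simple recoding. This is an illustration under stated assumptions rather than the procedure actually used: the function names and the output labels (0, 1, 2 for the functioning components; −1, 0, 1 for Environmental Factors) are hypothetical, and any other value, such as a code for “not specified” or “not applicable,” is assumed to be treated as missing.

```python
def collapse_functioning(qualifier):
    """Recode a five-level qualifier (0-4) into three levels.

    0 stays 0, levels 1-2 become 1, levels 3-4 become 2.
    Any other value is treated as missing (assumption).
    """
    if qualifier == 0:
        return 0
    if qualifier in (1, 2):
        return 1
    if qualifier in (3, 4):
        return 2
    return None


def collapse_environmental(qualifier):
    """Recode a nine-level qualifier (-4 to 4) into three levels.

    -4..-1 become -1, 0 stays 0, 1..4 become 1.
    Any other value is treated as missing (assumption).
    """
    if qualifier is None:
        return None
    if qualifier == 0:
        return 0
    if -4 <= qualifier <= -1:
        return -1
    if 1 <= qualifier <= 4:
        return 1
    return None


# Illustrative (made-up) ratings; 8 and 9 stand in for non-substantive codes.
functioning = [collapse_functioning(q) for q in [0, 1, 2, 3, 4, 8]]
environmental = [collapse_environmental(q) for q in [-4, -1, 0, 1, 4, 9, None]]
```

Because adjacent levels are merged, two ratings that differed only within a merged pair (for example 1 versus 2) count as agreement after recoding, which is one reason agreement percentages can rise after collapsing.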
Several considerations were taken into account. Firstly, the number of response categories that did not follow the intended consecutive order was considered. Secondly, a further collapsing strategy was studied, namely, the collapsing of response categories 3 and 4. However, this strategy did not yield satisfactory results, as most of the ICF categories still presented response categories that did not have a consecutive order (results not shown). Also, owing to the low frequencies in response categories 3 and 4, no further collapsing strategies, such as collapsing response categories 1 and 2, and 2 and 3, were considered. Finally, the same response format was intended for all ICF categories. The proposed collapsing strategy is clinically intuitive for judging the severity of a problem in the corresponding ICF categories. After collapsing the response categories, only four ICF categories in the functioning component and five in the component Environmental Factors did not follow a consecutive order.