The feasibility of a PRO is not absolute, but depends on the context in which it is used. To our knowledge, this is the first feasibility study to compare commonly used disease-specific and generic PROs head-to-head in a hip registry setting. We found that all 4 PROs are feasible for use in a hip registry setting. Our feasibility criteria were response rate, floor and ceiling effects, missing items, and need for manual validation of the scanned PROs. A high response rate is important to ensure generalizability and to minimize selection bias. A response rate of 80% is usually considered to be sufficiently representative of the sample studied. We therefore chose this cut-off a priori for the mailed patient-reported data used in the study. Much higher response rates are, however, achieved for hard data entered into joint registries. For example, the DHR has a coverage of 96% (Overgaard 2012). These types of data collection differ with regard to the person providing the data (patient vs. health professional), ethics (patients are not required by law to provide data), and setting (in-hospital vs. home), and thus different response rates can be achieved.
Low floor and ceiling effects enable measurement of deterioration and improvement. The cut-offs were chosen based on previous findings (Terwee et al. 2007). A high percentage of missing items will make the PROs and sum scores less valid. The need for manual validation of the scanned PROs is an important indirect indication of the patient’s general ability to correctly fill in the PRO, and also provides information about the workload of the manual validation required. The complexity of a PRO or the lack of comprehensiveness can have an influence on response rate, the proportion of items missing, and the proportion of items requiring manual validation. Finally, the discriminative ability of each PRO gives a hypothetical number of subjects needed to discriminate between subgroups, and may contribute to the decision as to which PRO to use in further registry studies when subgroup analyses are of interest.
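As an illustration of how floor and ceiling effects can be checked against a cut-off, the following is a minimal Python sketch. It is not the analysis code used in the study. The function name, the example scores, and the 0 to 48 scale are hypothetical. The default cut-off of 15% follows the criterion of Terwee et al. (2007), and unanswered questionnaires are assumed to be coded as None and excluded from the denominator.

```python
from typing import Dict, Optional, Sequence, Union


def floor_ceiling_effects(
    scores: Sequence[Optional[float]],
    worst: float,
    best: float,
    cutoff: float = 0.15,
) -> Dict[str, Union[int, float, bool]]:
    """Proportion of respondents at the worst (floor) and best (ceiling) score.

    Assumption: missing sum scores are coded as None and are left out of
    the denominator. The cutoff is the maximum acceptable proportion.
    """
    observed = [s for s in scores if s is not None]
    if not observed:
        raise ValueError("no observed scores")
    n = len(observed)
    floor = sum(1 for s in observed if s == worst) / n
    ceiling = sum(1 for s in observed if s == best) / n
    return {
        "n": n,
        "floor": floor,
        "ceiling": ceiling,
        # An effect is flagged only when the proportion exceeds the cutoff.
        "floor_exceeds_cutoff": floor > cutoff,
        "ceiling_exceeds_cutoff": ceiling > cutoff,
    }


# Hypothetical sum scores on a 0 (worst) to 48 (best) scale; None = not returned.
example_scores = [48, 48, 48, 40, 35, 12, 0, None, 47, 48]
result = floor_ceiling_effects(example_scores, worst=0, best=48)
print(result)
```

In this made-up sample, 4 of the 9 observed scores are at the best possible value, so the ceiling proportion is above the cut-off, whereas the floor proportion (1 of 9) is not.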
It is unclear whether follow-up time affects the response rate (Baker et al. 2007, Rothwell et al. 2010). We saw no difference in response rate with follow-up times ranging from 1 to 11 years, which supports the view that follow-up time is unrelated to response rate. To achieve our response rate, we used several strategies, including short questionnaires and up to 2 reminders, as these strategies are known to contribute to a higher response rate (Edwards et al. 2009). Because of the age of our patient population and their varying familiarity with computers and the internet, we used paper-based questionnaires sent by regular mail (Rolfson 2010).
The presence of floor and ceiling effects may influence the reliability, validity, and responsiveness of outcome measures. A worst or best score reported by 15% of the group studied is considered the maximum acceptable (Terwee et al. 2007). However, considering the good outcome of THA, low floor effects and high ceiling effects might be expected; therefore, the criterion of having the best possible score in less than 15% of patients following THA might be too restrictive. In support of this, others have reported a lower ceiling effect for the same PROs when administered preoperatively (Naal et al. 2009). A lower ceiling effect preoperatively than postoperatively is to be expected, and has been shown previously by others (Ostendorf et al. 2004). The lower ceiling effect in SF-12 PCS and SF-12 MCS may be due to computation of these subscales with a norm-based value set, which has also been shown by Linde (2009).
Missing data reduce the quality of data. In a study of 3,156 RA patients, about 7% of patients were missing more than 20% of the items for SF-12 PCS, SF-12 MCS, and EQ-5D (Linde 2009). This high proportion of missing items could in part be explained by a higher percentage of females in that study (75–80%) than in the present study (58%), as we found that females leave more items unanswered than males. We handled missing data in accordance with the directions set out in the specific manual for each PRO.
A higher percentage of PRO items requiring manual validation may indicate a less patient-friendly PRO format, and is more costly due to the manual labor required. In our sample, the EQ-5D VAS required manual validation about 3 times as often as the other questionnaires, suggesting that the EQ-5D VAS is less useful for a mailed survey in a registry population.
Several methodological problems must be considered when interpreting our results. The EQ-5D index had a bimodal distribution, as previously reported by others (Jansson and Granath 2010), probably due to the EQ-5D algorithm. The implication is that the uncertainty of the results is greater than the confidence intervals and p-values suggest, and the full consequences of this may not yet be known. Our results have high external validity, since the distribution of age groups, the sex ratio, diagnoses, and types of prosthesis were similar between our study population and the entire Danish THA population, as well as hip replacement populations seen in other hip registries. Regarding knee arthroplasty, Dunbar (2001) compared properties of the SF-12 and the Oxford knee score in a knee registry setting and found response rates, percentages of fully completed questionnaires, and floor and ceiling effects comparable with our findings for the SF-12 and OHS, suggesting generalizability of our results. We minimized selection bias by randomly selecting patients for inclusion, and we tried to achieve equal age and sex composition in the groups.
We conclude that the HOOS, the OHS, the SF-12, and the EQ-5D are all appropriate PROs for administration in a hip registry. We found minor differences between the disease-specific and the generic PROs regarding ceiling and floor effects as well as discarded items. This information may be useful when deciding which PROs to use in a registry-based setting, and studies with other designs might also benefit from our results.