J Am Acad Orthop Surg. Author manuscript; available in PMC 2011 February 1.
Published in final edited form as:
PMCID: PMC2951475

Challenges With Health-related Quality of Life Assessment in Arthroplasty Patients: Problems and Solutions


Assessment of health-related quality of life (HRQOL) using patient-reported outcomes in arthroplasty has become popular because it provides a unique perspective on successful elective procedures. However, challenges exist in the assessment of HRQOL in clinical practice and in clinical research. Patient compliance with multiple and sometimes lengthy HRQOL assessments administered at multiple follow-up visits is problematic. Many well-validated HRQOL instruments are available, and progress has been made in defining the minimal clinically important difference in hip and knee arthroplasty that denotes the minimal change perceived to be important by patients. Challenges in understanding the literature are attributable to the use of various HRQOL scales, with different scoring ranges and scoring algorithms, different interpretations of highest score, and differences in the presentation of raw versus transformed scores.

Hip and knee arthroplasties are associated with significant improvement in health-related quality of life (HRQOL).1,2 There are two primary reasons for using an HRQOL measure to assess outcomes in arthroplasty. First, the outcome of arthroplasty is not specific to the joint or limb but extends to the overall impact on health.1,2 Second, measuring HRQOL allows the value of a procedure to be compared with that of any medical treatment; cost-effectiveness cannot be compared without these HRQOL data.3

In recent years, the incorporation of HRQOL assessment into arthroplasty research and clinical practice has seen dramatic growth. The challenge is to continue widespread HRQOL assessment in an accurate, efficient, and economically reasonable fashion. This has created unique challenges, such as balancing the utility of HRQOL data with the additional burdens that such data collection represents. In a busy clinical practice, it is increasingly difficult to add yet another outcome assessment.

Assessment of pain is of prime interest in many arthroplasty studies. HRQOL instruments used in arthroplasty studies have varying psychometric properties and varying levels of validation. Minimal clinically important differences (MCIDs) have been recently defined for some instruments. These three characteristics may aid in choosing an HRQOL assessment and in calculating power for a prospective study.

Balancing HRQOL Assessment Needs With Burden

Lessons learned through empirical experience have indicated that a successful HRQOL assessment process must make allowances for practical considerations such as patient burden.3 The critical challenge, therefore, is to balance the researcher’s needs for assessment data with the patient’s willingness to provide information. There is a natural tendency among study investigators to design HRQOL assessments comprehensively, including assessments for information that would be “nice to know” rather than designing a data set that economically and efficiently addresses the specific study hypotheses.

There has been discussion in recent literature as to the relative efficiencies of brief assessments.4 Much research has indicated that often “less is more.”5 Although single-item assessments of relevant HRQOL domains cannot capture the detailed information that a longer assessment can, in measuring the same domain, the brief assessments have demonstrated greater variability and sensitivity to change.6 For multi-item assessments to achieve the same degree of sensitivity to change as single-item assessments do, all of the items in the multi-item assessment must, on average, move the same amount in concert, or the difference is lost. One argument in favor of multi-item generic questionnaires is that they are more likely to capture the net description of a patient’s overall health, such as the global impact of arthroplasty on a patient’s health.

Figure 1 presents an example of this phenomenon,7 wherein assessing the HRQOL of lung cancer patients was accomplished by way of a single-item linear analog assessment (ie, Uniscale) compared with a longer multi-item assessment (ie, Lung Cancer Symptom Scale). The results indicated that the single-item assessment “Please rate your overall quality of life over the past two weeks” had greater sensitivity to change over time than did a multi-item assessment measuring the HRQOL of the same patient.

Figure 1
Graphic comparison of a single-item linear analog scale for a health-related quality of life assessment with a multi-item assessment scale for patients with lung cancer. The graph indicates that the single-item assessment had greater sensitivity to change ...

Salant and Dillman8 present a comprehensive approach to assessment design and implementation known as the Total Design Method, which has demonstrated success in a wide variety of applications. Basic ideas include presenting the assessment instrument as a professionally appearing booklet that engages the patient, with a contact number and a photograph of the principal investigator, rather than handing out a clipboard of poorly photocopied sheets of paper. Included in the booklet is a communiqué from the investigator explaining why the data the patient is being asked to provide are important and what advances in patient care might result; this presentation makes the assessment instrument look like a conversation rather than a test.

Short questionnaires are known to improve patient compliance, response rate, and the quality of response.9 Research has indicated that patients will complete assessments that include fewer than 12 questions without much consideration of the burden. Once two dozen questions have been answered, the patient has typically given all that he or she will give without perceiving a burden. Beyond 25 items, the degree to which the patient continues to comply is strongly related to the relevance, ease, logic, and degree of controversy of the items included. Once 50 items have been asked, one can expect up to 5% attrition and missed questions unless care is taken with the design. Beyond 80 questions, fatigue starts setting in even for healthy persons. A single question that interrupts the patient’s flow of assessment compliance could raise the likelihood that he or she will decide the overall assessment is not worth the time.

Psychometric theory suggests that in some situations one must ask the same question repeatedly or ask questions that are opposites to elicit an accurate response. This sort of approach incurs a risk of raising confusion and ire in the patient, who may not understand why he or she is being asked something repeatedly and why his or her intelligence is being insulted. A good test for determining whether an assessment is compliant with the aforementioned recommendations is to have the investigators complete the questionnaire in a simulated situation. They should ask themselves three questions as they complete the assessment: (1) Are there any questions that are unnecessary to the ultimate analysis? Each data point should have a plan in advance for analysis; otherwise, one is collecting data unnecessarily. (2) Are any questions unclear, confusing, or inflammatory? (3) Are any issues missed?

In terms of patient burden, the logistics of completing the assessment should be considered. This could involve assessments that can be read to the patient and/or completed over the telephone without loss of validity. Proxy respondents can provide usable data if the patient no longer can, but the responses must be assessed with special analytical procedures.10

Another issue of burden relates to the frequency with which HRQOL assessment data are collected. Although it would be optimal from an informational standpoint to gather such data daily, doing so is typically not feasible. Deciding on the assessment schedule is primarily a function of the research hypothesis/question at hand. Is the entity of interest how patients changed over time? Are we interested in only their best score over time? Is the critical information contained in the change from baseline to a given point? Can we define a “response” for each person over time and thereby produce an outcome measure that is comparable with tumor response? We examined these questions in a previous publication.3

Two short questionnaires derived from long versions that are relevant to arthroplasty literature are the reduced version of the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) and the Medical Outcomes Study 12-Item Short Form (SF-12), a shorter version of the SF-36. The reduced version of the WOMAC was developed and validated in a cohort of total knee arthroplasty (TKA) and total hip arthroplasty (THA) patients. The WOMAC reduced scale,11 which contains 7 function items reduced from 17 items in the full WOMAC,12 was further externally validated in a cohort of 100 patients with mild/moderate knee osteoarthritis (OA).13 Validation in additional cohorts and generation of population norms are desirable. By reducing patient burden, the reduced scale offers a practical, shorter alternative to the full-length WOMAC. Tubach et al14 developed a different shortened version of the WOMAC with eight function items; this assessment tool was validated in patients with OA, but validation in the arthroplasty population is lacking. The SF-12 is a generic measure of quality of life (QOL) that has been validated in the general population.15–17 It correlates well with the WOMAC and the SF-36,18–21 and it has the advantage of having population norms.

We recommend that clinicians and researchers ask the aforementioned three questions before adding additional assessments in clinical follow-up or research study. Short, psychometrically valid questionnaires that address the main question should be the goal.

Assessing Pain

Because pain is the most common reason for performing TKA and THA, it is not surprising that pain is the most important outcome of interest.22,23 Most HRQOL measures have pain subscales that capture important pain domains, including pain severity on a visual analog or Likert scale (eg, the pain subscales of the WOMAC and the Knee Society score [KSS]), and whether pain is present during certain activities (WOMAC, KSS). It is important to define the source and location of pain, especially in the distinction between hip, knee, and back; this information is not captured in most HRQOL assessments. For the purposes of TKA and THA, left-right and hip-knee scores are the minimum needed for attribution.

In some arthroplasty studies that specifically focus on the quality, location, and impact of pain as an outcome, it may be appropriate to use a pain inventory such as the Brief Pain Inventory or the Short Form McGill Pain Questionnaire (SF-MPQ) in addition to an HRQOL measure.24–26 The advantage of using the SF-MPQ is that sensory (11 items) and affective (4 items) dimensions of pain can be assessed separately; a total score then can be obtained by adding the two results. This may be particularly relevant in view of the recent findings that clinical and/or subclinical depression and anxiety are strong predictors of persistent pain and of suboptimal outcomes in arthroplasty patients.27,28 Use of the SF-MPQ may aid in separating sensory and affective aspects of pain.
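The separate sensory and affective scoring described above can be sketched in a few lines of Python. This is only an illustration: the function name `score_sf_mpq` is hypothetical, and the sketch assumes that each of the 15 descriptors is rated from 0 (none) to 3 (severe) and that the 11 sensory items precede the 4 affective items. Neither detail is stated in this article, so both should be checked against the instrument itself.

```python
SENSORY_ITEMS = 11    # item count stated in the text
AFFECTIVE_ITEMS = 4   # item count stated in the text


def score_sf_mpq(ratings):
    """Return sensory, affective, and total scores for one completed form.

    `ratings` is a list of 15 integers. ASSUMPTIONS (not from the article):
    each rating is 0-3, and the sensory items come first.
    """
    expected = SENSORY_ITEMS + AFFECTIVE_ITEMS
    if len(ratings) != expected:
        raise ValueError(f"expected {expected} ratings, got {len(ratings)}")
    if any(r not in (0, 1, 2, 3) for r in ratings):
        raise ValueError("each rating must be an integer from 0 to 3")

    sensory = sum(ratings[:SENSORY_ITEMS])
    affective = sum(ratings[SENSORY_ITEMS:])
    # The total is simply the sum of the two dimension scores.
    return {"sensory": sensory, "affective": affective,
            "total": sensory + affective}


if __name__ == "__main__":
    # Hypothetical patient: mild on every sensory item, moderate on every affective item.
    print(score_sf_mpq([1] * 11 + [2] * 4))
```

Keeping the two dimension scores alongside the total makes it possible to report sensory and affective pain separately, which is the stated advantage of the instrument.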

When pain is the primary outcome in a study, we recommend using a validated pain questionnaire, such as the Brief Pain Inventory or the SF-MPQ, rather than a composite HRQOL questionnaire. In cases in which HRQOL is the primary outcome and pain a secondary outcome, we recommend using an HRQOL questionnaire that has a validated pain subscale.

Validation Data for Commonly Used Assessments

A summary of commonly used, disease-specific, and generic instruments used for HRQOL assessment of TKA and THA and their strengths and psychometric properties is provided in Table 1. A recent review found that the KSS and Harris hip score (HHS), followed by the WOMAC and SF-36, were the most common outcome instruments used in clinical trials of knee replacement.62 The authors also noted substantial variation in the types of outcome assessments used.

Table 1
Commonly Used Health-related Quality of Life Instruments Used to Assess Arthroplasty Patients

The KSS and HHS were both developed by expert physician consensus and are condition-specific. The HHS provided the only outcome measure for patients with hip arthroplasty in the 1970s through the 1990s, before the advent and common use of the WOMAC and the SF-36. Because of their clinical relevance and the fact that they are based on a sound clinical framework, the KSS and HHS are still commonly used. An update of the KSS instrument is in progress.

In response to a need for more validated instruments in arthroplasty, a brief, seven-item, arthroplasty-specific HRQOL assessment unified scale, the American Academy of Orthopaedic Surgeons Lower Limb Core Instrument, was developed.56 It specifically addresses the overlap between the SF-36 and WOMAC, complements the SF-36, and has excellent psychometric properties (Table 1).

The SF-36 is ubiquitous in health care research and is perhaps the most widely used generic HRQOL assessment, primarily because of the availability of population-based normative and comparative data. Credible alternatives to the SF-36 are available.24

Many studies of patients with OA and other medical conditions have used specific subscales in addition to summaries or total scores. Examples include use of the role physical, physical function, and pain subscales of the SF-36 and of the function subscale of the WOMAC in studies of patients with OA or arthroplasty.63,64 The use of some but not all subscales, if determined a priori, may be appropriate, although this may lead to multiple comparisons and require statistical adjustment such as the Bonferroni correction. Additionally, the amount of change on each subscale that is considered clinically important may be different. In some cases, it is meaningful to describe individual components of an instrument. For example, in addition to presenting HHS scores during follow-up of 108 patients after primary THA with a cementless porous-coated anatomic hip, Kim and Kim65 presented data regarding dependence in walking, limp, and range of motion, without performing additional unnecessary statistical comparisons for these scale components. Thus, additional important information should be presented, when appropriate.
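As a brief illustration of the Bonferroni correction mentioned above, the following Python sketch divides the overall significance level by the number of subscale comparisons and tests each P value against that stricter threshold. The function name and the P values are hypothetical and are not drawn from any study cited here.

```python
def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag which comparisons remain significant after Bonferroni adjustment.

    `p_values` maps a subscale name to its unadjusted P value. Each P value
    is compared with alpha divided by the number of comparisons.
    """
    if not p_values:
        raise ValueError("at least one comparison is required")
    threshold = alpha / len(p_values)
    return {name: p <= threshold for name, p in p_values.items()}


if __name__ == "__main__":
    # Hypothetical unadjusted P values for three prespecified SF-36 subscales.
    example = {
        "role physical": 0.030,
        "physical function": 0.010,
        "bodily pain": 0.004,
    }
    # With three comparisons the per-test threshold is 0.05 / 3 (about 0.0167).
    print(significant_after_bonferroni(example))
```

In this made-up example, the role physical comparison would be significant at the unadjusted 0.05 level but not after adjustment, which is the situation the correction is meant to guard against.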

It is essential for assessment of HRQOL that both a generic and a condition- or limb-specific instrument be used because specific instruments are more sensitive to change, and generic instruments capture qualifying information about a patient’s general health and allow comparison with patients with other diseases.49,66 Based on psychometric properties and ease of completion, we recommend using the SF-36 (or the SF-12 for brevity) as the generic and the Knee Injury and Osteoarthritis Outcome Score, the WOMAC, or American Academy of Orthopaedic Surgeons Lower Limb Core Instrument as the specific HRQOL instrument. The Oxford Hip44 or Oxford Knee35 scales are alternatives to consider. When comparison with previous studies is most important, the KSS or HHS may be preferred because the two are the most commonly used instruments in arthroplasty studies. The linkage between research hypothesis and obtainable data should be the primary consideration in the choice of outcome measure.

Determining a Clinically Significant Effect

The concept of MCIDs is a key issue in assessing patient-reported outcomes (PROs) because of its relevance to clinical practice.67 With an elective procedure such as arthroplasty, this concept is particularly important. Clinical trials and cohort studies of HRQOL provide a change in population means (ie, pre- and postintervention scores) or compare the changes between different interventions (ie, answer the question, “How small a change in the outcome could this study detect?”). These population-level differences are commonly extrapolated to an individual level. However, patients are not interested in knowing population-level differences. Rather, they wish to know the likelihood that they will experience a meaningful improvement for the risk they take with an intervention (ie, “Is this change meaningful to me?”). This concept illustrates the difference between clinical and statistical significance (ie, an intervention may be both clinically and statistically significant or have only clinical or only statistical significance). MCID estimates are calculated using probabilistic arguments drawing on results from statistical theory or using so-called anchor-based measures, usually patient or physician global scales. Anchor-based methods pose a global question to the patient (or physician), asking about the overall improvement in pain (or function) experienced between two visits.

The next step is calculation of the amount of change on a pain scale (eg, visual analog scale, WOMAC pain scale) that corresponds to the minimal change on the global scale (usually the “somewhat better” response). Thus, the amount of change on a global scale serves as an anchor for calculation of MCID estimates. The “anchor” in defining the MCID is either an objective outcome or a patient response to a global question that is related to the outcome of interest. For example, Quintana et al68 used a five-point patient-based anchor in defining MCID in patients with TKA by asking the patients about the improvement in their knee 6 months after the intervention, with the possible responses “a great deal better,” “somewhat better,” “equal,” “somewhat worse,” and “a great deal worse.” Changes corresponding to “somewhat better” were used to establish the MCID for improvement.
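The anchor-based calculation described above can be sketched as follows. The sketch assumes that the MCID is summarized as the mean change score among patients who chose the “somewhat better” anchor response; the text says only that changes corresponding to that response were used, so the choice of the mean is an assumption. The function name and the data are hypothetical.

```python
from statistics import mean


def anchor_based_mcid(records, anchor_level="somewhat better"):
    """Estimate an MCID from (anchor_response, change_score) pairs.

    ASSUMPTION: the estimate is the mean change among patients whose global
    rating equals `anchor_level`. Other summaries are possible.
    """
    changes = [change for anchor, change in records if anchor == anchor_level]
    if not changes:
        raise ValueError(f"no patients answered {anchor_level!r}")
    return mean(changes)


if __name__ == "__main__":
    # Hypothetical improvement on a 0-to-100 pain scale, paired with each
    # patient's answer to the five-point global question.
    records = [
        ("a great deal better", 45.0),
        ("somewhat better", 20.0),
        ("somewhat better", 26.0),
        ("somewhat better", 23.0),
        ("equal", 3.0),
        ("somewhat worse", -8.0),
    ]
    print(anchor_based_mcid(records))
```

Only the three “somewhat better” patients contribute to the estimate; the other responses are ignored, which is why the result depends on the wording and number of levels of the anchor.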

MCID estimates have been described for the WOMAC and SF-36. For patients with primary THA, the MCID was 26 points for WOMAC stiffness and 29 points for WOMAC pain.55 For the same cohort, MCID estimates for the eight SF-36 subscales ranged from 11 points for the SF-36 physical role subscale to 20 points on the SF-36 physical function subscale. For primary TKA, the MCID estimates were 15 points for WOMAC stiffness and 23 points for WOMAC pain.54 For the same cohort, MCID estimates for the eight SF-36 subscales ranged from 12 points for the SF-36 physical function subscale to 17 points for the SF-36 bodily pain subscale.

Another advantage of calculating MCID estimates is that they can be used as a clinical trial outcome. TKA and THA are associated with large gains in HRQOL. However, when HRQOL outcomes are used to compare surgical approaches (eg, minimally invasive versus regular), specific surgeries (eg, patellar resurfacing versus no resurfacing, cruciate-retaining versus cruciate-sacrificing), or medical interventions in arthroplasty patients, the differences may have smaller effect sizes. In such instances, an MCID estimate is, therefore, a key characteristic for design of adequately powered studies.

For example, two studies that compared patellar resurfacing with no resurfacing reported no difference in KSS between the two groups, but neither study defined the MCID or was adequately powered to find some clinically meaningful difference.69,70 If these studies had collected some patient-reported global measure of improvement, then these data could have been used both to calculate MCIDs and for power calculations. Nonetheless, with 44 patients per treatment group, the study by Diduch et al70 had 80% power to detect a difference of 61% of a standard deviation (SD) (a moderately large and almost certainly clinically meaningful effect size); hence, the detectable effect from this design, although not specified, was likely of a reasonable size. If the sample size had been 64 patients per group, there would have been 80% power to detect the generally accepted clinically significant benchmark of 50% of the SD.10 The fact that no statistically significant result was observed suggests that a clinically meaningful difference was unlikely to have been missed (although type II errors do occur).
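The power figures quoted above can be approximated with a short calculation. The sketch below assumes a two-sided, two-sample comparison of means at a significance level of 0.05 with equal group sizes, and it uses the normal approximation rather than the t distribution; these are assumptions, because the text does not state how the figures were derived. Under the normal approximation the results come out slightly smaller than the quoted values (roughly 0.60 SD rather than 0.61 SD for 44 patients per group, and 63 rather than 64 patients per group for 0.50 SD). The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def detectable_effect_size(n_per_group, alpha=0.05, power=0.80):
    """Smallest standardized difference (in SD units) detectable with the
    given power, using the normal approximation for two equal groups."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = z.inv_cdf(power)
    return (z_alpha + z_beta) * sqrt(2 / n_per_group)


def n_per_group_for_effect(effect_size, alpha=0.05, power=0.80):
    """Patients needed per group to detect `effect_size` (in SD units),
    again under the normal approximation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


if __name__ == "__main__":
    print(round(detectable_effect_size(44), 2))   # about 0.60 SD
    print(round(detectable_effect_size(64), 2))   # about 0.50 SD
    print(n_per_group_for_effect(0.5))
```

If an MCID and an SD are available for the chosen instrument, dividing the MCID by the SD gives the effect size to pass to `n_per_group_for_effect`, which ties the sample size directly to a clinically meaningful difference.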

One of the important issues regarding MCID calculation is that the results depend on the anchor used (ie, 4-, 5-, or 7-point scale)71 and, possibly, on patient expectations. The expected change correlating with “somewhat improved” may be different for a surgical versus medical intervention. For example, the MCID on the WOMAC in a TKA population was 23 for pain and 20 for function subscales.54 In a similar study of patients with OA undergoing nonsteroidal anti-inflammatory drug therapy, MCID estimates were 20 for WOMAC pain and 9 for WOMAC function.72 This example of different MCID estimates on WOMAC function scales may be attributable to differences in the patient population, baseline pain level, patient expectation, or the anchor used.

MCID estimates may be different in patients with revision versus primary arthroplasty for multiple reasons. Patients undergoing revision arthroplasty may have different pain severity and a higher likelihood of persistent pain and functional limitation postrevision. The largest improvements in HRQOL are seen in primary arthroplasty, and revision may be viewed more as a procedure to maintain the HRQOL, with less capability for symptomatic relief. Some revision surgeries (eg, intervening earlier in the course of osteolysis around an implant associated with prosthetic wear) may be performed not only to address symptoms but to prevent worse problems that may be more complex to manage in the future. In these cases, MCID or even HRQOL scores may not be the most relevant outcomes.

The use of MCID estimates facilitates the interpretation of normative data and baseline status in evaluating the health status of various populations of patients undergoing THA and TKA. There is a critical need to understand THA and TKA in regard to populations with differing aggregate health status. For example, a THA may be performed in a golfer who is having problems getting around the course, or in a patient from an underserved community who is on the verge of requiring a wheelchair. The baseline health status of these two persons may be quite different, and the outcome may be better in the golfer. The change in clinical status, however, may be greater in the borderline wheelchair patient; and the improvement in independence and mobility, as well as the cost savings for the health care system, may make this procedure more cost-effective. The risks of operating on the severely disabled patient may be greater, but so is the possible benefit.

A somewhat related concept is presenting the data regarding the proportion of patients who achieved previously described clinical end points (ie, responder analysis). For example, for the HHS, the definitions of excellent (90–100), good (80–89), fair (70–79), and poor (<70) outcomes have been described. In a study by Kim and Kim,65 the proportion of patients in these categories was reported. In a study by Diduch et al70 that followed 114 TKAs in young, active patients long-term, the authors reported that the mean KSS for function was 89 and that 94% of knees had good or excellent function. While recognizing the caveat that this particular categorization in HHS is somewhat arbitrary, we recommend that authors consider reporting the proportion of patients in the poor or fair category at baseline who shifted to a better or worse category at follow-up. In any event, responder analysis has the advantage of appearing similar to reports on other clinical variables such as treatment response or disease progression.
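A responder analysis of this kind can be sketched in Python using the HHS categories given above (excellent 90–100, good 80–89, fair 70–79, poor below 70). The function names and the patient scores are hypothetical. Because the published bands are stated for whole numbers, the sketch assumes that a fractional score such as 89.5 falls into the lower band.

```python
from collections import Counter

CATEGORY_ORDER = {"poor": 0, "fair": 1, "good": 2, "excellent": 3}


def hhs_category(score):
    """Map a Harris hip score (0-100) to the outcome category in the text.

    ASSUMPTION: fractional scores between bands go to the lower band.
    """
    if not 0 <= score <= 100:
        raise ValueError("HHS must be between 0 and 100")
    if score >= 90:
        return "excellent"
    if score >= 80:
        return "good"
    if score >= 70:
        return "fair"
    return "poor"


def category_proportions(scores):
    """Proportion of patients in each outcome category."""
    counts = Counter(hhs_category(s) for s in scores)
    return {name: counts.get(name, 0) / len(scores) for name in CATEGORY_ORDER}


def proportion_improved(baseline, follow_up):
    """Among patients who were poor or fair at baseline, the fraction who
    moved to a better category at follow-up."""
    pairs = [(hhs_category(b), hhs_category(f))
             for b, f in zip(baseline, follow_up)]
    eligible = [(b, f) for b, f in pairs if b in ("poor", "fair")]
    if not eligible:
        raise ValueError("no patients were poor or fair at baseline")
    improved = sum(CATEGORY_ORDER[f] > CATEGORY_ORDER[b] for b, f in eligible)
    return improved / len(eligible)


if __name__ == "__main__":
    baseline = [45, 62, 75, 85]     # hypothetical preoperative scores
    follow_up = [92, 68, 88, 90]    # hypothetical scores at follow-up
    print(category_proportions(follow_up))
    print(proportion_improved(baseline, follow_up))
```

Reporting the shift between categories, rather than only the follow-up distribution, shows how many patients who started in the poor or fair range actually changed status.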

We recommend that more studies be done to derive MCID estimates for commonly used HRQOL scales in arthroplasty. With knowledge of MCID on these scales, the clinical significance of results for trials of surgical and nonsurgical interventions in arthroplasty patients can be interpreted in addition to simple statistical significance. At present, MCID estimates are known only for the WOMAC and SF-36, so studies comparing the proportion of patients achieving MCID with one treatment versus another may prefer these instruments.

Interpreting Results From Multiple Scales

One of the challenges of HRQOL assessments is that each has its own range and interpretation calibration. For example, Table 2 presents scores for two HRQOL assessments drawn from an arthroplasty study.21 If one looks only at the raw scores, one would conclude that the scores were highly variable across the different assessments. A commonly used approach to solve this problem is to translate each scale onto a continuum of 0 to 100 for ease of interpretation. Once the transformations have been made, however, it is clear that all assessments are reporting similar levels of QOL (roughly 59% to 63% of the theoretic range of the assessment). One may also use T-scores or Z-scores, but nonstatisticians typically find these values difficult to interpret. A T-score is the measurement expressed in standard deviation units from a given mean score in a sample, given that the population standard deviation is unknown. A Z-score is the same as the T-score except that the population standard deviation is known.

Table 2
Comparison of Raw and Transformed Scores on Two Health-related Quality of Life Assessments21

Both the World Health Organization Quality of Life assessment short version (WHOQOL-BREF) and the WOMAC, for example, use this transformation onto a 0-to-100 scale to improve interpretability, although for the WHOQOL-BREF, a score of 100 indicates best QOL, whereas for the WOMAC, a score of 100 indicates worst QOL.12,61

We recommend transformation to 0-to-100-point scales in which 100 consistently indicates the best possible outcome for ease of interpretation and comparability across outcome measures. The transformation should be clearly described in both the abstract and methods sections of the paper. For established scales for which precise normative estimates are available, one might also include summary statistics in tables for the raw scores so that researchers who do not report transformed scores are still able to make cross-study comparisons.
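The recommended transformation can be sketched as a simple linear rescaling with an optional reversal, so that 100 always indicates the best outcome. The function name is hypothetical. The example assumes a WOMAC function raw range of 0 to 68 (17 items each scored 0 to 4); the item scoring is an assumption not stated in this article, and the actual range depends on the version of the instrument used.

```python
def to_percent_scale(raw, scale_min, scale_max, higher_is_better=True):
    """Linearly rescale a raw score onto 0-100, where 100 is always best.

    `scale_min` and `scale_max` are the theoretical limits of the raw scale.
    Set `higher_is_better=False` for scales on which high raw scores indicate
    worse status, so that the result is reversed.
    """
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    if not scale_min <= raw <= scale_max:
        raise ValueError("raw score lies outside the theoretical range")
    percent = 100.0 * (raw - scale_min) / (scale_max - scale_min)
    return percent if higher_is_better else 100.0 - percent


if __name__ == "__main__":
    # ASSUMED range: WOMAC function, 17 items scored 0-4, higher = worse.
    print(to_percent_scale(17, 0, 68, higher_is_better=False))
    # Hypothetical scale that runs from 20 to 100, higher = better.
    print(to_percent_scale(60, 20, 100))
```

Passing the theoretical limits explicitly, rather than the observed minimum and maximum in a sample, keeps transformed scores comparable across studies.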


Assessing outcomes following total joint arthroplasty can be described as asking the question, “What constitutes a good result?” For example, is a good outcome an absolute final measurement of HRQOL or a benefit that is described by a change in clinical status that correlates well with the patient’s satisfaction? Although clearly more research is needed in this area to understand the hierarchy of outcomes from a patient perspective, we speculate that it is a combination of patients’ overall assessment of change, the present state (ie, pain, function, range of motion in the index joint), adverse effects/complications, and the economic and social burden associated with the surgery and rehabilitation. In essence, this is a reflection of the evolution unfolding in modern medicine whereby the patient is viewed as more than merely the sum of disease indicators.73

HRQOL assessment in arthroplasty clinical research is in many ways advanced relative to the incorporation of PROs in other medical disciplines. Usable assessment tools already exist specifically for this patient population (eg, WOMAC, HHS, KSS), and generic assessments have been applied successfully in published studies (eg, SF-36, SF-12). MCID estimates have already been derived specifically for arthroplasty populations and are beginning to be applied in the interpretation of clinical research studies. The potential impact of arthroplasty on patient HRQOL is profound; the typical effect sizes one would expect for successful treatments are large, well beyond the MCID, especially in terms of pain and physical function. As a result, arthroplasty studies have a record of demonstrating large effect sizes in HRQOL changes. There is room for improvement in the development of a standard approach to HRQOL assessment, analysis, and interpretation.

The future for HRQOL assessment related to arthroplasty may lie in translating the successful experiences of clinical research into clinical practice. Ultimately, the test of the feasibility of HRQOL assessment will be whether such information can be collected routinely in clinical practice and incorporated to improve clinical care. Use of outcomes assessment in physician offices is becoming feasible with the advent of computerized touch-screen technology. This is the key to data collection on a large scale, which is the goal of total joint registries.


Grant support was received from the NIH CTSA Award 1 KL2 RR024151-01 (Mayo Clinic Center for Clinical and Translational Research), North Central Cancer Treatment Group (CA25224-27), and Cancer Center grant CA 15083-32.


Dr. Singh or an immediate family member has received research or institutional support from the Mayo Clinic Center for Clinical and Translational Research, the North Central Cancer Treatment Group, and the Cancer Center. Dr. Sloan or an immediate family member has received research or institutional support from the Mayo Clinic Center for Clinical and Translational Research, the North Central Cancer Treatment Group, and the Cancer Center. Dr. Johanson or an immediate family member has received royalties from Exactech, serves as a paid consultant to or is an employee of Stelkast, and has received research or institutional support from DePuy, Exactech, IsoTis Orthobiologics, Zimmer, the Mayo Clinic Center for Clinical and Translational Research, the North Central Cancer Treatment Group, and the Cancer Center.


Evidence-based Medicine: Levels of evidence are described in the table of contents. In this article, references 9, 64, and 69 are level I studies. References 1, 2, 5, 11–22, 24–26, 28, 30, 31, 34–38, 42–63, 66–68, 70, and 72 are level II studies. References 23, 27, 39, and 65 are level III studies. References 3, 4, 6, 7, 10, 29, 32, 71, and 73 are level V expert opinion.

Citation numbers printed in bold type indicate references published within the past 5 years.

References

1. Ethgen O, Bruyère O, Richy F, Dardennes C, Reginster JY. Health-related quality of life in total hip and total knee arthroplasty: A qualitative and systematic review of the literature. J Bone Joint Surg Am. 2004;86:963–974. [PubMed]
2. Kane RL, Saleh KJ, Wilt TJ, Bershadsky B. The functional outcomes of total knee arthroplasty. J Bone Joint Surg Am. 2005;87:1719–1724. [PubMed]
3. Sloan J. Applying QOL assessments: Solutions for oncology clinical practice and research. Part 1. Curr Probl Cancer. 2005;29:271–351.
4. Sloan JA, Cella D, Frost M, et al. Assessing clinical significance in measuring oncology patient quality of life: Introduction to the symposium, content overview, and definition of terms. Mayo Clin Proc. 2002;77:367–370. [PubMed]
5. Huschka MM, Mandrekar SJ, Schaefer PL, Jett JR, Sloan JA. A pooled analysis of quality of life measures and adverse events data in north central cancer treatment group lung cancer clinical trials. Cancer. 2007;109:787–795. [PubMed]
6. Sloan JA, Frost MH, Berzon R, et al. The clinical significance of quality of life assessments in oncology: A summary for clinicians. Support Care Cancer. 2006;14:988–998.
7. Sloan JA. Assessing the minimally clinically significant difference: Scientific considerations, challenges and solutions. COPD. 2005;2:57–62.
8. Salant P, Dillman D. How to Conduct Your Own Survey. New York, NY: John Wiley & Sons; 1994.
9. Kalantar JS, Talley NJ. The effects of lottery incentive and length of questionnaire on health survey response rates: A randomized study. J Clin Epidemiol. 1999;52:1117–1122.
10. Sloan JA, Dueck A. Issues for statisticians in conducting analyses and translating results for quality of life end points in clinical trials. J Biopharm Stat. 2004;14:73–96.
11. Whitehouse SL, Lingard EA, Katz JN, Learmonth ID. Development and testing of a reduced WOMAC function scale. J Bone Joint Surg Br. 2003;85:706–711.
12. Bellamy N, Buchanan WW, Goldsmith CH, Campbell J, Stitt LW. Validation study of WOMAC: A health status instrument for measuring clinically important patient relevant outcomes to antirheumatic drug therapy in patients with osteoarthritis of the hip or knee. J Rheumatol. 1988;15:1833–1840.
13. Yang KG, Raijmakers NJ, Verbout AJ, Dhert WJ, Saris DB. Validation of the short-form WOMAC function scale for the evaluation of osteoarthritis of the knee. J Bone Joint Surg Br. 2007;89:50–56.
14. Tubach F, Baron G, Falissard B, et al. Using patients’ and rheumatologists’ opinions to specify a short form of the WOMAC function subscale. Ann Rheum Dis. 2005;64:75–79.
15. Ware JE, Jr, Sherbourne CD. The MOS 36-item short-form health survey (SF-36): I. Conceptual framework and item selection. Med Care. 1992;30:473–483.
16. Jenkinson C, Jenkinson D, Shepperd S, Layte R, Petersen S. Evaluation of treatment for congestive heart failure in patients aged 60 years and older using generic measures of health status (SF-36 and COOP charts). Age Ageing. 1997;26:7–13.
17. Jenkinson C, Layte R, Lawrence K. Development and testing of the Medical Outcomes Study 36-Item Short Form Health Survey summary scale scores in the United Kingdom: Results from a large-scale survey and a clinical trial. Med Care. 1997;35:410–416.
18. Blanchard CM, Côté I, Feeny D. Comparing short form and RAND physical and mental health summary scores: Results from total hip arthroplasty and high-risk primary-care patients. Int J Technol Assess Health Care. 2004;20:230–235.
19. Dunbar MJ, Robertsson O, Ryd L, Lidgren L. Translation and validation of the Oxford-12 item knee score for use in Sweden. Acta Orthop Scand. 2000;71:268–274.
20. Dunbar MJ, Robertsson O, Ryd L, Lidgren L. Appropriate questionnaires for knee arthroplasty: Results of a survey of 3600 patients from The Swedish Knee Arthroplasty Registry. J Bone Joint Surg Br. 2001;83:339–344.
21. Ostendorf M, van Stel HF, Buskens E, et al. Patient-reported outcome in total hip replacement: A comparison of five instruments of health status. J Bone Joint Surg Br. 2004;86:801–808.
22. Bergh I, Jakobsson E, Sjöström B, Steen B. Ways of talking about experiences of pain among older patients following orthopaedic surgery. J Adv Nurs. 2005;52:351–361.
23. Bozic KJ, Rubash HE. The painful total hip replacement. Clin Orthop Relat Res. 2004;420:18–25.
24. Byrne M, Troy A, Bradley LA, et al. Cross-validation of the factor structure of the McGill Pain Questionnaire. Pain. 1982;13:193–201.
25. Melzack R. The short-form McGill Pain Questionnaire. Pain. 1987;30:191–197.
26. Daut RL, Cleeland CS, Flanery RC. Development of the Wisconsin Brief Pain Questionnaire to assess pain in cancer and other diseases. Pain. 1983;17:197–210.
27. Faller H, Kirschner S, König A. Psychological distress predicts functional outcomes at three and twelve months after total knee arthroplasty. Gen Hosp Psychiatry. 2003;25:372–373.
28. Brander VA, Stulberg SD, Adams AD, et al. Predicting total knee replacement pain: A prospective, observational study. Clin Orthop Relat Res. 2003;416:27–36.
29. Insall JN, Dorr LD, Scott RD, Scott WN. Rationale of the Knee Society clinical rating system. Clin Orthop Relat Res. 1989;248:13–14.
30. Lingard EA, Katz JN, Wright RJ, Wright EA, Sledge CB; Kinemax Outcomes Group. Validity and responsiveness of the Knee Society Clinical Rating System in comparison with the SF-36 and WOMAC. J Bone Joint Surg Am. 2001;83:1856–1864.
31. Liow RY, Walker K, Wajid MA, Bedi G, Lennox CM. Functional rating for knee arthroplasty: Comparison of three scoring systems. Orthopedics. 2003;26:143–149.
32. Kreibich DN, Vaz M, Bourne RB, et al. What is the best way of assessing outcome after total knee replacement? Clin Orthop Relat Res. 1996;331:221–225.
33. Alicea J. Scoring systems and their validation for the arthritic knee. In: Insall JN, Churchill SN, editors. Surgery of the Knee. 3rd ed. New York, NY: Livingston; 2001. pp. 1507–1515.
34. Gore DR, Murray MP, Sepic SB, Gardner GM. Correlations between objective measures of function and a clinical knee rating scale following total knee replacement. Orthopedics. 1986;9:1363–1367.
35. Dawson J, Fitzpatrick R, Murray D, Carr A. Questionnaire on the perceptions of patients about total knee replacement. J Bone Joint Surg Br. 1998;80:63–69.
36. Roos EM, Roos HP, Lohmander LS, Ekdahl C, Beynnon BD. Knee Injury and Osteoarthritis Outcome Score (KOOS): Development of a self-administered outcome measure. J Orthop Sports Phys Ther. 1998;28:88–96.
37. Roos EM, Roos HP, Ekdahl C, Lohmander LS. Knee injury and Osteoarthritis Outcome Score (KOOS): Validation of a Swedish version. Scand J Med Sci Sports. 1998;8:439–448.
38. Roos EM, Toksvig-Larsen S. Knee injury and Osteoarthritis Outcome Score (KOOS): Validation and comparison to the WOMAC in total knee replacement. Health Qual Life Outcomes. 2003;1:17.
39. Harris WH. Traumatic arthritis of the hip after dislocation and acetabular fractures: Treatment by mold arthroplasty. An end-result study using a new method of result evaluation. J Bone Joint Surg Am. 1969;51:737–755.
40. Wright JG, Young NL. A comparison of different indices of responsiveness. J Clin Epidemiol. 1997;50:239–246.
41. McGrory BJ, Freiberg AA, Shinar AA, Harris WH. Correlation of measured range of hip motion following total hip arthroplasty and responses to a questionnaire. J Arthroplasty. 1996;11:565–571.
42. Söderman P, Malchau H. Validity and reliability of Swedish WOMAC osteoarthritis index: A self-administered disease-specific questionnaire (WOMAC) versus generic instruments (SF-36 and NHP). Acta Orthop Scand. 2000;71:39–46.
43. Hoeksma HL, Van Den Ende CH, Ronday HK, Heering A, Breedveld FC. Comparison of the responsiveness of the Harris Hip Score with generic measures for hip function in osteoarthritis of the hip. Ann Rheum Dis. 2003;62:935–938.
44. Dawson J, Fitzpatrick R, Carr A, Murray D. Questionnaire on the perceptions of patients about total hip replacement. J Bone Joint Surg Br. 1996;78:185–190.
45. Dawson J, Fitzpatrick R, Frost S, Gundle R, McLardy-Smith P, Murray D. Evidence for the validity of a patient-based instrument for assessment of outcome after revision hip replacement. J Bone Joint Surg Br. 2001;83:1125–1129.
46. de Groot IB, Reijman M, Terwee CB, et al. Validation of the Dutch version of the Hip disability and Osteoarthritis Outcome Score. Osteoarthritis Cartilage. 2007;15:104–109.
47. Klässbo M, Larsson E, Mannevik E. Hip disability and osteoarthritis outcome score: An extension of the Western Ontario and McMaster Universities Osteoarthritis Index. Scand J Rheumatol. 2003;32:46–51.
48. Nilsdotter AK, Lohmander LS, Klässbo M, Roos EM. Hip disability and osteoarthritis outcome score (HOOS): Validity and responsiveness in total hip replacement. BMC Musculoskelet Disord. 2003;4:10.
49. Bombardier C, Melfi CA, Paul J, et al. Comparison of a generic and a disease-specific measure of pain and physical function after knee replacement surgery. Med Care. 1995;33(4 suppl):AS131–AS144.
50. Boardman DL, Dorey F, Thomas BJ, Lieberman JR. The accuracy of assessing total hip arthroplasty outcomes: A prospective correlation study of walking ability and 2 validated measurement devices. J Arthroplasty. 2000;15:200–204.
51. Fortin PR, Clarke AE, Joseph L, et al. Outcomes of total hip and knee replacement: Preoperative functional status predicts outcomes at six months after surgery. Arthritis Rheum. 1999;42:1722–1728.
52. Laupacis A, Bourne R, Rorabeck C, et al. The effect of elective total hip replacement on health-related quality of life. J Bone Joint Surg Am. 1993;75:1619–1626.
53. Heck DA, Robinson RL, Partridge CM, Lubitz RM, Freund DA. Patient outcomes after knee replacement. Clin Orthop Relat Res. 1998;356:93–110.
54. Escobar A, Quintana JM, Bilbao A, Aróstegui I, Lafuente I, Vidaurreta I. Responsiveness and clinically important differences for the WOMAC and SF-36 after total knee replacement. Osteoarthritis Cartilage. 2007;15:273–280.
55. Quintana JM, Escobar A, Bilbao A, Arostegui I, Lafuente I, Vidaurreta I. Responsiveness and clinically important differences for the WOMAC and SF-36 after hip joint replacement. Osteoarthritis Cartilage. 2005;13:1076–1083.
56. Johanson NA, Liang MH, Daltroy L, Rudicel S, Richmond J. American Academy of Orthopaedic Surgeons lower limb outcomes assessment instruments: Reliability, validity, and sensitivity to change. J Bone Joint Surg Am. 2004;86:902–909.
57. McHorney CA, Ware JE, Jr, Raczek AE. The MOS 36-Item Short-Form Health Survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care. 1993;31:247–263.
58. McHorney CA, Ware JE, Jr, Lu JF, Sherbourne CD. The MOS 36-item Short-Form Health Survey (SF-36): III. Tests of data quality, scaling assumptions, and reliability across diverse patient groups. Med Care. 1994;32:40–66.
59. Nilsdotter AK, Roos EM, Westerlund JP, Roos HP, Lohmander LS. Comparative responsiveness of measures of pain and function after total hip replacement. Arthritis Rheum. 2001;45:258–262.
60. Ware J, Jr, Kosinski M, Keller SD. A 12-Item Short-Form Health Survey: Construction of scales and preliminary tests of reliability and validity. Med Care. 1996;34:220–233.
61. Ackerman IN, Graves SE, Bennell KL, Osborne RH. Evaluating quality of life in hip and knee replacement: Psychometric properties of the World Health Organization Quality of Life short version instrument. Arthritis Rheum. 2006;55:583–590.
62. Riddle DL, Stratford PW, Bowman DH. Findings of extensive variation in the types of outcome measures used in hip and knee replacement clinical trials: A systematic review. Arthritis Rheum. 2008;59:876–883.
63. Fitzgerald JD, Orav EJ, Lee TH, et al. Patient quality of life during the 12 months following joint replacement surgery. Arthritis Rheum. 2004;51:100–109.
64. Lingard EA, Katz JN, Wright EA, Sledge CB; Kinemax Outcomes Group. Predicting the outcome of total knee arthroplasty. J Bone Joint Surg Am. 2004;86:2179–2186.
65. Kim YH, Kim VE. Uncemented porous-coated anatomic total hip replacement: Results at six years in a consecutive series. J Bone Joint Surg Br. 1993;75:6–13.
66. Hawker G, Melfi C, Paul J, Green R, Bombardier C. Comparison of a generic (SF-36) and a disease specific (WOMAC) (Western Ontario and McMaster Universities Osteoarthritis Index) instrument in the measurement of outcomes after knee replacement surgery. J Rheumatol. 1995;22:1193–1196.
67. Jaeschke R, Singer J, Guyatt GH. Measurement of health status: Ascertaining the minimal clinically important difference. Control Clin Trials. 1989;10:407–415.
68. Quintana JM, Aróstegui I, Azkarate J, et al. Evaluation by explicit criteria of the use of total hip joint replacement. Rheumatology (Oxford). 2000;39:1234–1241.
69. Barrack RL, Bertot AJ, Wolfe MW, Waldman DA, Milicic M, Myers L. Patellar resurfacing in total knee arthroplasty: A prospective, randomized, double-blind study with five to seven years of follow-up. J Bone Joint Surg Am. 2001;83:1376–1381.
70. Diduch DR, Insall JN, Scott WN, Scuderi GR, Font-Rodriguez D. Total knee replacement in young, active patients: Long-term follow-up and functional outcome. J Bone Joint Surg Am. 1997;79:575–582.
71. Revicki D, Hays RD, Cella D, Sloan J. Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. J Clin Epidemiol. 2008;61:102–109.
72. Tubach F, Ravaud P, Baron G, et al. Evaluation of clinically relevant changes in patient reported outcomes in knee and hip osteoarthritis: The minimal clinically important improvement. Ann Rheum Dis. 2005;64:29–33.
73. Tannock IF. Treating the patient, not just the cancer. N Engl J Med. 1987;317:1534–1535.