Experts disagree about the optimal classification of upper limb disorders (ULDs). To explore whether differential response to treatments offers a basis for choosing between case definitions, we analysed previously published research.
We screened 183 randomised controlled trials (RCTs) of treatments for ULDs, identified from the bibliographies of 10 Cochrane reviews, four other systematic reviews, and a search in Medline, Embase, and Google Scholar to June 2010. From these, we selected RCTs which allowed estimates of benefit (expressed as relative risks (RRs)) for >1 case definition to be compared when other variables (treatment, comparison group, follow-up time, outcome measure) were effectively held constant. Comparisons of RRs for paired case definitions were summarised by their ratios, with the RR for the simpler and broader definition as the denominator.
Two RCT reports allowed within-trial comparison of RRs and thirteen others allowed between-trial comparisons. Together these provided 17 ratios of RRs (five for shoulder treatments, 12 for elbow treatments, none for wrist/hand treatments). The median ratio of RRs was 1.0 (range 0.3 to 1.7; interquartile range 0.6 to 1.3).
Although the evidence base is limited, our findings suggest that for musculoskeletal disorders of the shoulder and elbow, clinicians in primary care will often do best to apply simpler and broader case definitions. Researchers should routinely publish secondary analyses for subgroups of patients by different diagnostic features at trial entry, to expand the evidence base on optimal case definitions for patient management.
Soft tissue rheumatic disorders of the upper limb (ULDs) are important causes of morbidity and sickness absence. In Great Britain, for example, they were responsible for an estimated annual loss of 3.64 million working days in 2009/10. However, their optimal classification remains elusive [2-5]. Controversy arises because of the multiplicity of disease labels and diagnostic criteria adopted both in clinical practice and in research, frequent ambiguity in the coverage and boundaries of case definitions, and a general lack of well-accepted reference standards.
Disagreement over diagnostic classification hinders the interpretation and effective pooling of research data and contributes to variations in clinical practice, with the possibility of sub-optimal care. To address the problem, several classificatory schemes have been proposed following multidisciplinary workshops and consultations with informed experts [7-13]. However, among 27 such schemes identified by review in 2003, no two were identical.
In the face of widespread continuing disagreement, we have proposed an approach to optimising case definition based on a conception of “diagnosis” as being a useful way of classifying patients to determine follow-on actions. Useful case definitions, we have argued, are those which enable more effective choice and targeting of treatments, the delivery of more accurate prognostic advice to patients, or better identification of modifiable causes of illness. Case definitions that “add value” will be those which distinguish groups of patients who show importantly different responses to treatment, clinical outcomes, or associations with putative risk factors.
Where a diagnostic distinction can be drawn between groups of patients who differ substantially in their response to specified treatments, this will enable clinicians to deliver therapy more effectively to those who can benefit. If, however, the distinction is not drawn, and patients are all classified to a single broader diagnostic category, inappropriate treatments may be given to some patients who will not benefit. Conversely, if treatment effects are similar for alternative case definitions, and none of the definitions improves the targeting of treatment, then combining them in a single diagnostic category has the merit of simplicity, and may avoid wasteful, unpleasant or even hazardous clinical investigation.
The relative merits of diagnostic “lumping” or “splitting” can be assessed by research. In particular, if the benefits of a treatment are restricted to a specific subset of patients within a broader diagnostic category, a clinical trial which uses the broader case definition will tend to give smaller estimates of effect than one which focuses only on the subset who can benefit. This is because the impact of treatment in those who can benefit will be diluted by the absence of effect in the remainder. Thus, evidence for or against broader as compared with tighter case definitions can be sought by comparing estimates of benefit in trials that apply the same (or a very similar) intervention to groups of patients defined according to alternative diagnostic criteria. Tighter case definition would be supported if it led to greater estimates of benefit in trials than broader case definition. On the other hand, if estimates of treatment effect were no higher for a more specific as compared with a broader case definition, then, at least as a guide to use of that treatment, the broader case definition would be preferable.
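The dilution argument can be illustrated with a small worked calculation. The sketch below is illustrative only: the function name, the input values, and the simplifying assumption that responders and non-responders share the same control-arm probability of a good outcome are ours, not taken from any trial in this review.

```python
def diluted_rr(p_control, rr_responders, frac_responders):
    """RR expected under a broader case definition.

    Assumes (hypothetically) that a fraction `frac_responders` of patients
    benefit with relative risk `rr_responders`, that the remainder show no
    effect (RR = 1), and that both groups share the same control-arm
    probability `p_control` of a good outcome.
    """
    p_treated_responders = p_control * rr_responders
    if not 0 < p_control <= 1 or p_treated_responders > 1:
        raise ValueError("probabilities must lie in (0, 1]")
    # Treated-arm probability in the mixed (broader) population.
    p_treated_broad = (frac_responders * p_treated_responders
                       + (1 - frac_responders) * p_control)
    return p_treated_broad / p_control


# Illustrative values: half of the broader group can benefit, with RR = 2.
rr_broad = diluted_rr(p_control=0.3, rr_responders=2.0, frac_responders=0.5)
ratio_of_rrs = 2.0 / rr_broad  # tighter definition over broader definition
```

With these made-up inputs the broader definition gives an RR of about 1.5 against 2.0 for the tighter one, so the ratio of RRs is about 1.3. A ratio close to 1.0 would instead suggest that the tighter definition adds little for that treatment.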
To explore the relative merits of alternative diagnostic schemes in the treatment of ULDs in community settings (and especially primary care), we undertook a systematic literature review in which we compared the impacts of the same therapeutic interventions on groups of patients identified according to different case definitions.
Initially we screened the Cochrane Library database (http://www.thecochranelibrary.com) for randomised controlled trial (RCT) data on treatments for ULDs likely to be administered by, or to influence the referral practices of primary care practitioners. We found Cochrane reports on four categories of treatment at the shoulder (steroid injections, acupuncture, physiotherapy, surgery vs. not surgery) [15-18], four at the elbow (acupuncture, physiotherapy, surgery vs. not surgery, shockwave therapy) [19-21], and three at wrist/hand (steroid injections, surgery vs. not surgery, other non-surgical interventions) [22-24]. We listed all the primary research reports referenced in these reviews.
Additionally, as the searches underlying the Cochrane reviews were 2-8 years old and Cochrane reviews on certain treatments of interest (steroid injections and physiotherapy at the elbow) were planned but as yet unpublished, we updated all Cochrane searches to 30th June 2010 and identified other non-Cochrane reviews on the two elbow treatments of interest. For these purposes, we searched in Medline and Embase (from the last relevant Cochrane search date to June 2010) and in Google Scholar (first 300 hits in March 2010), combining terms for ULD with those for randomised controlled trials (RCTs). After elimination of duplicate references, these searches yielded 156 potentially relevant papers from Cochrane reviews and 189 other potentially relevant titles (including several that were cited in four non-Cochrane reviews, on steroid injections [25-27] and physiotherapy for epicondylitis, identified by the search).
We next excluded studies which: (1) lacked any ULD case definition; (2) did not allow analysis by anatomical site, or did not allow ULDs to be distinguished from neck problems; (3) did not report an outcome involving change in upper limb pain or function (or give enough information to quantify such an outcome); (4) had fewer than 10 subjects in a treatment arm (these were deemed to be at high potential for publication bias); (5) involved only select patient groups (e.g. stroke patients or victims of external trauma); (6) involved only a comparison between alternative surgical techniques or a choice of alternative oral drug therapies; (7) were not in English.
By these criteria, 160 (25 Cochrane-cited and 135 non-Cochrane cited) trial reports could be excluded by a review of their abstracts, while the remaining 185 RCT reports were retrieved for full inspection. Table 1 lists the interventions considered by this report, the Cochrane and other reviews used as primary sources for the search (dates of currency and yield of papers), and the numbers of papers by source that were retrieved for full inspection.
Retrieved reports were further screened to determine whether they supplied within-study estimates of treatment effect for more than one case definition (i.e. whether analyses were presented for subgroups of cases). Then, from the remainder, we excluded studies that were insufficiently similar in their main features ((i) anatomical site, (ii) intervention(s), (iii) comparison group, (iv) follow-up time, (v) outcome measure) to allow between-study comparison. Failure to match a study with at least one other study according to these criteria meant that informative comparison of effect sizes by case definitions could not be made.
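The matching step can be pictured as grouping trials on the five features listed above and keeping only groups that contain more than one case definition. The following is a minimal sketch under a strong assumption: it uses exact equality on coded features, whereas in the review similarity was judged by consensus. The `Trial` fields and the example records are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Trial(NamedTuple):
    trial_id: str
    site: str
    intervention: str
    comparison: str
    follow_up: str
    outcome: str
    case_definition: str


def comparable_groups(trials):
    """Group trials by matching features; keep groups with >1 case definition."""
    groups = defaultdict(list)
    for t in trials:
        key = (t.site, t.intervention, t.comparison, t.follow_up, t.outcome)
        groups[key].append(t)
    # A group is informative only if it contrasts at least two case definitions.
    return {k: v for k, v in groups.items()
            if len({t.case_definition for t in v}) >= 2}


# Hypothetical records, for illustration only.
trials = [
    Trial("A", "elbow", "steroid injection", "placebo", "4 weeks",
          "improved", "elbow pain"),
    Trial("B", "elbow", "steroid injection", "placebo", "4 weeks",
          "improved", "elbow pain with tenderness"),
    Trial("C", "shoulder", "physiotherapy", "usual care", "6 weeks",
          "cured", "shoulder pain"),
]
matched = comparable_groups(trials)
```

Here trials A and B form one comparable group, while trial C has no match and would be excluded from between-study comparison.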
In analysing responses to treatment we restricted attention to those that could be expressed as the prevalence of a binary outcome (e.g. cured/improved vs. not). We chose to omit continuous measures (e.g. change in mean pain score), since studies may have differed in their initial baseline distribution of values, leading to non-comparable potential for change. Decisions about the adequacy of matching and other exclusions were agreed in consensus by two of us (CHL and KTP).
For the finally included studies, note was made of the case definitions employed; and for all binary outcomes shared in common, the effect sizes (relative risks (RR)) with estimates of precision (95% confidence intervals (95%CI)) were recorded. Where RRs or their 95%CIs were not presented in the published reports, we estimated them from the numbers in each exposure-outcome combination, using the immediate command in STATA version 11 (StataCorp. 2009. Stata Statistical Software: Release 11. College Station, TX: StataCorp LP).
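For readers who wish to reproduce this kind of estimate from published counts, a relative risk and its 95% CI can be sketched as follows. This uses the common large-sample log method and is an assumption on our part; it is not necessarily the same calculation that the Stata command performs, and the function name and example counts are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def relative_risk(events_treated, n_treated, events_control, n_control):
    """Return (RR, lower, upper) using the large-sample log method."""
    a, n1, c, n0 = events_treated, n_treated, events_control, n_control
    if a <= 0 or c <= 0 or a > n1 or c > n0:
        raise ValueError("need 0 < events <= n in both arms")
    rr = (a / n1) / (c / n0)
    # Standard error of log(RR) from the four cell counts.
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    lower = math.exp(math.log(rr) - Z_95 * se_log)
    upper = math.exp(math.log(rr) + Z_95 * se_log)
    return rr, lower, upper


# Hypothetical counts: 30/50 improved on treatment vs. 15/50 on comparison.
rr, lower, upper = relative_risk(30, 50, 15, 50)
```

The interval is symmetric around the RR on the log scale, which is why the ratio of RRs is handled on that scale as well.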
Comparisons of RR between paired case definitions were summarised by their ratios, taking the RR for the more complex and specific case definition (e.g. shoulder symptoms with signs) as the numerator and that for the simpler and broader case definition (e.g. shoulder symptoms) as the denominator. Where two definitions were equally complex, the numerator RR was chosen arbitrarily. 95% CIs for ratios of RRs were computed as the exponent of the confidence intervals of the difference between the logged RRs. Separate summary statistics were compiled by anatomical site and for each treatment-comparison combination.
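The interval for a ratio of RRs can be sketched as below. This is a minimal illustration, assuming that the two RRs are independent estimates (reasonable between trials, less so for overlapping subgroups within one trial) and that each standard error is recovered from a published 95% CI. Function names and example values are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def se_log_rr(lower, upper):
    """Recover the standard error of log(RR) from a 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def ratio_of_rrs(rr_specific, ci_specific, rr_broad, ci_broad):
    """Return (ratio, lower, upper); broader definition is the denominator."""
    diff = math.log(rr_specific) - math.log(rr_broad)
    # Variances add for the difference of two independent log RRs.
    se_diff = math.hypot(se_log_rr(*ci_specific), se_log_rr(*ci_broad))
    return (math.exp(diff),
            math.exp(diff - Z_95 * se_diff),
            math.exp(diff + Z_95 * se_diff))


# Hypothetical inputs: RR 2.0 (1.0 to 4.0) vs. RR 1.0 (0.5 to 2.0).
ratio, lower, upper = ratio_of_rrs(2.0, (1.0, 4.0), 1.0, (0.5, 2.0))
```

In this made-up case the ratio is 2.0 but its interval spans 1.0, showing how wide such intervals can be when the contributing trials are small.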
Finally, to explore the possibility that the ratio of RRs might be biased towards unity where a treatment was ineffective (since RRs for both definitions and their ratio would all tend towards unity), in sensitivity analysis we restricted focus to those comparisons in which at least one of the paired RRs was ≥2 or ≤0.5.
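The restriction used in this sensitivity analysis amounts to a simple filter on the paired RRs. A sketch of that filter follows; the function names are ours and the example pairs are hypothetical values, not results from the trials reviewed.

```python
from statistics import median


def is_informative(rr_specific, rr_broad, upper=2.0, lower=0.5):
    """True if at least one of the paired RRs is >= upper or <= lower."""
    return any(rr >= upper or rr <= lower for rr in (rr_specific, rr_broad))


def sensitivity_median(pairs):
    """Median ratio of RRs over the informative pairs only (None if empty)."""
    ratios = [s / b for s, b in pairs if is_informative(s, b)]
    return median(ratios) if ratios else None


# Hypothetical (rr_specific, rr_broad) pairs, for illustration only.
example_pairs = [(2.4, 2.0), (1.1, 1.2), (1.5, 3.0), (0.9, 1.0)]
restricted = sensitivity_median(example_pairs)
```

Only the first and third pairs pass the filter here, so the restricted median is taken over two ratios rather than four.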
Our search yielded only one study (two reports [30,31]) which enabled a within-trial comparison of case definitions, as the only papers reporting sub-analyses for alternative case definitions for treatments in common. Among the remaining 181 RCT reports retrieved for inspection, meaningful comparison between trials was possible only for studies of reasonable size employing comparable interventions and with similar comparison groups, follow-up times, and measures of outcome. Seventy-six reports were excluded because there was no other study with a comparable intervention and comparison group, eight because there was none with similar follow-up time, and 59 because no other study had a comparable outcome measure. A further 17 RCTs involved fewer than 10 subjects in a treatment arm, and eight lacked any case definition at all; these were likewise excluded.
After all exclusions, 15 RCT reports were retained (including the two providing within-trial comparisons). Five reports related to treatments at the shoulder [30-34] and 10 at the elbow [35-44]. No reports of interventions at the wrist/hand met all of the specified inclusion criteria. Information on the settings of these trials, the sources from which patients were recruited, and the inclusion and exclusion criteria that were applied, is provided in a supplementary web table.
Tables 2 and 3 compare estimates of treatment effect by intervention and case definition at the shoulder and elbow respectively. For the most part it was possible to designate one paired definition as more complex and specific than the other, and in only one comparison was the choice of the denominator RR arbitrary. Relative to the simpler and broader definition, three ratios of RRs were available from within-trial comparisons at the shoulder and two from between-trial comparisons (Table 2). The former ranged from 1.3 to 1.7, the latter from 0.5 to 1.2. At the elbow (Table 3), 12 ratios of RRs were available (all between-trial comparisons), covering various treatments and follow-up intervals. Most ratios (8 of the 12) were <1.0, with a range from 0.3 to 1.7. The median ratio of RRs across all 17 observations at the shoulder and elbow combined was 1.0 (interquartile range (IQR) 0.6 to 1.3). Only two of the 17 ratios were significantly different from 1.0 at the 5% level; in both cases the greater treatment effect was for the simpler and broader of the case definitions compared.
As mentioned above, a lack of detectable difference between case definitions might arise if the treatment in question was ineffective. However, when analysis was restricted to ratios of RRs in which at least one of the RRs from the contributing paired trials was ≥2.0 [30,34-37,43] (none had an RR<0.5), the ratios of RRs were scarcely different (median ratio 1.0, IQR 0.6 to 1.7). Similarly, although the studies in Tables 2 and 3 suggest the possibility of short- rather than long-term benefits for various treatment combinations (steroid injections, physiotherapy and manipulation at the shoulder [30-33] or elbow [35-37]), restricting analysis to outcomes during the first eight weeks of follow-up did not result in higher ratios of RRs (nine ratios overall, median 0.6, IQR 0.6 to 1.2).
None of the more complex case definitions explored in this analysis appeared to be associated with a markedly better response to treatment than the simpler alternative with which it was compared. Thus, case definitions which were more complex, resource-intensive, and restrictive did not seem to give "added value".
Interpretation of these findings must take into account the limitations of our method of investigation. Papers were selected blind to information on the ratios of interest, and we have no reason to suppose that the reports retrieved were unrepresentative of the universe of such studies in the peer-reviewed literature. However, our search may not have identified every RCT of treatment for ULDs with findings for more than a single case definition. Also, despite a detailed search based around and supplementing the Cochrane database, it proved difficult to find trials of sufficient similarity to support inferences about differential effects of treatment according to case definition. Thus, the material on which comparisons could be drawn was relatively sparse, with only 17 ratios of RRs, limiting the scope to draw strong conclusions. Within-trial analyses allowed simpler, more direct interpretation, but only one RCT presented sub-analyses for alternative case definitions. Nor did the available between-trial comparisons permit assessment of a wider contrast of case definitions than “symptoms and signs” vs. “more elaborate symptoms and signs” (none, for example, allowed comparison of definitions that used information from diagnostic tests such as ultrasound, MRI, or nerve conduction velocity, with others that did not). Finally, it should be noted that the studies we could compare may have differed not only in the case definitions applied, but also in the case mix of patients to whom those criteria were applied (although we have no reason to expect that differences in case mix and referral practice would systematically favour outcomes for either less specific or more specific case definitions, and thereby bias our findings).
A further potential limitation is that we focused on RCTs, as these offered the best prospect of obtaining unbiased estimates of treatment effect in the studies compared by case definition. Trial participants often differ from unselected patients in routine clinical practice. However, this would only invalidate our main finding if the entry criteria for compared studies differed systematically such as to consistently bias estimates of the ratios of RRs upwards or downwards, which seems unlikely. Finally, in secondary research such as this, analysis can be based only on the published data that are available and cannot exclude the possibility that findings might be different for other treatments, whose efficacy has not to date been investigated for different case groups using similar outcome measures.
The information that is available from the studies we compared does not indicate important differences in responses to the same treatment according to case definition for disorders at either the shoulder or elbow. However, it remains possible that further research would reveal such differences. To address current gaps in understanding, we recommend that, in future, researchers conducting treatment trials should adopt broader case definitions where they seem reasonable, and then as a routine, carry out and publish secondary analyses for subgroups of patients according to different diagnostic features within the specified criteria for entry to the study. Many journals offer facilities to publish supplementary analyses of this type on their websites, and access to such data would greatly expand the evidence base and provide clearer pointers to optimal case definitions for patient management.
Meanwhile, the RRs in Tables 2 and 3 suggest benefits from specific treatments for certain of the broader case definitions examined. Thus, when deciding on treatment and/or referral for patients in primary care with shoulder pain, there may be value in identifying cases who have limitation of glenohumeral movement in at least one direction [30,32,34], while among patients with elbow pain, it may be worth identifying those who also have tenderness at the elbow [35,38,43]. Beyond this, however, the data on ratios of RRs provide little evidence to support more elaborate diagnostic classification of shoulder and elbow disorders in primary care.
An exception could occur where effective treatments are predicated on the presence of specific clinical features. Thus, for example, one RCT found improved outcomes for shoulder disorders with steroid injection at sites which were determined by more complex combinations of symptoms and physical signs. However, this need for diagnostic refinement is likely to be limited, since the most recent Cochrane reviews have concluded that steroid injections, ultrasound, and acupuncture are of limited or uncertain benefit at the shoulder [15-17]; and that acupuncture is of unproven benefit and shockwave therapy ineffective for lateral elbow pain. On the other hand, recent non-Cochrane reviews do favour steroid injections over placebo as a treatment for lateral elbow pain with tenderness [25-27], further supporting the case for distinguishing patients in whom elbow pain is accompanied by tenderness.
This study was supported by a grant from the UK Health and Safety Executive, with the aim of improving consensus over case definitions for ULDs. A full web-published grant report, of which this abridged summary forms a part, was kindly commented on by the following experts: Dwayne Van Eerd and Dorcas Beaton, Sigurd Mikkelsen, Johan Hviid Andersen, Alexis Descatha, Eira Viikari Juntura, Alex Burdorf and Jolanda Luime, Mats Hagberg, David Rempel, and Barbara Silverstein.
Funding: This study was supported by a grant from the UK Health and Safety Executive.
Conflicts of interest: None to declare.
Ethics approval: No requirement (secondary analysis of data in the public domain).