Experts disagree about the optimal classification of upper limb disorders (ULDs). To explore whether differences in associations with occupational risk factors offer a basis for choosing between case definitions in aetiological research and surveillance, we analysed previously published research.
Eligible reports (those with estimates of relative risk (RR) for >1 case definition relative to identical exposures) were identified from systematic reviews of ULD and occupation and by hand-searching five peer-reviewed journals published between January 1990 and June 2010. We abstracted details by anatomical site of the case and exposure definitions employed and paired estimates of RR for alternative case definitions with identical occupational exposures. Pairs of case definitions were typically nested, a stricter definition being a subset of a simpler version. Differences in RR between paired definitions were expressed as the ratio of RRs, using that for the simpler definition as the denominator.
We found 21 reports, yielding 320 pairs of RRs (82, 75 and 163 respectively at the shoulder, elbow, and distal arm). Ratios of RRs were frequently ≤1 (46%), the median ratio overall and by anatomical site being close to unity. In only 2% of comparisons did ratios reach ≥4.
Complex ULD case definitions (e.g. involving physical signs, more specific symptom patterns, and investigations) yield similar associations with occupational risk factors to those using simpler definitions. Thus, in population-based aetiological research and surveillance, simple case definitions should normally suffice. Data on risk factors can justifiably be pooled in meta-analyses, despite differences in case definition.
Musculoskeletal disorders of the upper limb (ULDs) are important causes of morbidity and sickness absence, resulting, for example, in an estimated loss of 3.75 million working days per year in the UK in 2008/9. However, their optimal classification remains controversial [2-5]. Difficulty arises because of the multiplicity of disorders, disease labels, and diagnostic criteria adopted by therapists and researchers; ambiguity in the coverage and boundaries of case definitions; and the lack of a suitable reference standard.
This want of an agreed diagnostic classification has hindered the pooling and interpretation of research data, and also attempts at standardised occupational surveillance. Additionally, it limits the scope for compensation.
Editorials and commentaries have highlighted the impasse and called on experts to agree consensual case definitions [7,8], and several classificatory schemes have been proposed based upon this approach [9-15]. However, in a systematic review by Van Eerd et al., no two of 27 schemes identified were identical, underscoring the scale of continuing disagreement.
There has been only limited debate about the rationale for preferring one scheme to another. However, a “diagnosis” can be thought of as a means to an end, possible goals being to identify modifiable risk factors as an aid to prevention, and to improve the clinical management of patients. According to this utilitarian logic, good case definitions will be those which ‘add value’ by distinguishing categories of disorder that differ importantly in their associations with potential causes and/or in their prognosis and responses to different treatments. It will be worth distinguishing subgroups (splitting rather than lumping) only where such differences are demonstrable and materially influence follow-on actions.
Optimal case definitions for prevention may not necessarily be identical to those for clinical care. To assess the relative merits of competing case definitions for ULDs in aetiological research, prevention, and surveillance, we compared their utility by analysis of results from previously published research.
We sought peer-reviewed reports in which more than one case definition had been analysed against occupational risk factors that were defined identically.
Several sources were screened to find papers with these characteristics:
We then excluded papers that: (1) were not in English; (2) were published before 1980; (3) did not allow upper limb problems to be analysed separately from neck problems; (4) did not distinguish upper limb problems by anatomical site (e.g. elbow as opposed to arm); (5) did not include at least two explicit case definitions; or (6) did not quantify risk estimates for one or more exposures in common against these alternative case definitions.
Figure 1 (online supplement) details the number of reports screened, retrieved, excluded and analysed in PRISMA format. Altogether, 162 reports were screened for eligibility (86 of these in full). Finally, 21 reports (from 16 studies) were included in data synthesis. The main reasons for exclusion were: irrelevant hits (n=76), failure to analyse by anatomical site (n=29), absence of two case definitions (n=15), and insufficient quantification of risks (n=11).
From each eligible report we abstracted the main study characteristics (authors, date, setting, study design, study populations and numbers); and separately for each anatomical site (shoulder, elbow, forearm, wrist/hand), the occupational exposure definitions and case definitions employed; also, we abstracted or derived paired estimates of relative risk (RR – sometimes expressed as an odds ratio or prevalence ratio) with 95% confidence intervals (95% CI), for alternative case definitions with the same occupational exposures. Data abstraction was piloted and then undertaken by two of us (ECH and CL) and checked in full by KTP.
Various categories of comparison were identifiable, most of which had a natural hierarchy. Distinguishable were definitions based on: 1) symptoms vs. symptoms plus signs (e.g. elbow pain vs. elbow pain with tenderness on palpation); 2) symptoms defined generally vs. symptoms defined specifically (e.g. tingling/numbness in the hand vs. nocturnal tingling/numbness in the median nerve distribution); and 3) symptoms/signs vs. symptoms/signs plus a positive investigation (e.g. Tinel’s sign positive vs. this and delayed median nerve conduction). Additionally, 4) one report allowed a comparison with no implicit hierarchy (one examination finding vs. another).
Differences in association between pairs of case definitions were expressed as a ratio of RRs – based, for the hierarchical groupings, on the stricter definition (C2) as numerator divided by the simpler one (C1) as denominator, and for the sole non-hierarchical pairing, on an arbitrary choice of denominator.
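In symbols, and purely as a notational sketch (the symbol R and the argument E for a given occupational exposure are shorthand introduced here for illustration, not notation taken from the reviewed reports), the measure can be written as:

```latex
R(E) = \frac{\mathrm{RR}_{C2}(E)}{\mathrm{RR}_{C1}(E)}
```

A value of R above 1 indicates a stronger association of the exposure with the stricter definition than with the simpler one, and a value close to 1 indicates that the two definitions are associated similarly with the exposure.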
As hierarchical comparisons tended to be nested (i.e. subjects fulfilling the stricter case definition, C2, were subsets of those fulfilling the simpler definition, C1, rather than statistically independent samples), or the degree of overlap was uncertain from published reports, standard errors and CIs for their ratios could not be derived by standard statistical methods. Instead, the ratios were categorised into bands by magnitude and summarised by their median and interquartile range (IQR). To gauge crudely the potential contribution of chance to any differences in RR, a comparison was also made between the lower 95% confidence limit of the RR for the stricter definition and the central estimate of the RR for the simpler definition. Separate summary statistics were compiled by anatomical site and for each category of comparison.
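The descriptive summary outlined above might be sketched as follows. This is a minimal illustration, not the code used in the analysis: the function names, the input layout, and the exact band edges are assumptions (the text refers only to the thresholds ≤1, ≥2 and ≥4), and the quartile method is one of several reasonable conventions.

```python
from statistics import median, quantiles


def band(ratio):
    """Assign a ratio of RRs to a magnitude band (band edges are assumed)."""
    if ratio <= 1:
        return "<=1"
    if ratio < 2:
        return ">1 to <2"
    if ratio < 4:
        return "2 to <4"
    return ">=4"


def summarise(pairs):
    """Summarise paired RRs.

    pairs: iterable of (rr_simple, rr_strict, lower_cl_strict) tuples, where
    lower_cl_strict is the lower 95% confidence limit of the stricter RR.
    """
    pairs = list(pairs)
    # Ratio of RRs: stricter definition (C2) over simpler definition (C1).
    ratios = [strict / simple for simple, strict, _ in pairs]

    counts = {}
    for r in ratios:
        counts[band(r)] = counts.get(band(r), 0) + 1

    # Quartiles by linear interpolation over the observed range.
    q1, _, q3 = quantiles(ratios, n=4, method="inclusive")

    # Crude chance check: does the lower confidence limit of the stricter
    # RR exceed the central estimate of the simpler RR?
    n_divergent = sum(1 for simple, _, lower in pairs if lower > simple)

    return {
        "bands": counts,
        "median": median(ratios),
        "iqr": (q1, q3),
        "n_divergent": n_divergent,
    }
```

Called with a list of hypothetical tuples such as `[(1.5, 1.5, 0.9), (2.0, 1.0, 0.5)]`, `summarise` returns the band counts, the median and IQR of the ratios, and the number of pairs that pass the crude chance check. The same function could be applied separately to the pairs for each anatomical site or category of comparison.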
A limitation of this method is that the reference groups for the RRs under comparison were not always identical. For example, one RR might be for C1 vs. “not C1” and the other for C2 vs. “not C2”, rather than for C2 vs. “not C1”. Each report was assessed for this potential bias and the size of bias (% by which the ratio changed when analysis was restricted to “not C1” as compared with “not C2”) was estimated for all reports where it was known or calculable from the data supplied. The distribution (median and interquartile range) of estimated biases was then summarised.
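To make the reference-group issue concrete, the sketch below estimates the percentage change in the ratio from cross-tabulated counts. It rests on several assumptions that are not taken from the reports: the odds ratio stands in for the RR, the count layout and function names are hypothetical, and the sign convention (change on moving from a "not C2" to a "not C1" reference group) is one reading of the description above.

```python
def odds_ratio(cases_exp, ref_exp, cases_unexp, ref_unexp):
    """Odds ratio for exposed vs. unexposed, given case and reference counts."""
    return (cases_exp * ref_unexp) / (ref_exp * cases_unexp)


def reference_group_bias(exposed, unexposed):
    """Percentage change in the ratio of ORs when the stricter definition is
    compared against "not C1" rather than "not C2".

    Each argument is a dict of counts with the (hypothetical) keys:
      "c2"      - subjects meeting the stricter definition C2
      "c1_only" - subjects meeting the simpler definition C1 but not C2
      "none"    - subjects meeting neither definition
    """
    # Simpler definition: C1 (which includes all C2 cases) vs. "not C1".
    or_c1 = odds_ratio(
        exposed["c2"] + exposed["c1_only"], exposed["none"],
        unexposed["c2"] + unexposed["c1_only"], unexposed["none"],
    )
    # Stricter definition vs. "not C2": the reference group also contains
    # the C1-only subjects.
    or_c2_vs_not_c2 = odds_ratio(
        exposed["c2"], exposed["c1_only"] + exposed["none"],
        unexposed["c2"], unexposed["c1_only"] + unexposed["none"],
    )
    # Stricter definition vs. "not C1": same reference group as for C1.
    or_c2_vs_not_c1 = odds_ratio(
        exposed["c2"], exposed["none"],
        unexposed["c2"], unexposed["none"],
    )
    ratio_not_c2 = or_c2_vs_not_c2 / or_c1
    ratio_not_c1 = or_c2_vs_not_c1 / or_c1
    return 100 * (ratio_not_c1 / ratio_not_c2 - 1)
```

When no subject meets C1 without also meeting C2, the two reference groups coincide and the estimated change is zero. The change grows as the C1-only group becomes larger and more unevenly distributed between exposed and unexposed subjects.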
The 16 studies (21 reports) [10,18-37] yielded by our search provided a total of 320 paired comparisons of RR. The main characteristics of the reviewed studies are listed in Table 1. Reports came mostly from Northern Europe and the workplace setting; 13 of the 16 studies were cross-sectional in design. Sample sizes ranged from 96 to 6,943, and in six of the studies (11 of the 21 reports), they exceeded 1,000.
Tables 2 to 4 provide, separately for the shoulder, elbow, and distal upper limb, details of the paired case definitions employed in reports; the occupational exposures analysed; the number of comparisons of RRs provided by each report; and the frequency distribution of the ratios of RRs.
Taking, as an example, the findings at the shoulder by Brandt: a comparison was drawn between a simple symptom definition (at least moderate shoulder pain over the past 7 days) and a stricter definition involving these symptoms together with substantial tenderness on palpation in various pre-specified anatomical locations. RRs for each case definition were obtainable for use of the computer mouse and use of the computer keyboard – two pairs of comparisons, the ratio of RRs (stricter vs. simpler definition) being in each case ≤1; in neither did the lower 95% confidence limit for the stricter definition exceed the central estimate of the simpler definition.
On the same basis, some 82 paired comparisons were found at the shoulder, 75 at the elbow, and 163 in the distal arm, with some reports contributing in excess of 40 comparisons. However, ratios of RRs were seldom as high as 4 (2% of all comparisons), and often ≤1 (46%), the median ratio overall and by anatomical site being close to unity (Table 5). In only 5% of comparisons were RRs sufficiently divergent for the lower 95% confidence limit of the stricter case definition to exceed the central estimate of that for the simpler one. Different patterns of hierarchical comparison (whether involving a contrast of symptoms, or symptoms with symptoms and signs, or additional investigation) yielded similar estimates of the ratio, as did comparisons at each anatomical site, although ratios above two were somewhat more common at the shoulder.
In a sensitivity analysis, we excluded data from the influential NUDATA and Finland Health 2000 studies [20-23,36], the linked reports from which contributed all but 116 of the 320 comparisons. Among the remainder, ratios of ≥2 and ≥4 were relatively less frequent, but the median and IQR values were scarcely different.
Finally, Table 6 (which appears online) presents estimates of potential bias arising from use of different reference groups in comparisons of RRs. In nine studies (11 reports) it was either absent or calculable from the published data; in three it was present but not calculable; and in five studies (seven reports) the potential for bias was neither clear nor calculable. Eventually, estimates of potential bias were made for 45 comparisons. Effects were small, the median bias being 0% (IQR 0% to 1.7%, range −4.9% to 14.6%), and in 87% of comparisons it was <5%.
This analysis indicates that hierarchical case definitions of increasing sophistication, involving confirmatory physical signs, more specific symptom patterns, or additional investigations, yield remarkably similar associations with putative occupational risk factors to those obtained using simpler case definitions.
Our analysis has certain limitations. In particular, the search strategy is unlikely to have discovered every occupational report over the two decades of inquiry that involved multiple case definitions. Specifying sensitive electronic search terms for this topic is challenging. To make the task manageable, we focussed on systematic reviews of occupational risk factors at each anatomical site as our main source of references to primary research reports, but to improve the detection rate we also hand-searched five leading peer-reviewed occupational journals. In the event, hand-searching identified only three potentially eligible papers missed by the database screen, lending some validity to our search. More importantly, papers were selected blind to information on the ratios of interest, and we have no reason to suppose that the reports retrieved were unrepresentative of the universe of relevant studies in the peer-reviewed literature.
A second area of difficulty lies in evaluating the role of chance in the ratio measures obtained. Scope to estimate CIs for the ratios was limited by the nested nature of many observations and the indeterminate extent of overlap in others. However, the infrequency of ratios of >2 across all 320 comparisons, and of lower confidence limits of RRs for stricter definitions exceeding the central estimates for simpler definitions, argues against important differences being overlooked by chance. Additionally, findings were insensitive to the exclusion of two large influential studies that contributed more than half the observations, suggesting that possible non-independence of data points from the same study was not a cause of material bias. A further sensitivity analysis suggested that the potential for bias arising from differences in the reference group for paired risk estimates was inconsequential.
A lack of detectable difference between case definitions (ratio close to unity) could arise if the occupational exposures studied were not risk factors for symptoms or disorders at the sites in question, or were defined with substantial measurement error, such as to appear so. However, many of the exposures evaluated were well established or plausible risk factors, and RRs exceeded 1.0 for 81% of the 640 estimates of RR identified during the review. True differences might also be masked by confounding. However, estimates were adjusted in the same statistical models and any superiority in favour of the stricter definition could only be masked by systematic negative confounding relative to the simpler definition. Differences in approach to exposure assessment between studies are unlikely to have biased findings, given the focus on ratios of RRs derived from within-study comparisons in which exposures were defined identically.
Alternatively, and more plausibly, little “added value” is created by case definitions of increasing sophistication relative to simple ones in population-level studies, and if so a good case exists for simplification. In planning field studies of ULDs and occupational risk factors, or employers’ surveys of hazard and risk, resource expended on additional physical examinations or investigations will generally yield marginal or no benefits; whereas costs will be predictably higher, with added problems of inconvenience and non-compliance. Similarly, in health surveillance, a complex case definition will be more difficult to implement uniformly across settings and over time, with no clear offsetting benefits.
Our findings do not identify a preferred set of case definitions for ULDs in the context of prevention and surveillance at the population level. Rather, they suggest that variations make little difference. Thus, the preferred starting point for aetiological investigation will normally be a broad case definition that is simpler to apply and may give more cases and therefore greater statistical power. This does not preclude additional exploratory analyses on subsets of cases defined according to more stringent diagnostic criteria, if researchers wish to seek evidence of differential associations with risk factors. It may also be more efficient to use stricter criteria when cases have already been defined in this way (for example, if cases of CTS are readily identifiable from the records of a neurophysiology department, all of whom meet certain criteria following nerve conduction testing). In most situations, however, simple case definitions will suffice. Similarly, in surveillance, simple choices which are easier to implement will usually be preferable.
Finally, in appraising the research literature, our findings imply that heterogeneity of case mix and variations in approach to case classification have less impact than might be supposed. This gives a justification in systematic reviews and meta-analyses for pooling data on associations with risk factors at a given anatomical site between studies, even though such studies may have differences of case definition.
More generally, the utilitarian framework offers an empirical basis for moving towards a simpler, more rational basis on which to classify ULDs for preventive purposes.
What is already known on this subject: There is widespread disagreement between experts on the optimal classification of upper limb disorders and the best case definitions.
What this study adds: Complex case definitions yield similar associations with occupational risk factors to simpler definitions. This provides encouragement to adopt simpler case definitions in population-based aetiological research and surveillance, and to pool data in meta-analyses despite differences in case definition.
This study was supported by a grant from the UK Health and Safety Executive, with the aim of improving consensus over case definitions for ULDs. A full technical report, of which this abridged summary forms a part, was kindly commented on by the following experts: Dwayne Van Eerd, Dorcas Beaton, Sigurd Mikkelsen, Johan Hviid Andersen, Alexis Descatha, Eira Viikari Juntura, Alex Burdorf, Jolanda Luime, Mats Hagberg, David Rempel, and Barbara Silverstein. Sue Curtis helped prepare this manuscript; Clive Osmond, Hazel Inskip and Georgia Ntani offered comments on its statistical aspects.