Meta-analyses are important evaluations in orthopaedic surgery, not only to create clinical guidelines, but also because their findings are included in public health and health policy decision making. However, with increasing numbers of meta-analyses, discordant and frankly conflicting conclusions have been reported. We searched for conflicting meta-analyses, ie, those arriving at different conclusions despite following the same research question, identified potential reasons for these differences, and assessed the statistical significance and clinical importance of differences. We identified conflicting meta-analyses on graft choice in ACL reconstruction and the use of hyaluronic acid. We found significant differences in individual results only for meta-analyses on hyaluronic acid, but the 95% confidence intervals of the magnitude of differences included values as much as 40% for ACL meta-analyses. However, our findings suggest most conflicts derive from differences in the interpretation of pooled results rather than in the actual results. Thus conclusions and interpretations from meta-analyses should be scrutinized as critically as those from any other type of study and subjected to reassessment if deemed necessary.
Evidence-based medicine, commonly defined as the conscientious, explicit, and judicious use of current best evidence, is gaining popularity in orthopaedic research. Usually evidence derives from clinical studies, whose results are weighted based on the level of evidence, determined by methodologic specifics. The highest level of evidence is attributed to randomized controlled trials performed to appropriate methodologic standards and reported according to the CONSORT statement, and meta-analyses of such trials performed to appropriate methodologic standards and reported according to the QUOROM statement [20, 21]. The findings produced in such studies are incorporated into clinical guidelines, serve for public health decisions, and can lead to changes in health economy and health policy [4, 9]. Good examples of such processes are the decision concerning reimbursement of hyaluronic acid, especially in Europe, and recommendations concerning prostate and breast cancer screening. This reflects the importance of such studies.
The popularity of meta-analyses has grown, especially during the last decade, with 302 entries under the MeSH term “meta-analysis” in PubMed in 1990, 1301 in 2000, 2885 in 2005, and finally 3095 publications in 2008, although these techniques have been available for some time. Recent developments in personal computers and statistical software have made these methods available to practically anyone who has a computer and access to the Internet, and the emergence of evidence-based medicine and evidence-based medicine platforms created further interest in this area of research. However, with an increasing number of meta-analyses, disagreement and frankly conflicting conclusions have occurred among meta-analyses studying the same questions and pooling data from similar backgrounds, thus raising questions regarding the reliability of this current best evidence. Given the importance of meta-analyses and their findings in clinical medicine, public health, health economics, and health policy, such conflict might have important consequences. Consider, again, reimbursement for hyaluronic acid treatment, which has been discouraged because of negative results in some meta-analyses [1, 18] despite the fact that others [3, 35] concluded there were beneficial effects, as did randomized controlled trials for certain preparations [2, 29, 30].
We therefore posed four objectives: (1) to identify conflicting meta-analyses in orthopaedic research; (2) to ascertain whether differences in study quality or study characteristics explain such conflicts; (3) to assess the statistical significance and clinical importance, ie, the size, of the actual differences in results in meta-analyses with conflicting conclusions; and (4) to use our findings to identify among conflicting meta-analyses those that provide the most relevant answers to their question.
To identify conflicting meta-analyses, we performed an online systematic literature review of orthopaedic meta-analyses using variations of the terms “orthopedic,” “orthopedics,” and “orthopedic procedures” and “meta-analysis” in PubMed and the Cochrane Library (Fig. 1). All searches were concluded by February 1, 2008. We defined meta-analyses eligible for inclusion in our study as those performing a systematic review and formal data synthesis of relevant data on a defined study question. Studies of orthopaedic treatments were eligible, whereas studies on medical, radiologic, or anesthesiologic procedures in orthopaedic patients were excluded. In addition to the database search, we interviewed four orthopaedic surgeons involved in evidence-based medicine, epidemiology, or biostatistics to identify further conflicting meta-analyses. We identified 218 eligible published meta-analyses and 61 Cochrane reviews. All meta-analyses were categorized according to topic and study questions. Conflicting meta-analyses were defined as those investigating the same study question but arriving at a different conclusion or treatment recommendation (Fig. 1).
To identify conflict arising from differences in study quality, we (PV, RD) used the Oxman-Guyatt [20, 21] and the QUOROM indices to assess the overall methodologic quality of meta-analyses. The Oxman-Guyatt index is an instrument to measure the methodologic quality of review articles ranging from 1 (extensively flawed) to 7 (minimally flawed) points [20, 21], whereas the QUOROM index derives from the QUOROM statement and is a measure of the quality of reporting of meta-analyses ranging from 0 (worst) to 18 (best) points. In addition to these quantitative methods, we defined five questions to qualitatively assess potential sources of conflict arising from differences in materials and methods in meta-analyses. First, we asked whether the rationale of repeating the meta-analysis was reported including reference to prior meta-analyses. Although this is not a formal parameter of study quality, a thorough systematic review of the literature and inclusion of all findings is the bedrock of a meticulous meta-analysis. Thus detection and discussion of earlier meta-analyses can be considered a surrogate for diligent handling. Second, we asked whether the same studies and/or studies of the same level of evidence were included. Third, we asked whether the same end points were studied. Fourth, we asked whether the same methods were used in the analytical approach. Here we looked specifically at how publication bias was ruled out, how study quality was assessed, how heterogeneity was tested and accounted for, and how data were pooled. Our final question was whether there were other apparent problems.
After assessing potential sources of conflict in conclusions, we asked whether these conclusions were supported by the results of meta-analyses. To assess the significance and the magnitude of differences in presented end points between conflicting meta-analyses, we used Cochran’s chi square test for heterogeneity (the Q test for short) and the I² index. The Q test is a validated instrument to test the significance of differences between study results in meta-analyses. An important shortcoming of this test is that its power is low in small samples [13, 14]. Thus we set the threshold of significance for the Q test at 0.1 instead of 0.05, to adjust for the small size of our sample and to avoid missing differences because of false negative results. The Q test assumes statistical independence; however, because the studied meta-analyses have at least some overlap in their primary studies, this assumption might be violated. Such a violation leads to p values that are too small. In addition to the significance of differences, the I² index gives an estimate of the actual magnitude of between meta-analysis variability relative to the total amount of variability, which also includes the play of chance. The I² index also depends on sample size; thus, we calculated the 95% confidence intervals for the I² index based on the method by Higgins et al. [14, 15] to account for this potential problem. This method is robust to low sample sizes in contrast to the Q test and the I² index. Q tests, I² indices, and confidence intervals were calculated using Intercooled Stata® 10 (StataCorp LP, College Station, TX).
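The quantities described above can be sketched in a few lines of Python. This is an illustration only, not the Stata routine the authors ran: it assumes inverse-variance weights for Q, takes the test-based interval for I² described by Higgins and Thompson as the "method by Higgins et al." (an assumption, since the text does not name the formula), and truncates H at 1. The function names `chi2_sf` and `heterogeneity` are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed-form starting point: df = 1 (a = 1/2) or df = 2 (a = 1).
    if df % 2:
        a, tail = 0.5, math.erfc(math.sqrt(y))
    else:
        a, tail = 1.0, math.exp(-y)
    # Step up with Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        tail += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return min(tail, 1.0)


def heterogeneity(estimates, std_errors, z=1.96):
    """Cochran's Q, its p value, I^2, and a test-based interval for I^2.

    Assumption: the interval follows the ln(H) formula of Higgins and
    Thompson, which needs at least three pooled estimates.
    """
    k = len(estimates)
    if k < 3:
        raise ValueError("the interval formula used here needs at least 3 estimates")
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = k - 1
    h = max(1.0, math.sqrt(q / df))  # H truncated at 1 (a choice made for this sketch)
    if q > k:
        se_ln_h = 0.5 * (math.log(q) - math.log(df)) / (
            math.sqrt(2.0 * q) - math.sqrt(2.0 * k - 3.0)
        )
    else:
        se_ln_h = math.sqrt((1.0 / (2.0 * (k - 2))) * (1.0 - 1.0 / (3.0 * (k - 2) ** 2)))

    def i2_from_h(h_value):
        # I^2 = (H^2 - 1) / H^2, floored at zero.
        return max(0.0, (h_value ** 2 - 1.0) / h_value ** 2)

    return {
        "q": q,
        "df": df,
        "p": chi2_sf(q, df),
        "i2": i2_from_h(h),
        "i2_ci": (i2_from_h(h * math.exp(-z * se_ln_h)), i2_from_h(h * math.exp(z * se_ln_h))),
    }


# Hypothetical pooled estimates and standard errors from five meta-analyses.
result = heterogeneity([-0.32, -0.20, -0.05, -0.60, -0.15], [0.10, 0.12, 0.08, 0.15, 0.11])
significant_at_0_1 = result["p"] < 0.1  # the threshold used in the text
```

Even when Q is small, the upper limit of the interval can sit well above zero with only a handful of estimates, which is why the interval is reported alongside the point estimate of I².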
To identify, among conflicting meta-analyses, those that provide the most relevant answers to their question, we used the following algorithm. First, meta-analyses were ranked by study quality as given by the Oxman-Guyatt and the QUOROM indices. If there were conflicts among meta-analyses of high quality, we used the five questions of our qualitative assessment as described above to identify and address potential sources of such conflict. If conflicting conclusions remained, we used the significance and magnitude of the differences in end points to test whether the conflicts were supported by the results.
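The three steps of this algorithm can be written as a small decision function. The sketch below is illustrative and rests on several assumptions not stated in the text: the cutoff for "high quality" is a caller-supplied parameter, the two indices are combined by sorting on Oxman-Guyatt first and QUOROM second, and the qualitative step is reduced to a single optional finding. All names (`MetaAnalysis`, `rank_by_quality`, `resolve`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MetaAnalysis:
    label: str
    oxman_guyatt: int  # 1 (extensively flawed) to 7 (minimally flawed)
    quorom: int        # 0 (worst) to 18 (best)
    conclusion: str    # free-text summary of the recommendation


def rank_by_quality(items):
    """Step 1: best first. Sorting on (Oxman-Guyatt, QUOROM) is an assumption."""
    return sorted(items, key=lambda m: (m.oxman_guyatt, m.quorom), reverse=True)


def resolve(items, cutoff, qualitative_source=None, q_test_p=None, alpha=0.1):
    """Return the step at which the conflict is settled.

    cutoff             hypothetical Oxman-Guyatt score counted as high quality
    qualitative_source finding from the five qualitative questions, if any
    q_test_p           p value of the Q test across pooled estimates, if computed
    """
    top = [m for m in rank_by_quality(items) if m.oxman_guyatt >= cutoff]
    if not top:
        raise ValueError("no meta-analysis reaches the quality cutoff")
    if len({m.conclusion for m in top}) == 1:
        return "quality"         # high-quality reviews agree
    if qualitative_source is not None:
        return "qualitative"     # a source of conflict was identified
    if q_test_p is None:
        return "unresolved"      # step 3 has not been run yet
    if q_test_p < alpha:
        return "results"         # pooled results differ
    return "interpretation"      # results agree, conclusions do not


# Hypothetical scores, for illustration only.
reviews = [
    MetaAnalysis("A", 6, 16, "favours graft X"),
    MetaAnalysis("B", 6, 17, "no difference"),
    MetaAnalysis("C", 2, 8, "favours graft Y"),
]
step = resolve(reviews, cutoff=5, q_test_p=0.5)
```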
We found conflicting meta-analyses in two topics: choice of graft in ACL reconstruction (n = 7) (Table 1) [5, 8, 10, 12, 28, 33, 36], and hyaluronic acid for treating osteoarthritis of the knee (n = 5) (Table 2) [1, 7, 18, 19, 35]. For these two topics, we could extract data on pain for hyaluronic acid, and the Lachman test, pivot shift, and extension lag for graft choice in ACL reconstruction.
It is unlikely study quality was a reason for differences among these studies. The assessment of the quality of the meta-analyses as a potential source or explanation for differing results showed considerable differences in methodologic quality and quality of reporting only for meta-analyses of graft choice for ACL reconstruction (Table 3), but fairly high and evenly distributed values for meta-analyses of hyaluronic acid. Our first question in the qualitative assessment of sources of conflicts among meta-analyses of ACL graft choice showed only two studies reported the rationale of repeating the meta-analysis. Furthermore, the levels of evidence among the included primary studies in these meta-analyses were rather low and varied among studies, especially with time. Also there was variation in the numbers of studied end points, but all studies included stability (Table 1). Methodologically, most meta-analyses were of rather low quality. Assessment of publication bias was neglected completely, and only Biau et al. tested study quality. Also, heterogeneity was rarely adequately considered (Table 4). Finally, the validity of the methods used to pool data occasionally was unclear. Prodromos et al. and Thompson et al., for example, merely added numbers from the primary studies to obtain overall results. This, however, can lead to biased results owing to Simpson’s paradox (ie, the outcomes appear reversed when the groups are combined), particularly if the treatment allocation is not equal in all studies. Also, there seemed to be important differences in the surgical procedures, such as the type of fixation used.
The qualitative assessment of meta-analyses on hyaluronic acid in osteoarthritis showed two did and two did not report a rationale for repeating the analysis (Table 5). Although Wang et al., Arrich et al., and Lo et al. included essentially the same primary studies, Modawal et al. included fewer studies, considering only a selection of hyaluronic acid preparations. All authors but Espallargues and Pons included only randomized controlled trials and performed sensitivity analyses to account for heterogeneity in their analyses. The same end points were studied but analyzed in different ways. Wang et al. standardized their pooled estimates on maximum scale intensity and trial duration, thus making direct comparison with the pooled estimates in the other studies difficult. All studies considered publication bias, study quality, and heterogeneity in their analyses (Table 4).
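The risk of Simpson’s paradox when counts are merely added across primary studies can be shown with a short numerical sketch. The counts below are illustrative and are not taken from any of the reviewed meta-analyses. The stratified estimator is a Mantel-Haenszel risk difference, chosen here as one common alternative and not as the method any of the cited authors used.

```python
def naive_risk_difference(studies):
    """Add raw counts across studies, then compare the two summed proportions."""
    events_a = sum(s[0] for s in studies)
    n_a = sum(s[1] for s in studies)
    events_b = sum(s[2] for s in studies)
    n_b = sum(s[3] for s in studies)
    return events_a / n_a - events_b / n_b


def mantel_haenszel_risk_difference(studies):
    """Pool within-study risk differences with Mantel-Haenszel weights."""
    numerator = sum((ea * nb - eb * na) / (na + nb) for ea, na, eb, nb in studies)
    denominator = sum(na * nb / (na + nb) for _, na, _, nb in studies)
    return numerator / denominator


# Each tuple: (successes with A, patients on A, successes with B, patients on B).
# Allocation is deliberately unequal between the two studies.
studies = [
    (81, 87, 234, 270),   # study 1: A 93.1% versus B 86.7%
    (192, 263, 55, 80),   # study 2: A 73.0% versus B 68.8%
]

per_study = [ea / na - eb / nb for ea, na, eb, nb in studies]
naive = naive_risk_difference(studies)
stratified = mantel_haenszel_risk_difference(studies)
```

In this example treatment A does better within each study, yet the summed counts favour treatment B, because most patients on A come from the study with the lower overall success rate. The stratified estimate keeps the within-study direction.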
Statistical inference suggested no significant differences in the numerical results of conflicting meta-analyses other than for hyaluronic acid (Table 6). Quantitative assessment of the magnitude of difference among the results of conflicting meta-analyses by the I² index corroborated these findings. However, the 95% confidence intervals, adjusted for small sample size, suggested there might be as much as 41% of true difference beyond random variation in results for meta-analyses of ACL grafting (Table 4).
Our last objective was to use our findings to identify among conflicting meta-analyses those that provided the most relevant answers to their question. For ACL graft choice the studies by Biau et al., Forster and Forster, and Goldblatt et al. had the highest methodologic quality, but still presented conflicting conclusions (Tables 1 and 3). As mentioned above, the qualitative assessment of studies showed differences only in the surgical procedures, suggesting confounding by this variable. Another study, which is not a primary meta-analysis and thus was not included in this study, found that after adjustment for type of fixation, hamstring tendon produces better postoperative outcome in pain, and at least equal outcomes in stability. The existing data suggested the most relevant answer is that a hamstring tendon graft produces a better outcome, to a small extent, than a bone-tendon-bone graft. As for hyaluronic acid, study quality was consistently high in all meta-analyses by Oxman-Guyatt and QUOROM indices and by our qualitative assessment. When looking for apparent problems (the last question in our qualitative assessment), we saw that these studies combined distinctly different preparations of hyaluronic acid, which is consistent with clinical heterogeneity and begs the question whether pooling was justified. The statistically significant difference between the presented end points in the meta-analyses suggests that there is a true difference. The most relevant answer is that individual preparations of hyaluronic acid should be considered one at a time or in subgroups.
Meta-analyses have gained meaning in orthopaedic surgery as an instrument for evidence-based clinical decision making, and for public health decisions, by providing high-quality evidence pooled from the current literature. Errors and uncertainties or even conflicting conclusions in meta-analyses, in turn, will cause confusion and might even precipitate severely problematic consequences. This potential problem led to this study, which focused on conflicting Level 1 evidence from orthopaedic meta-analyses. Specifically, we had four objectives. First, we wanted to identify any conflicting meta-analyses in orthopaedic surgery. Our second objective addressed potential sources for conflict. Third, we studied differences in outcomes in conflicting studies to see whether these differences would support conflicting conclusions. Finally, we tried to present the best evidence for each topic, building on our findings.
Our study has some limitations. Most importantly, it builds on what is reported in the literature, which is not necessarily the same as what was done in the included studies. This study is also especially susceptible to publication bias because it is likely many meta-analyses presenting results in conflict with earlier studies were never published. However, we present all studies we were able to identify, and our study can be seen as a comprehensive sample rather than a complete account of all conflicting meta-analyses without jeopardizing the validity of our findings. Another potential shortcoming derives from our definition of “conflicts.” We only included meta-analyses based on conflict in their conclusions rather than in their numerical results because we argued the conclusion, as the final product of any study, has more meaning. Also, we were not able to investigate all potential sources of conflict in the scope of this article but focused on the most important ones. We tried to present the most relevant answers, supported by strong evidence, for each topic. Finally, concerning our statistical analysis, there is a lack of independence between pooled estimates because there is overlap in the primary studies from which these pooled estimates were built. Such a lack of independence violates the assumptions of the Q test because the data are more similar than the test expects, which in turn could cause spuriously small p values and confidence intervals that are too narrow.
We were able to identify conflicting meta-analyses, defined as studies arriving at different conclusions despite studying the same literature to answer the same question, in two orthopaedic topics: graft choice in ACL reconstruction and management of osteoarthritis with hyaluronic acid. Poolman et al. also investigated systematic reviews on the latter topic. However, we chose to focus on meta-analyses, which add analytical methods to systematic reviews to arrive at a pooled estimate of treatment effectiveness, instead of merely describing previous results in a systematic fashion. Discordant and openly conflicting meta-analyses have been reported in other fields as well. For example, two meta-analyses of heparin in the prevention of perioperative thrombosis published in the British Medical Journal and Lancet during the same year led to opposite results [17, 25]. Other reports of conflicting meta-analyses have been mentioned in conjunction with commercial interests; for example, a meta-analysis refuting the effectiveness of selective serotonin reuptake inhibitors in the management of depression was, after its publication, promptly challenged by a meta-analysis presented by Lilly Industries, which came to a clearly different result and showed high effectiveness for this type of medication [24, 31].
Subsequently, we assessed potential sources of conflict. There was little difference in quality indices among the meta-analyses on hyaluronic acid, thus precluding this as an explanation for differences. For studies on graft choice, however, considerable differences could be seen, allowing elimination of studies with low scores to come closer to a conclusion. However, both indices we used build on information reported in the papers and thus, as mentioned above, might be biased by differences in the accuracy of reporting, as orthopaedic journals might emphasize statistical/methodologic details differently from others or even reduce the level of detail in reporting owing to space limitations. Thus, we added a qualitative part to this assessment, using five questions chosen from commonly known problems and limitations in primary and repeated meta-analyses. The first and maybe most important question concerned the rationale for repeating the analysis. Although this subject has hardly been studied, repeating or updating rarely (9%) leads to changes in the pooled results of meta-analyses [11, 22, 23]. However, conclusions from meta-analyses have been refuted by subsequently published analyses and even by large randomized controlled trials. Therefore, repeating meta-analyses might provide new and important changes and should not be discouraged a priori. Of further interest was how authors defended their conclusions in relation to prior studies, especially when taking into account that a conflicting, and thus new, conclusion might increase the chances for successful submission for publication. Also, the availability of new evidence might be important in altering conclusions when repeating meta-analyses; however, all the presented meta-analyses were published in a maximum time frame of 5 years. Thus, a more likely source of new evidence is non-English studies, which have been excluded owing to language bias, or publication bias rather than newly published evidence.
Our other questions focused mostly on methodologic issues because investigators may choose from a range of methods to identify, weight, and pool data in meta-analyses, which, despite being equally valid, might produce quite different values for pooled results. Studying the methodologies of the included meta-analyses, we saw large between-topic differences but fairly consistent quality within each topic. It is noteworthy that none of the meta-analyses on ACL grafts reported on assessment of publication bias. Such testing, usually accomplished graphically using funnel plots or mathematically using Egger’s regression or Begg’s ranked correlation, shows the homogeneity of the literature and thus can be seen as the initial step toward assessing the appropriateness of pooling. Also, only one of these meta-analyses assessed study quality. It is important to consider internal and external validity when choosing primary studies for inclusion in a meta-analysis. Internal validity, which is the validity of the methods used to answer the study question, is best analyzed by assessment of the study design. External validity, which is the generalizability of results or the applicability in a specific population of interest, is best assured by stringent inclusion and exclusion criteria. Stringent inclusion and exclusion criteria in primary studies cannot replace assessment of internal validity. However, the inclusion in meta-analyses of outcomes of assessments of study quality is difficult. Very much like publication bias and study quality, the assessment of heterogeneity was gravely neglected in many meta-analyses of ACL graft choice. Accounting for heterogeneity is of special importance in surgical studies because, compared with pharmaceutical treatments, surgical treatment cannot be completely standardized, and outcomes will vary even if the same surgeon performs the same procedure.
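Egger’s regression, mentioned above as one way to assess publication bias, can be sketched with ordinary least squares: each study’s standardized effect (effect divided by its standard error) is regressed on its precision (one over the standard error), and an intercept far from zero points to funnel plot asymmetry. The sketch below returns only the point estimates. A formal test would also need the standard error of the intercept, which is omitted here. The function name and the data are hypothetical.

```python
def egger_regression(effects, std_errors):
    """Return (intercept, slope) of standardized effect regressed on precision."""
    if len(effects) < 3:
        raise ValueError("at least three studies are needed")
    precision = [1.0 / se for se in std_errors]
    standardized = [e / se for e, se in zip(effects, std_errors)]
    n = len(effects)
    mean_x = sum(precision) / n
    mean_y = sum(standardized) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    if sxx == 0:
        raise ValueError("studies must differ in precision")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(precision, standardized))
    slope = sxy / sxx              # estimates the underlying effect
    intercept = mean_y - slope * mean_x  # measures small-study asymmetry
    return intercept, slope


# Hypothetical data: the same true effect in every study, no asymmetry.
ses = [0.1, 0.2, 0.3, 0.4]
symmetric = egger_regression([0.5, 0.5, 0.5, 0.5], ses)

# Hypothetical data: smaller studies (larger standard errors) report larger effects.
skewed = egger_regression([0.5 + 2.0 * se for se in ses], ses)
```

In the second data set the reported effect grows with the standard error, so the intercept moves away from zero while the slope still recovers the underlying effect.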
Meta-analyses on hyaluronic acid addressed publication bias, study quality, and heterogeneity adequately and according to current standards. Another problem that stands out in all topics, however, is appropriateness of pooling. Especially for hyaluronic acid, the studies in conflict with each other pooled data on different combinations of preparations, suggesting the effectiveness of some is diluted in the lack of effect of others. This is confirmed by findings presented in the systematic review by Bellamy et al.
Finally, after studying potential sources of conflict, we wanted to study the actual differences in numerical outcomes to test whether they would support conflicting conclusions. We used the two parameters quality, or statistical significance, and quantity, or the actual magnitude of difference, independently because a statistically significant difference between outcomes is not necessarily clinically important and a nonsignificant difference might have a clinically meaningful size nevertheless. Especially in meta-analyses, with often quite large sample sizes, standard errors might be very small, thus creating very small p values leading to further overemphasis of statistical significance at the expense of clinical importance. Our assessment showed virtually all the numerical results in conflicting meta-analyses were not significantly different from each other. Also, when the magnitudes of differences were assessed using the I² index, which is a measure of heterogeneity among results beyond difference attributable to normal, random variation, we found mostly no differences. However, we also calculated 95% confidence intervals for I² adjusted for small sample size and could show these intervals allowed for some considerable differences. Our findings suggest much of the actual conflict in conclusions originated from the interpretation of results rather than from their actual values.
The question of which conclusion to trust among conflicting meta-analyses remains to be answered. Jadad et al. suggested an algorithm in 1997 to deal with discordant systematic reviews, building on parameters of study design such as study question, inclusion criteria, and methods of pooling. We modified this algorithm using the findings from our study and used it to find the best evidence for both studied topics. For hyaluronic acid, all studies are of equal quality and build on equal materials and methods. For this topic, as mentioned above, it seems pooling was not always appropriate. Lo et al. and Arrich et al. found differences in effectiveness attributable to molecular weight in their analysis, although Arrich et al. did not find a significant association. Furthermore, there is strong evidence from randomized controlled trials and meta-analyses for the effectiveness of some hyaluronic acid preparations in the treatment of osteoarthritis [2, 7, 29, 30]. We did not find a best answer, but rather a best question. A similar situation can be seen in ACL grafts. Poolman et al. used the data published by Biau et al. in a cumulative meta-analysis with sensitivity analyses for the type of fixation. Poolman et al. presented clear evidence supporting the use of hamstring grafts with modern fixation technique (EndoButton®) resulting in equal results in stability at lower complication rates than bone-patellar tendon-bone grafts. Also, by using cumulative meta-analysis, this group could show these findings could have been established as early as 2001. Future studies on hyaluronic acid would benefit from a similar approach.
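The idea behind a cumulative meta-analysis, as referred to above, is to re-pool the evidence each time a study is added in order of publication, so one can see when the pooled estimate stabilized. The sketch below assumes a fixed-effect inverse-variance model and uses hypothetical data. It is not a reconstruction of the analysis by Poolman et al., whose pooling model is not described in the text.

```python
import math


def cumulative_meta_analysis(studies, z=1.96):
    """Pool studies one at a time in order of publication year.

    studies: iterable of (year, estimate, standard_error) tuples.
    Returns one (year, pooled, lower, upper) row per added study.
    Assumption: fixed-effect inverse-variance weighting.
    """
    rows = []
    sum_w = 0.0
    sum_we = 0.0
    for year, estimate, se in sorted(studies, key=lambda s: s[0]):
        weight = 1.0 / se ** 2
        sum_w += weight
        sum_we += weight * estimate
        pooled = sum_we / sum_w
        pooled_se = math.sqrt(1.0 / sum_w)
        rows.append((year, pooled, pooled - z * pooled_se, pooled + z * pooled_se))
    return rows


# Hypothetical effect estimates (for example, a log odds ratio) by year.
trials = [(2003, 0.2, 0.1), (2001, 0.4, 0.1), (2005, 0.3, 0.2)]
history = cumulative_meta_analysis(trials)
```

Reading the rows from first to last shows how the confidence interval narrows as studies accumulate, which is how one can argue that a finding was already established by a given year.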
We have identified meta-analyses with conflicting conclusions in orthopaedic surgery. However, we believe these conflicts derive from interpretation of results rather than from the pooled results, and therefore they might be resolved by assessing whether conclusions are supported by the results and the way those results were generated, which also includes assessment of the appropriateness of pooling. Additionally, although several studies suggest repeating or updating meta-analyses rarely produces changes in results, we found this is not necessarily true. We recommend meta-analyses be treated like any other study and their results be critically scrutinized and reassessed if justified. Repeating meta-analyses in the face of new evidence might produce important new findings and add to our knowledge.
Each author certifies that he has no commercial associations (eg, consultancies, stock ownership, equity interest, patent/licensing arrangements, etc) that might pose a conflict of interest in connection with the submitted article.