Although systematic reviews with meta-analyses are considered more objective than other types of reviews, our results suggest that interpreting the data remains a highly subjective process, even among reviewers with extensive experience conducting meta-analyses. The implications are important. The evidence-based movement has proposed that a systematic review with a meta-analysis of RCTs on a topic provides the strongest evidence of support, and that widespread adoption of its results should lead to improved patient care. However, our results suggest that the interpretation of a meta-analysis (and therefore its recommendations) is subjective and therefore depends on who conducts or interprets it.
Previous authors examining discrepancies among meta-analyses focused on subjective procedural decisions that lead to different data, rather than on the interpretation of the data [4]. We presented reviewers with the same meta-analyses, so the differences were due to the actual interpretation of the data. The GRADE group also found a lack of consensus among reviewers presented with the same data. However, they attributed this to some reviewers judging the data to be sparse and others not [22]. Our reviewers disagreed even when there were 10 RCTs with a total of 3685 patients and homogeneity between studies. Further, we minimized the effect of content knowledge by providing all reviewers with a summary of the clinical review articles and instructing them to base their decisions only on the information provided in the packages; nothing in their written comments suggested that these instructions were not followed. Even if the reviewers did not follow these instructions, the results would still imply that conclusions are highly dependent on the professional training of the authors (e.g. differences in understanding of physiology or epidemiology). For example, the two cardiologists in our group were generally more sceptical of the effects of treatment. Our study design did not allow a detailed analysis of the underlying reasons for these results, and we plan to explore them more fully in a future mixed-methods study. Finally, even though we provided everyone with the Jadad quality-of-reporting score for each study [16], it is possible that the reviewers judged the quality of the studies differently.
Evidence-based medicine requires that the clinician make decisions based on the numerical results observed. Decision-making and clinical reasoning are complex processes, and different clinicians (and different patients) often choose different treatments even when provided with the same options and the same information. Our results may simply reflect the same process in the context of meta-analysis. A large body of literature examines these processes in other areas, and similar processes may be at work when meta-analyses are interpreted. Selected examples of these frameworks are outlined below.
In Gestalt intuition, a subject's decisions are influenced by the identification of hidden relationships within the whole context [23]. With reflective practice, experts adapt their governing analytical premises to the complex problem at hand [11]. In decision theory, the expert attempts to calculate the probabilities of various outcomes within a specific calculative framework [25]. With tacit knowledge, the expert uses implicit decisional processes of which he/she is not consciously aware [27]. The Bayesian approach recognizes that decision-making is a two-step process. First, the decision-maker decides what he/she considers an appropriate estimate of the treatment effect, which depends on prior beliefs and on the likelihood function (fixed- or random-effects model). Second, the decision-maker must recognize that whatever decision is taken carries a risk. The decision-maker therefore weighs 1) the risk of doing harm by giving a treatment they believe is beneficial when it is actually harmful, against 2) the risk of withholding benefit by not giving a treatment they believe is ineffective or harmful when it is actually effective. In Bayesian analysis, this weighing is called the loss function.
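To make the two Bayesian steps concrete, the following is a minimal sketch, not the analysis used in our study. It assumes a normal prior on the log odds ratio, a normal approximation to the pooled meta-analytic estimate (so the prior-to-posterior update is conjugate), an adverse-event outcome (log OR < 0 means benefit), and a simple two-outcome loss. The function names, prior settings, pooled estimate, and loss values are all hypothetical and chosen only for illustration.

```python
import math


def normal_cdf(x, mean, sd):
    """P(X <= x) for X ~ Normal(mean, sd^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def posterior_log_or(prior_mean, prior_var, pooled_log_or, pooled_var):
    """Step 1 (hypothetical helper): combine a normal prior with a normal
    likelihood for the pooled log odds ratio (conjugate normal update)."""
    post_precision = 1.0 / prior_var + 1.0 / pooled_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + pooled_log_or / pooled_var)
    return post_mean, post_var


def choose_action(post_mean, post_var, loss_harm, loss_missed_benefit):
    """Step 2 (hypothetical helper): pick the action with the smaller expected loss.

    Assumes the outcome is an adverse event, so log OR > 0 means the
    treatment is harmful and log OR < 0 means it is beneficial.
    """
    sd = math.sqrt(post_var)
    p_harmful = 1.0 - normal_cdf(0.0, post_mean, sd)  # P(log OR > 0)
    p_beneficial = 1.0 - p_harmful                      # P(log OR < 0)
    expected_loss_treat = loss_harm * p_harmful               # treat, but it harms
    expected_loss_withhold = loss_missed_benefit * p_beneficial  # withhold, but it works
    action = "treat" if expected_loss_treat < expected_loss_withhold else "withhold"
    return action, expected_loss_treat, expected_loss_withhold


if __name__ == "__main__":
    # Hypothetical pooled result: log OR = -0.2 with variance 0.01 (SE 0.1).
    pooled, pooled_var = -0.2, 0.01
    reviewers = {
        "sceptical prior": (0.0, 0.01),  # centred on no effect, fairly confident
        "vague prior": (0.0, 100.0),     # essentially uninformative
    }
    for name, (m0, v0) in reviewers.items():
        m, v = posterior_log_or(m0, v0, pooled, pooled_var)
        # Same (hypothetical) loss for both: harm judged 20x worse than missed benefit.
        action, l_treat, l_withhold = choose_action(m, v, loss_harm=20.0, loss_missed_benefit=1.0)
        print(f"{name}: posterior log OR {m:.3f}, treat loss {l_treat:.3f}, "
              f"withhold loss {l_withhold:.3f} -> {action}")
```

With these illustrative inputs, the reviewer holding the sceptical prior ends up withholding treatment while the reviewer with the vague prior treats, even though both see exactly the same pooled data and use the same loss function. This is one possible mechanism by which identical meta-analytic results could lead to different conclusions.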
Our results suggest that a systematic review with a meta-analysis must be viewed as one study conducted by specific investigators using a specific methodology. At each step of the methodology (defining the general criteria, search strategy, inclusion/exclusion criteria, data abstraction, and analysis), subjective decisions are required that could affect the validity of the study; the relative importance of each will likely depend on the topic of inquiry and the data acquired. Our study demonstrates that disagreements in the conclusions of systematic reviews with meta-analyses can also arise from subjective interpretation of the results, not only from methodological choices. Understood in this context, a meta-analysis is one more source of information that helps the scientific community better understand a clinical question. It must therefore be read with as much caution as any other scientific paper.
Our study has potential limitations. Each of our reviewers had extensive experience conducting systematic reviews with meta-analyses. Reviewers with less experience may interpret data very differently, and we will study such reviewers in the future. Other investigators might have abstracted the data differently from us, or used different types of analyses (e.g. risk differences instead of odds ratios). However, this would not affect our results, because all reviewers viewed exactly the same data and the actual "true" effect of magnesium on the outcome is not important for the purpose of this study. That said, a second trained individual validated all abstracted data, and differences were resolved by consensus. Different reviewers may prefer different models and rely on different types of analyses and plots. Therefore, we provided each reviewer with results based on both fixed- and random-effects models, publication bias statistics and plots, forest plots, Galbraith plots, L'Abbé plots, and the original articles. If a reviewer requested a specific plot or subgroup analysis, it was provided. Our data represent heterogeneous interpretations from one topic only, so we cannot generalize to other topics or estimate how often this might occur. However, we believe this topic was particularly suited to our objective because it allowed us to compare decisions between and within reviewers for meta-analyses with little heterogeneity and with large amounts of heterogeneity. As stated previously, reviewers were asked to base their decisions only on the information provided in the packages, and we cannot be certain that they did. Although we asked reviewers to comment on each package, a formal qualitative analysis was not possible with the data obtained.
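Because each reviewer received both fixed- and random-effects results, it may help to see how the two models can diverge as heterogeneity grows. The sketch below assumes the common inverse-variance fixed-effect estimator and the DerSimonian-Laird random-effects estimator on the log odds ratio scale, with Cochran's Q and I² as heterogeneity summaries; these are standard choices, but this is not a claim that they are exactly the procedures used to build our packages. The function names and trial counts are hypothetical and do not come from the magnesium trials.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Log OR and its variance from a 2x2 table (hypothetical helper).

    a/b = events/non-events in the treatment arm; c/d = events/non-events
    in the control arm. Adds 0.5 to every cell if any cell is zero
    (a common continuity correction).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    y = math.log((a * d) / (b * c))
    v = 1 / a + 1 / b + 1 / c + 1 / d
    return y, v


def pool(effects, variances):
    """Inverse-variance fixed-effect and DerSimonian-Laird random-effects pooling."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0    # I^2 as a proportion
    w_re = [1 / (v + tau2) for v in variances]      # random-effects weights
    random_est = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return {
        "fixed": (fixed, math.sqrt(1 / sw)),
        "random": (random_est, math.sqrt(1 / sum(w_re))),
        "Q": q,
        "tau2": tau2,
        "I2": i2,
    }


if __name__ == "__main__":
    # Hypothetical trials: (treatment events, non-events, control events, non-events).
    trials = [(10, 90, 20, 80), (5, 45, 8, 42), (30, 170, 35, 165)]
    ys, vs = zip(*(log_odds_ratio(*t) for t in trials))
    res = pool(list(ys), list(vs))
    for model in ("fixed", "random"):
        est, se = res[model]
        lo, hi = est - 1.96 * se, est + 1.96 * se
        print(f"{model}: OR {math.exp(est):.2f} "
              f"(95% CI {math.exp(lo):.2f} to {math.exp(hi):.2f})")
    print(f"Q = {res['Q']:.2f}, tau^2 = {res['tau2']:.3f}, I^2 = {100 * res['I2']:.0f}%")
```

When Q does not exceed its degrees of freedom, tau² is zero and the two models give identical estimates; as heterogeneity grows, the random-effects weights become more equal across trials and the confidence interval widens. Swapping the effect measure (for example, risk differences instead of odds ratios) would change only the per-trial effect function; the pooling step is unchanged.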
Finally, although our data suggest that different reviewers interpret data differently, this study cannot provide insight into the reasons why. The answers to this very important question likely include subtleties and nuances that are difficult to capture using quantitative methods, and we will examine these subjective elements in a future study using a mixed-methods approach.