The main finding of our review of quantitative approaches for benefit and harm assessment used in the medical literature is a simple algorithm that categorizes existing quantitative approaches broadly into approaches that consider single or multiple benefit and harm outcomes and into approaches that use a benefit-harm comparison metric or present outcomes side by side. We also found that for most approaches, researchers use aggregate data, which makes the approaches suitable for systematic reviews even when that is not their intended purpose. Interestingly, only a few approaches provide measures of uncertainty, and none considers a potential correlation between benefit and harm outcomes (joint distribution).
We identified a number of assumptions that researchers make when applying some of the quantitative approaches. First, for some approaches researchers assume that one or more benefit and harm outcomes can be put on the same scale to calculate a benefit-harm comparison metric. Challenges of putting different outcomes on the same scale include weighing their relative importance to decisionmakers, simplifying the outcomes (e.g. dichotomizing continuous outcomes, which may lead to substantial loss of information), and reconciling different methods and timing in the ascertainment of different outcomes.
However, the advantages of a benefit-harm comparison metric may be substantial, for example, in complex situations where multiple outcomes are important and where patient, provider, and policymaker preferences vary [7]. Processing such a multidimensional task without a benefit-harm comparison metric is a great cognitive challenge. The major advantage of using a benefit-harm comparison metric (over an approach without such a common metric) is that it makes explicit the assumptions about the relative importance of outcomes and about the sometimes arbitrary selection of the evidence on benefits and harms or on baseline risks, and that sensitivity analyses can show how the benefit-harm comparison changes if different assumptions are made. Also, a single number may provide some advantages for communicating the benefit-harm comparison to patients because it avoids overwhelming them with data on multiple different outcomes.
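As a minimal illustration of such a metric, the sketch below computes a weighted net-benefit statistic from absolute risk differences. All numbers, the `harm_weight` parameter, and the function name are illustrative assumptions, not taken from any specific approach in the review; the point is that the preference weight is an explicit, inspectable input rather than an implicit judgment.

```python
# Hypothetical sketch of a weighted net-benefit comparison metric.
# All risk differences and weights below are illustrative assumptions.

def net_benefit(benefit_risk_diff, harm_risk_diff, harm_weight):
    """Weighted difference of absolute risk differences.

    benefit_risk_diff: absolute risk reduction for the benefit outcome
    harm_risk_diff:    absolute risk increase for the harm outcome
    harm_weight:       relative importance of one harm event vs one benefit event
    """
    return benefit_risk_diff - harm_weight * harm_risk_diff

# Assumed scenario: treatment reduces the benefit outcome risk by 3 percentage
# points and increases the harm outcome risk by 1 percentage point.
arr, ari = 0.03, 0.01

# Varying the preference weight is itself a sensitivity analysis that makes
# the "relative importance" assumption explicit.
for w in (1.0, 2.0, 4.0):
    print(f"harm weight {w}: net benefit = {net_benefit(arr, ari, w):+.3f}")
```

With a harm weight of 4 the sign of the metric flips in this assumed scenario, showing how the benefit-harm conclusion can hinge on the stated preference weight.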
Second, we were surprised to see that there were no quantitative approaches that considered or even discussed the joint distribution of benefit and harm outcomes, even when individual patient data were available. The joint distribution describes the correlation between benefit and harm outcomes. Trial reports commonly describe standard errors and confidence intervals for the benefit and harm outcomes separately, but rarely describe the joint distribution of the effects of the treatment on the benefit and harm outcomes. Without the joint distribution of all the effects, we have to assume independence of the benefit and harm effects. This may not yield a valid estimate of the uncertainty of the benefit-harm balance metric. Changes in reporting practices, such as online journal appendix materials or online repositories of covariance data for later data synthesis, could address this limitation. Systematic reviewers should keep in mind the limitation of not considering the joint distribution when interpreting results from a quantitative benefit and harm assessment.
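The effect of ignoring the joint distribution can be sketched with a small Monte Carlo simulation. The effect sizes, standard errors, and correlation below are illustrative assumptions; the sketch only demonstrates the general point that, for a difference-type metric, assuming independence overstates uncertainty when benefit and harm effects are positively correlated (and understates it when they are negatively correlated).

```python
# Hypothetical sketch: how correlation between benefit and harm effect
# estimates changes the uncertainty of a benefit-harm difference.
# All effect sizes, standard errors, and correlations are assumed values.
import math
import random

def simulated_sd_of_difference(rho, n_draws=200_000, seed=1):
    """Monte Carlo SD of (benefit effect - harm effect) when both estimates
    are normal with SD 0.01 and correlation rho."""
    rng = random.Random(seed)
    se_b = se_h = 0.01
    diffs = []
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        b = 0.03 + se_b * z1                                   # benefit effect draw
        h = 0.01 + se_h * (rho * z1 + math.sqrt(1 - rho**2) * z2)  # correlated harm draw
        diffs.append(b - h)
    mean = sum(diffs) / n_draws
    return math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n_draws - 1))

# Independence (rho = 0) gives SD ~ sqrt(2) * 0.01 ~ 0.0141; with rho = 0.6
# the true SD is ~ sqrt(2 - 2 * 0.6) * 0.01 ~ 0.0089, i.e. noticeably smaller.
print(simulated_sd_of_difference(0.0))
print(simulated_sd_of_difference(0.6))
```

This follows directly from Var(b − h) = se_b² + se_h² − 2ρ·se_b·se_h; the simulation simply makes the consequence of the independence assumption concrete.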
The figure and table show a number of characteristics that help distinguish various existing quantitative approaches. While these characteristics are important for selecting an appropriate quantitative approach, researchers need to weigh a number of additional considerations because they have implications for the type of evidence included in the benefit-harm assessment. For example, clinical trials are commonly designed to provide high-quality evidence and sufficient power for benefit outcomes. Harms often receive much less attention in terms of accurate and valid methods of measurement [40]. Such asymmetry in the quality of outcome ascertainment affects the validity of a quantitative benefit and harm assessment, but it is not yet clear how to downgrade the quality of evidence for this reason.
In contrast to the framework developed here, which focused entirely on the quantitative assessments, Lynd and others developed criteria that apply to the entire process of a benefit-harm assessment. This usually requires that researchers consider both quantitative and qualitative approaches to draw conclusions regarding benefit-harm comparisons of health care interventions [10]. Lynd and others proposed 10 criteria for benefit-harm assessments (be universal, inclusive, comprehensive, patient-sensitive, and easily interpreted; consider preferences; define when benefits outweigh harms; incorporate uncertainty; be flexible; and integrate economic evaluations) [41]. We agree with these guiding principles but also think that researchers cannot readily use them to judge the adequacy of specific quantitative approaches. Whether or not a specific approach is adequate depends largely on the type and quality of available data. Regulatory decisionmakers, guideline developers, or users of the evidence are likely to perceive the ease of use and ease of interpretation of quantitative approaches very differently because of different levels of methodological expertise or different perspectives. Therefore, we believe that our framework for organizing quantitative approaches is complementary to, rather than competing with, what Lynd and others have proposed. The frameworks proposed by our team, Lynd, and others support a systematic, well-structured, and transparent process for reducing the multidimensionality of a benefit-harm assessment.
Our review showed that current quantitative approaches for benefit-harm assessment might need some further development. Firstly, many quantitative approaches identified here focus on binary outcomes that occur just once, with or without consideration of time to event. Current methods need extensions that also consider different types of data. Some patient-important outcomes, such as quality of life or symptoms, cannot be expressed appropriately as binary outcomes without substantial loss of information. Some benefit and harm events can occur several times, so that the number of events per person-time needs to be considered rather than the proportion of persons with at least one event. Secondly, uncertainty estimates for the benefit-harm comparison metric (e.g. 95 percent confidence or credible intervals) are likely to be of key importance for decisionmakers and organizations making treatment recommendations. Researchers do not commonly report estimates of the uncertainty that arises from sampling variability. In addition, none of the methods considers the joint distribution of benefit and harm outcomes. Researchers should develop statistical methods for incorporating joint distributions when estimating standard errors for benefit-harm comparison metrics. For systematic reviews it would be valuable to develop approaches for making assumptions about joint distributions because covariance matrices are rarely available from reports of primary studies and it may be challenging to request them from authors of primary studies. Thirdly, researchers should develop systematic approaches for sensitivity analyses that assess the influence of the various assumptions commonly made. One approach would be to agree on a list of standard sensitivity analyses for key aspects of a benefit-harm assessment. For example, data for estimating baseline risks (e.g. probability of outcome without treatment) can come from different sources (e.g. surveillance data, observational studies, and placebo arms of randomized trials). The best available evidence on treatment effects may sometimes come from a single randomized trial or observational study rather than from meta-analyses. Researchers may be able to elicit patient preferences with different techniques. A systematic outline of these options (the choices for the primary analysis and for sensitivity analyses) would make benefit-harm assessments transparent and give users of the evidence a sense of how sensitive the results are to different assumptions.
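Such a standard list of sensitivity analyses could be operationalized as a simple grid: recompute the benefit-harm metric under every combination of assumed data sources. The sketch below is a hypothetical illustration; the source names, baseline risks, and relative risks are invented for the example and the simple risk-difference metric stands in for whatever metric an assessment actually uses.

```python
# Hypothetical sketch of a standard sensitivity-analysis grid: recompute a
# simple benefit-harm metric under alternative assumptions about the baseline
# risk source and the treatment-effect source. All values are illustrative.
from itertools import product

# Alternative baseline risks for the harm outcome, by data source (assumed).
baseline_harm_risk = {"surveillance": 0.010, "cohort": 0.015, "placebo_arm": 0.008}
# Alternative relative risks for the harm outcome, by evidence source (assumed).
harm_relative_risk = {"meta_analysis": 1.4, "single_trial": 1.8}

benefit_arr = 0.03  # absolute risk reduction for the benefit outcome (assumed)

for baseline_src, effect_src in product(baseline_harm_risk, harm_relative_risk):
    # Absolute risk increase for the harm = baseline risk * (RR - 1).
    harm_ari = baseline_harm_risk[baseline_src] * (harm_relative_risk[effect_src] - 1)
    net = benefit_arr - harm_ari
    print(f"{baseline_src:>12} + {effect_src:<13}: net benefit = {net:+.4f}")
```

Reporting the full grid, rather than a single primary analysis, would let users of the evidence see directly how much the conclusion depends on the choice of data sources.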
A strength of our review is the collaborative effort of clinicians, epidemiologists, and statisticians, which helped us to develop a comprehensive framework for characterizing quantitative approaches for benefit-harm assessment. Some may perceive it as a limitation that we did not conduct a separate formal systematic review but instead capitalized on an existing, recent review [5]. Also, we used an iterative approach to developing the framework rather than following a more standardized approach, such as Delphi-like procedures, to identify important characteristics of quantitative approaches for benefit-harm assessment. However, a more standardized approach also has its limitations because it does not allow discussing intertwined issues or considering the different perspectives of an interdisciplinary research group in great depth.
We developed a framework for the use of quantitative approaches for benefit-harm assessment that can help researchers select specific approaches. We do not make recommendations for or against specific approaches. It is too early to make such recommendations because of the lack of evidence from studies that directly compare quantitative approaches applied to a specific question. The adequacy of an approach depends on the specific benefit-harm question and on the amount and quality of data, which determine how justifiable certain assumptions are. In some situations, there may be a single approach that appears to be most appropriate. But commonly, several approaches will be reasonable options given the question, the goal of the benefit-harm assessment, and the available data. In such situations, we suggest that investigators use several approaches, as is common in other areas [43], which acknowledges that none of them is perfect and that each rests on some assumptions. The confidence in the results of benefit-harm assessments then depends on the extent to which different approaches arrive at similar results, and on how useful they are to end-users. Evidence from studies applying multiple approaches to the same benefit-harm question, together with recognition of their advantages and disadvantages, would make it possible to identify approaches that are consistently superior to others, and to develop recommendations for specific approaches.