Several quantitative approaches for benefit-harm assessment of health care interventions exist, but it is unclear how they differ. Our aim was to review existing quantitative approaches for benefit-harm assessment and to develop an organizing framework that clarifies their differences and aids the selection of quantitative approaches for a particular benefit-harm assessment.
We performed a review of the literature to identify quantitative approaches for benefit-harm assessment. Our team, consisting of clinicians, epidemiologists, and statisticians, discussed the approaches and identified their key characteristics. We developed a framework that helps investigators select quantitative approaches for benefit-harm assessment that are appropriate for a particular decisionmaking context.
Our framework for selecting quantitative approaches requires a concise definition of the treatment comparison and population of interest, identification of key benefit and harm outcomes, and determination of the need for a measure that puts all outcomes on a single scale (which we call a benefit and harm comparison metric). We identified 16 quantitative approaches for benefit-harm assessment. These approaches can be categorized into those that consider single or multiple key benefit and harm outcomes, and those that do or do not use a benefit-harm comparison metric. Most approaches use aggregate data and can be used in the context of single studies or systematic reviews. Although the majority of approaches provide a benefit and harm comparison metric, only four provide measures of uncertainty around it (such as a 95 percent confidence interval). None of the approaches considers the actual joint distribution of benefit and harm outcomes, but one approach considers competing risks when calculating profile-specific event rates. Nine approaches explicitly allow the incorporation of patient preferences.
The choice of quantitative approaches depends on the specific question and goal of the benefit-harm assessment as well as on the nature and availability of data. In some situations, investigators may identify only one appropriate approach. In situations where the question and available data justify more than one approach, investigators may want to use multiple approaches and compare the consistency of results. When more evidence on relative advantages of approaches accumulates from such comparisons, it will be possible to make more specific recommendations on the choice of approaches.
Some decisions on health care interventions are straightforward because the benefits clearly outweigh the harms or vice versa. Many decisions, however, require careful balancing of benefits and harms. For example, to decide on the use of aspirin for the prevention of myocardial infarction, one would typically consider the risk reduction for myocardial infarction over a certain period of time (e.g. 10 years) as well as the increased risks of hemorrhagic stroke and gastrointestinal bleeding. One would also need to consider that the benefit-harm comparison varies across patients, because the risks of these outcomes and the absolute treatment effects depend heavily on an individual patient's profile, including her or his preferences. Sometimes the decisionmaking context is even more complex than for aspirin. Tamoxifen for the prevention of breast cancer, for example, modifies the risk not only of breast cancer but also of endometrial carcinoma, bone fractures, pulmonary embolism, stroke, and cataracts. As with aspirin, the benefit-harm comparison of tamoxifen depends on a woman's profile, which in this case includes age, the risk of invasive breast cancer, race, and whether the uterus is intact or has been removed.
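The dependence of absolute effects on the patient profile can be made concrete with a short calculation. The Python sketch below applies the same relative risks to two patient profiles with different baseline risks. Every number and name in it is a hypothetical placeholder chosen for illustration, not an estimate for aspirin.

```python
# Minimal sketch: constant relative effects, different baseline risks.
# All relative risks and baseline risks are hypothetical placeholders.

# Hypothetical relative risks (treatment vs no treatment) over 10 years.
RELATIVE_RISKS = {
    "myocardial infarction": 0.80,      # benefit outcome: risk is lowered
    "gastrointestinal bleeding": 1.60,  # harm outcome: risk is raised
}

# Hypothetical 10-year baseline (untreated) risks for two patient profiles.
PROFILES = {
    "low cardiovascular risk": {"myocardial infarction": 0.02,
                                "gastrointestinal bleeding": 0.01},
    "high cardiovascular risk": {"myocardial infarction": 0.15,
                                 "gastrointestinal bleeding": 0.01},
}


def change_per_1000(baseline_risk: float, relative_risk: float) -> float:
    """Change in events per 1000 treated patients (negative = events prevented)."""
    return 1000 * (baseline_risk * relative_risk - baseline_risk)


def profile_effects(profile: dict[str, float]) -> dict[str, float]:
    """Absolute effect of treatment on each outcome for one patient profile."""
    return {
        outcome: change_per_1000(profile[outcome], rr)
        for outcome, rr in RELATIVE_RISKS.items()
    }


if __name__ == "__main__":
    for name, profile in PROFILES.items():
        effects = profile_effects(profile)
        print(name, {outcome: round(v, 1) for outcome, v in effects.items()})
```

With these placeholder inputs, treating 1000 low-risk patients prevents about 4 infarctions but causes about 6 bleeds, whereas treating 1000 high-risk patients prevents about 30 infarctions for the same 6 bleeds. The relative effects are identical, yet the benefit-harm comparison reverses.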
In situations where multiple outcomes, patient profiles, and patient preferences need to be considered, it is challenging to compare benefits and harms without a quantitative approach. Without one, it cannot be verified whether and how different outcomes, patient profiles, patient preferences, and sources of evidence were considered and weighted, and non-quantitative assessments of benefits and harms may lead to inappropriate decisions for or against treatments. This lack of transparency may be less problematic in individual decisionmaking, but it seems unacceptable when major regulatory decisions or clinical guideline recommendations are at stake. A number of quantitative approaches have been developed and applied to handle the multidimensionality of a benefit-harm assessment [3-8], but there is little guidance on how to select an appropriate quantitative approach for a particular clinical question [5,7,9]. Past reviews of quantitative approaches for benefit and harm assessment have not organized the methods according to their important characteristics [5,7,9]. A framework that identifies these characteristics and organizes the approaches accordingly could help clarify the common and different elements of existing approaches, guide their further development, and support investigators who plan a quantitative benefit and harm assessment in their choice of approach.
Such an organizing framework could be particularly attractive for organizations or investigators who conduct or use systematic reviews, because the literature is mostly silent about the use of quantitative benefit and harm assessment approaches in systematic reviews. Familiarity with the available approaches and their key characteristics will help systematic reviewers develop protocols that specify all sources and types of data needed for benefit-harm assessments. For example, additional database searches may be needed to identify evidence on baseline risks, harms, or patient preferences that standard searches, which commonly focus on randomized trials, would miss. Therefore, our aim was to review existing quantitative approaches for benefit-harm assessment and to develop an organizing framework that helps users understand the differences among methods and select one for a specific clinical question.
Our purpose was to review quantitative approaches for benefit and harm assessment that use formulas or graphical displays to compare benefits and harms. We evaluated methods that are feasible for use in systematic reviews or with data synthesized in systematic reviews. We did not consider theoretical frameworks or qualitative approaches for benefit and risk assessment, nor approaches that have not been used in the medical field. It is important to note that our review focused on quantitative assessment and not on the entire process of a benefit-harm assessment (e.g. in the context of a new drug approval), which includes both quantitative and qualitative processes [10,11]. Also, we did not review approaches for making treatment recommendations for populations or individual patients, because this requires consideration of specific health care contexts, costs of treatment, and other contextual factors, which is beyond the scope of most systematic reviews.
We began our search for quantitative benefit and harm approaches with key articles culled from the investigators' reference libraries, including prior work on approaches for assessing benefits and harms. We looked for articles written about quantitative benefit and harm assessment that considered at least one benefit outcome and at least one harm outcome of a medical or public health intervention. We included approaches that analyzed benefit and harm outcomes entirely separately as well as approaches that put single or multiple benefit and harm outcomes on the same scale, which we call the benefit-harm comparison metric (e.g. Quality-adjusted Life Years [QALYs] or a probability scale). We screened the reference lists of all included articles for further relevant articles. The group then discussed each article that seemed potentially relevant, as described below. We also reviewed the manuals of the Evidence-based Practice Center program of the Agency for Healthcare Research and Quality (AHRQ) (http://www.ahrq.gov/clinic/epc/) and of the Cochrane Collaboration.
We did not perform a formal systematic review of the literature because a review on the topic of quantitative benefit and harm assessment already existed and because our focus was on organizing the available approaches. We capitalized on the work already done to create a list of relevant approaches, which allowed us to devote adequate resources to our main focus: developing an organizing framework that helps users understand the differences among methods and select one for a specific clinical question.
Our team, consisting of clinicians, epidemiologists, and statisticians, discussed the identified quantitative approaches for benefit and harm assessment, and the contexts in which they might be used, in 12 one-hour sessions. The discussion served to define properties that characterize quantitative approaches for benefit and harm assessment. Based on these characteristics, we developed a simple algorithm that may guide investigators, systematic reviewers, guideline developers, or policymakers in their selection of a quantitative approach for benefit-harm assessment. We iteratively defined key characteristics with which existing quantitative approaches for benefit and harm assessment can be described and compared. We also recorded the limitations inherent to each approach that may threaten its usefulness or limit its applicability to certain benefit-harm questions. In addition, we prepared examples of quantitative approaches for benefit-harm assessment that highlight the differences between them.
We identified three characteristics of a decisionmaking context that need to be defined. First, the treatment comparison and population for which the benefit-harm assessment is made should be characterized. The comparison of interest can be an intervention versus no intervention or an intervention A versus an intervention B. A population can be broadly defined, for example as a screening or primary prevention population, or be restricted to a particular setting (e.g. primary care) or a particular clinical population (e.g. patients with manifest coronary heart disease). Second, the key benefit and harm outcomes for which evidence and a benefit-harm comparison are sought should be identified. The figure shows that there might be a single benefit and a single harm outcome of interest, or there could be multiple outcomes. As noted in the literature on the more general processes of assessing the benefits and harms of interventions in guidelines and systematic reviews, patient-important outcomes might be preferred as key outcomes, but a surrogate outcome may sometimes be considered if it correlates strongly with patient-important outcomes or if no evidence on patient-important outcomes is available. Finally, one should decide whether a benefit-harm comparison metric is desired that puts all outcomes on a common scale, so that a single number summarizes the comparison of benefits and harms. The decisionmaking context is likely to be important in deciding whether such a metric is needed, because the outputs of the available quantitative approaches differ substantially and may not be appropriate for all decisionmakers, such as patients and their health care providers, clinical or public health guideline developers, regulatory agencies, policymakers, or payers.
We identified 16 approaches, which can be grouped into two broad categories (Figure 1). One category comprises simpler approaches that typically deal with a single outcome for benefit (e.g. prevention of myocardial infarction) and a single outcome for harm (e.g. gastrointestinal bleeding). The single benefit and harm outcomes can, however, also be composite outcomes (e.g. cardiovascular events) that summarize several outcomes (acute coronary syndrome, stroke, and cardiovascular death). The more complex approaches consider multiple outcomes for benefit, harm, or both. Of note, although some approaches, such as the Number needed to treat (NNT) [12,13], the Number needed to harm (NNH), and (Quality-adjusted) Time without Symptoms and Toxicity (Q-TWiST) [14,15], are mostly used when there is a single benefit and a single harm outcome, researchers can apply them separately to different outcomes when multiple outcomes are important (e.g. multiple NNTs and NNHs). In the figure, we categorized approaches according to how they are typically used in the medical literature (e.g. NNT and NNH are typically used for single outcomes), but listed a few of them in two categories when we could not clearly categorize them (e.g. Minimum Target Event Risk for Treatment [MERT] and (Quality-adjusted) Time without Symptoms and Toxicity).
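NNT and NNH are the reciprocals of the absolute risk difference between treatment and control. The Python sketch below computes one per outcome and reports them side by side, as in the use of multiple NNTs and NNHs described in the preceding paragraph. The function name and all risks are hypothetical, and the risks are assumed to refer to the same follow-up period.

```python
import math

# Hypothetical risks over a common follow-up period: (control risk, treated risk).
OUTCOMES = {
    "myocardial infarction": (0.10, 0.08),
    "gastrointestinal bleeding": (0.010, 0.015),
    "hemorrhagic stroke": (0.002, 0.003),
}


def number_needed(risk_control: float, risk_treated: float) -> tuple[str, float]:
    """Return ("NNT", n) if treatment lowers the risk, ("NNH", n) if it raises it.

    n is 1 / |absolute risk difference|; it is infinite when the risks are equal.
    """
    difference = risk_treated - risk_control
    if difference == 0:
        return ("none", math.inf)
    label = "NNT" if difference < 0 else "NNH"
    return (label, 1.0 / abs(difference))


if __name__ == "__main__":
    for outcome, (control, treated) in OUTCOMES.items():
        label, n = number_needed(control, treated)
        # Printed to the nearest whole number of patients.
        print(f"{outcome}: {label} = {n:.0f}")
```

The outputs (here NNT 50 against NNH 200 and NNH 1000) stay on separate scales; weighing them against each other is left to the decisionmaker, which is what distinguishes approaches without a benefit-harm comparison metric.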
The figure also shows that whether an approach uses a benefit and harm comparison metric further distinguishes the approaches. The number of outcomes, the need for a benefit and harm comparison metric, and the quality and quantity of available data will likely drive the selection of approaches.
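The first steps of the framework can be written as a small decision function. This is a hypothetical Python sketch: the class and function names are ours, it returns only the broad category (single or multiple outcomes, with or without a comparison metric) rather than specific approaches, and it counts a composite outcome as a single outcome, as the text does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenefitHarmQuestion:
    """Hypothetical container for the decisionmaking context."""
    comparison: str                    # e.g. "aspirin vs no aspirin"
    population: str                    # e.g. "primary prevention population"
    benefit_outcomes: tuple[str, ...]  # key benefit outcomes (a composite counts as one)
    harm_outcomes: tuple[str, ...]     # key harm outcomes (a composite counts as one)
    needs_comparison_metric: bool      # should all outcomes share a single scale?


def framework_category(q: BenefitHarmQuestion) -> str:
    """Map a benefit-harm question to one of the framework's broad categories."""
    if not q.comparison or not q.population:
        raise ValueError("define the treatment comparison and the population first")
    if not q.benefit_outcomes or not q.harm_outcomes:
        raise ValueError("identify at least one key benefit and one key harm outcome")
    multiple = len(q.benefit_outcomes) > 1 or len(q.harm_outcomes) > 1
    outcomes = "multiple outcomes" if multiple else "single benefit and single harm outcome"
    metric = ("benefit-harm comparison metric" if q.needs_comparison_metric
              else "outcomes presented side by side")
    return f"{outcomes}; {metric}"


# Example based on the tamoxifen question from the introduction.
tamoxifen = BenefitHarmQuestion(
    comparison="tamoxifen vs no tamoxifen",
    population="women considering breast cancer prevention",
    benefit_outcomes=("invasive breast cancer", "bone fractures"),
    harm_outcomes=("endometrial carcinoma", "pulmonary embolism", "stroke", "cataracts"),
    needs_comparison_metric=True,
)

if __name__ == "__main__":
    print(framework_category(tamoxifen))
```

The data question (individual patient data or aggregate data) would be a further filter applied within the returned category.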
To help readers understand the individual approaches and how they differ, we selected three for a more detailed description (Table 1). NNT and NNH are examples of an approach that can be used when a single benefit and a single harm outcome are of interest or when multiple outcomes are treated separately. Multicriteria decision analysis (MCDA) [16,17] and the Gail/National Cancer Institute approach are examples of approaches that consider multiple outcomes and put them on a benefit-harm comparison metric.
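MCDA has several variants; one common form is a weighted-sum (additive) model in which each outcome's performance is converted to a 0-1 value scale and combined using importance weights. The Python sketch below assumes linear value functions. Its criteria, ranges, weights, and risks are all hypothetical (loosely modeled on the tamoxifen example), and it illustrates the general technique, not necessarily the model used in the cited work.

```python
# Hypothetical criteria: name -> (weight, worst plausible risk, best plausible risk).
CRITERIA = {
    "invasive breast cancer": (0.5, 0.06, 0.0),
    "endometrial carcinoma": (0.2, 0.02, 0.0),
    "pulmonary embolism": (0.3, 0.02, 0.0),
}

# Hypothetical 5-year risks under each option.
OPTIONS = {
    "treatment": {"invasive breast cancer": 0.030, "endometrial carcinoma": 0.010,
                  "pulmonary embolism": 0.006},
    "no treatment": {"invasive breast cancer": 0.050, "endometrial carcinoma": 0.005,
                     "pulmonary embolism": 0.002},
}


def partial_value(risk: float, worst: float, best: float) -> float:
    """Linear value function: 0 at the worst level, 1 at the best level.

    Risks outside the [best, worst] range would fall outside 0-1.
    """
    return (risk - worst) / (best - worst)


def mcda_score(performance: dict[str, float]) -> float:
    """Weighted sum of partial values, with weights normalized to sum to 1."""
    total_weight = sum(weight for weight, _, _ in CRITERIA.values())
    return sum(
        weight * partial_value(performance[name], worst, best)
        for name, (weight, worst, best) in CRITERIA.items()
    ) / total_weight


if __name__ == "__main__":
    for option, performance in OPTIONS.items():
        print(f"{option}: {mcda_score(performance):.3f}")
```

With these placeholder inputs the treatment scores about 0.56 and no treatment about 0.50, and the higher score is preferred. Because the weights are elicited from decisionmakers or patients, changing them can change the ranking.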
Here we focus on additional key characteristics that help researchers understand the similarities and differences among approaches and choose the appropriate approaches for their benefit-harm question and context. Papers on the individual approaches describe all 16 in detail, although not in the context of systematic reviews.
We identified the following additional key characteristics:
1. The type of data needed: Individual patient data have advantages over the aggregate data typically available for evidence synthesis. With individual patient data, information on the co-occurrence of benefit and harm outcomes is available for each patient. An important consequence is that researchers can consider the joint probability of benefit and harm outcomes.
2. The type of analyses: Analyses for benefit-harm assessment could include any type of statistical analysis. There is a major distinction, however, between approaches that are data driven (deterministic) and approaches that use modeling (stochastic), in which data are described not by single values but by probability distributions.
3. The type of benefit-harm comparison metric: Researchers may use absolute and relative metrics as well as QALYs.
4. Assumptions: In benefit-harm analyses, researchers need to make a number of assumptions. For example, if researchers use a benefit-harm comparison metric, they assume that combining outcomes on a single scale is justifiable. Other assumptions relate to the joint occurrence of separate outcomes. Some approaches assume that separate outcomes occur independently, which may be justifiable in some instances.
5. Consideration and incorporation of patient preferences: Some quantitative approaches explicitly consider patient preferences for different outcomes in order to weight the benefits and harms.
6. Formats for presenting benefit-harm comparisons: Researchers can use various formats to present the results of a quantitative benefit-harm assessment. They may present the benefit-harm comparison as a difference in the number of events between treatment and no treatment, or as a ratio. They may also express the comparison as the time without symptoms gained or lost through treatment. Alternatively, researchers can use graphics that depict the benefit-harm comparison for patients at different outcome risks or with other patient characteristics.
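The distinction between data-driven (deterministic) and modeling (stochastic) analyses can be illustrated together with a weighted comparison metric. In the Python sketch below, the metric is a hypothetical preference-weighted net benefit (benefit events prevented minus weighted harm events caused, per 1000 treated). The deterministic version plugs in point estimates once; the stochastic version propagates hypothetical standard errors by Monte Carlo simulation to obtain a 95 percent interval. All inputs are placeholders.

```python
import math
import random
import statistics


def net_benefit_per_1000(base_benefit, rr_benefit, base_harm, rr_harm, harm_weight):
    """Benefit events prevented minus weighted harm events caused, per 1000 treated."""
    prevented = 1000 * base_benefit * (1 - rr_benefit)
    caused = 1000 * base_harm * (rr_harm - 1)
    return prevented - harm_weight * caused


# Hypothetical inputs (placeholders, not estimates from any study).
BASE_BENEFIT, RR_BENEFIT, SE_LOG_RR_BENEFIT = 0.10, 0.80, 0.10
BASE_HARM, RR_HARM, SE_LOG_RR_HARM = 0.01, 1.50, 0.20
HARM_WEIGHT = 2.0  # one harm event judged as bad as two benefit events


def deterministic_estimate():
    """Data-driven version: plug in the point estimates once."""
    return net_benefit_per_1000(BASE_BENEFIT, RR_BENEFIT, BASE_HARM, RR_HARM, HARM_WEIGHT)


def stochastic_interval(n_draws=20_000, seed=2024):
    """Simulation version: draw log relative risks from normal distributions.

    The two effects are drawn independently, i.e. this assumes no correlation
    between the benefit and harm effect estimates.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        rr_b = math.exp(rng.gauss(math.log(RR_BENEFIT), SE_LOG_RR_BENEFIT))
        rr_h = math.exp(rng.gauss(math.log(RR_HARM), SE_LOG_RR_HARM))
        draws.append(net_benefit_per_1000(BASE_BENEFIT, rr_b, BASE_HARM, rr_h, HARM_WEIGHT))
    cuts = statistics.quantiles(draws, n=40)  # 39 cut points in steps of 2.5 percent
    return cuts[0], cuts[-1]                  # 2.5th and 97.5th percentiles


if __name__ == "__main__":
    print("point estimate:", round(deterministic_estimate(), 1))
    low, high = stochastic_interval()
    print(f"95% simulation interval: {low:.1f} to {high:.1f}")
```

The point estimate is 10 net events per 1000. Only the simulation adds an uncertainty interval, and it does so under the independence assumption built into `stochastic_interval`.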
In Table 2, we compare each of the 16 approaches against these key characteristics; Table 3 provides a more detailed discussion of each approach. Three approaches require individual patient data (Table 2), whereas 13 do not. This means that, although most quantitative approaches were developed for use in randomized trials and observational studies, researchers can also use most of them at the synthesis stage (systematic reviews). Fourteen of the approaches are data driven, but four of these may also use simulation. One approach (Probabilistic Simulation Methods [PSM]) is entirely based on simulation. Twelve of the 16 approaches put benefit and harm outcomes on the same scale to provide a benefit and harm comparison metric. Only four approaches provide measures of uncertainty around the benefit and harm comparison metric. Four approaches could consider the joint distribution of benefit and harm outcomes for the estimation of uncertainty, but many applications of these four approaches do not consider the dependence between benefit and harm and use only their marginal distributions (i.e. treat them as independent). Five approaches use composite outcomes for benefit and for harm, while 12 approaches use multiple outcomes. Researchers have adapted, or could potentially adapt, nine of the 16 approaches to incorporate patient preferences.
The main finding of our review of quantitative approaches for benefit and harm assessment used in the medical literature is a simple algorithm that broadly categorizes existing approaches into those that consider single or multiple benefit and harm outcomes and those that use a benefit-harm comparison metric or present outcomes side by side. We also found that most approaches can be applied to aggregate data, which makes them suitable for systematic reviews even if that was not their intended purpose. Interestingly, only a few approaches provide measures of uncertainty, and none considers a potential correlation between benefit and harm outcomes (their joint distribution).
We identified a number of assumptions that researchers make when applying some of the quantitative approaches: First, for some approaches researchers assume that one or more benefit and harm outcomes can be put on the same scale to calculate a benefit and harm comparison metric. Challenges for putting different outcomes on the same scale include their relative importance to decisionmakers, simplification of the outcomes (e.g. dichotomizing continuous outcomes, which may lead to substantial loss of information), or different methods and timing in the ascertainment of different outcomes.
However, the advantages of a benefit-harm comparison metric may be substantial, for example in complex situations where multiple outcomes are important and where patient, provider, and policymaker preferences vary. It is a great cognitive challenge to process such a multidimensional task without a benefit-harm comparison metric. The major advantage of a benefit-harm comparison metric (over an approach without such a common metric) is that it makes explicit the assumptions about the relative importance of outcomes and the possibly arbitrary selection of evidence on benefits, harms, or baseline risks, and that sensitivity analyses can then show how the benefit-harm comparison changes under different assumptions. A single number may also help in communicating the benefit-harm comparison to patients, because it avoids overwhelming them with data on multiple outcomes.
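One way a comparison metric makes an assumption explicit is threshold analysis: instead of fixing the relative importance of harms, one can compute the weight at which benefits and harms balance and then vary the weight in a sensitivity analysis. The Python sketch below uses a hypothetical net-benefit metric and hypothetical event counts.

```python
def net_benefit(prevented_per_1000: float, caused_per_1000: float,
                harm_weight: float) -> float:
    """Benefit events prevented minus preference-weighted harm events caused."""
    return prevented_per_1000 - harm_weight * caused_per_1000


def threshold_harm_weight(prevented_per_1000: float, caused_per_1000: float) -> float:
    """Harm weight at which the net benefit is exactly zero.

    Anyone who values one harm event as less than this many benefit events
    would, on these numbers, favor treatment.
    """
    if caused_per_1000 <= 0:
        return float("inf")  # no excess harm events, so no finite threshold
    return prevented_per_1000 / caused_per_1000


if __name__ == "__main__":
    prevented, caused = 20.0, 5.0  # hypothetical events per 1000 treated
    print("threshold weight:", threshold_harm_weight(prevented, caused))
    for w in (1.0, 2.0, 4.0, 6.0):  # a simple one-way sensitivity analysis
        print(f"harm weight {w}: net benefit {net_benefit(prevented, caused, w):+.1f} per 1000")
```

Here the threshold weight is 4: the comparison favors treatment for any patient or decisionmaker who regards one harm event as less than four times as bad as one benefit event.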
Second, we were surprised to see that no quantitative approach considered or even discussed the joint distribution of benefit and harm outcomes, even when individual patient data were available. The joint distribution captures the correlation between benefit and harm outcomes. Trial reports commonly give standard errors and confidence intervals for the benefit and harm outcomes separately, but rarely describe the joint distribution of the treatment's effects on benefit and harm outcomes. Without the joint distribution of all the effects, we have to assume that the benefit and harm effects are independent, which may not yield a valid estimate of the uncertainty of the benefit-harm comparison metric. Changes in reporting practices, such as online journal appendices or online repositories of covariance data for later data synthesis, could address this limitation. Systematic reviewers should keep in mind the limitation of not considering the joint distribution when interpreting results from a quantitative benefit and harm assessment.
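Why the joint distribution matters can be seen from the variance of a difference. For a comparison metric of the form benefit effect minus w times harm effect, the variance contains a covariance term that is silently set to zero when independence is assumed. The Python sketch below uses hypothetical effect sizes, standard errors, weight, and correlations (the correlation being exactly what trial reports rarely provide) and a normal-approximation 95 percent interval.

```python
import math


def se_weighted_difference(se_benefit: float, se_harm: float,
                           correlation: float, harm_weight: float = 1.0) -> float:
    """Standard error of (benefit effect - harm_weight * harm effect).

    Var(B - wH) = Var(B) + w**2 * Var(H) - 2 * w * Cov(B, H),
    where Cov(B, H) = correlation * se_benefit * se_harm.
    """
    covariance = correlation * se_benefit * se_harm
    variance = (se_benefit ** 2 + harm_weight ** 2 * se_harm ** 2
                - 2 * harm_weight * covariance)
    return math.sqrt(variance)


# Hypothetical effect estimates: absolute risk reduction for the benefit outcome
# and absolute risk increase for the harm outcome, each with its standard error.
RD_BENEFIT, SE_BENEFIT = 0.020, 0.010
RD_HARM, SE_HARM = 0.005, 0.004
HARM_WEIGHT = 2.0

if __name__ == "__main__":
    net = RD_BENEFIT - HARM_WEIGHT * RD_HARM
    for rho in (-0.5, 0.0, 0.5):  # assumed correlations; rho = 0 is independence
        se = se_weighted_difference(SE_BENEFIT, SE_HARM, rho, HARM_WEIGHT)
        print(f"rho={rho:+.1f}: net={net:.3f}, "
              f"95% CI {net - 1.96 * se:.4f} to {net + 1.96 * se:.4f}")
```

A positive correlation between the two estimates narrows the interval relative to independence and a negative correlation widens it, so assuming independence can either overstate or understate the uncertainty.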
The figure and table show a number of characteristics that help distinguish the existing quantitative approaches. While these characteristics are important for selecting an appropriate quantitative approach, researchers need to address a number of additional considerations because they affect the type of evidence included in the benefit-harm assessment. For example, clinical trials are commonly designed to provide high-quality evidence and sufficient statistical power for benefit outcomes, whereas harms often receive much less attention in terms of accurate and valid measurement. Such asymmetry in the quality of outcome ascertainment affects the validity of a quantitative benefit and harm assessment, but it is still unclear how to downgrade the quality of evidence for this reason.
In contrast to the framework developed here, which focuses entirely on quantitative assessment, Lynd and others developed criteria that apply to the entire process of a benefit-harm assessment. This process usually requires researchers to consider both quantitative and qualitative approaches in reaching conclusions about the benefit-harm comparisons of health care interventions [10,11,41,42]. Lynd and others proposed 10 criteria for benefit-harm assessments: they should be universal, inclusive, comprehensive, patient-sensitive, and easily interpreted; consider preferences; define when benefits outweigh harms; incorporate uncertainty; be flexible; and integrate economic evaluations. We agree with these guiding principles but think that researchers cannot readily use them to judge the adequacy of specific quantitative approaches. Whether a specific approach is adequate depends heavily on the type and quality of available data. Regulatory decisionmakers, guideline developers, and other users of the evidence are likely to perceive the ease of use and ease of interpretation of quantitative approaches very differently because of different levels of methodological expertise or different perspectives. We therefore believe that our framework for organizing quantitative approaches complements, rather than competes with, what Lynd and others have proposed. The frameworks proposed by our team and by Lynd and others support a systematic, well-structured, and transparent process for reducing the multidimensionality of a benefit-harm assessment.
Our review showed that current quantitative approaches for benefit-harm assessment may need further development. First, many of the quantitative approaches identified here focus on binary outcomes that occur only once, with or without consideration of time to event. Current methods need extensions that handle other types of data. Some patient-important outcomes, such as quality of life or symptoms, cannot be expressed appropriately as binary outcomes without substantial loss of information. Some benefit and harm events can occur several times, so the number of events per person-time needs to be considered rather than the proportion of persons with at least one event. Second, uncertainty estimates for the benefit and harm comparison metric (e.g. 95 percent confidence or credible intervals) are likely to be of key importance for decisionmakers and organizations making treatment recommendations, yet researchers do not commonly report estimates of the uncertainty that arises from sampling variability. In addition, none of the methods considers the joint distribution of benefit and harm outcomes. Researchers should develop statistical methods that account for joint distributions when estimating standard errors for benefit-harm comparison metrics. For systematic reviews, it would be valuable to develop approaches for making assumptions about joint distributions, because covariance matrices are rarely available from reports of primary studies and may be difficult to obtain from their authors. Third, researchers should develop systematic approaches for sensitivity analyses that assess the influence of the various assumptions commonly made. One approach would be to agree on a list of standard sensitivity analyses for key aspects of a benefit-harm assessment. For example, data for estimating baseline risks (e.g. the probability of the outcome without treatment) can come from different sources (e.g. surveillance data, observational studies, and placebo arms of randomized trials). The best available evidence on treatment effects may sometimes come from a single randomized trial or observational study rather than from a meta-analysis. Researchers may also derive patient preferences using different elicitation techniques. A systematic outline of these options (the choices for the primary analysis and for the sensitivity analyses) would make benefit-harm assessments transparent and give users of the evidence a sense of how sensitive the results are to different assumptions.
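Such an outline can be generated mechanically by crossing the options for each input. The Python sketch below is hypothetical throughout: the data sources mirror the examples in the text, but the risks, relative risks, preference weights, and the generic labels "elicitation technique A/B" are placeholders, and the number of harm events is held fixed for brevity.

```python
from itertools import product

# Hypothetical inputs; every value is a placeholder for illustration only.
BASELINE_RISK = {  # untreated risk of the benefit outcome, by data source
    "surveillance data": 0.08,
    "observational studies": 0.10,
    "placebo arms of randomized trials": 0.06,
}
RELATIVE_RISK = {  # treatment effect on the benefit outcome, by evidence source
    "meta-analysis": 0.80,
    "single randomized trial": 0.75,
}
HARM_WEIGHT = {  # importance of one harm event relative to one benefit event
    "elicitation technique A": 2.0,
    "elicitation technique B": 3.5,
}
HARM_EVENTS_PER_1000 = 5.0  # excess harm events per 1000 treated, held fixed here
PRIMARY = ("observational studies", "meta-analysis", "elicitation technique A")


def scenario_results() -> dict[tuple[str, str, str], float]:
    """Net benefit per 1000 treated for every combination of analytic choices."""
    results = {}
    for source, effect, technique in product(BASELINE_RISK, RELATIVE_RISK, HARM_WEIGHT):
        prevented = 1000 * BASELINE_RISK[source] * (1 - RELATIVE_RISK[effect])
        results[(source, effect, technique)] = (
            prevented - HARM_WEIGHT[technique] * HARM_EVENTS_PER_1000)
    return results


if __name__ == "__main__":
    for choices, net in scenario_results().items():
        tag = "primary" if choices == PRIMARY else "sensitivity"
        print(f"{tag:11s} {net:+6.1f}  " + " | ".join(choices))
```

With these placeholders the primary analysis gives a net benefit of about +10 events per 1000, but several of the 12 scenarios turn negative, which is the kind of fragility such an outline is meant to expose.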
A strength of our review is the collaborative effort of clinicians, epidemiologists, and statisticians, which helped us develop a comprehensive framework for characterizing quantitative approaches for benefit-harm assessment. Some may perceive it as a limitation that we did not conduct a separate formal systematic review, but we capitalized on an existing, recent review. Also, we used an iterative approach to develop the framework rather than a more standardized approach, such as a Delphi-like procedure, to identify important characteristics of quantitative approaches for benefit-harm assessment. However, a more standardized approach also has limitations, because it does not allow in-depth discussion of intertwined issues or of the different perspectives within an interdisciplinary research group.
We developed a framework for the use of quantitative approaches for benefit-harm assessment that can help researchers select specific approaches. We do not make recommendations for or against specific approaches. It is too early to make such recommendations because of the lack of evidence from studies that directly compare quantitative approaches applied to a specific question. The adequacy of approaches depends on the specific benefit-harm question and on the amount and quality of data, which determine how justifiable certain assumptions are. In some situations, a single approach may appear most appropriate. Commonly, however, several approaches will be reasonable options given the question, the goal of the benefit-harm assessment, and the available data. In such situations, we suggest that investigators use several approaches, as is common in other areas [43,44], acknowledging that none of them is perfect and that each rests on assumptions. Confidence in the results of a benefit-harm assessment then depends on the extent to which different approaches arrive at similar results and on how useful they are to end users. Evidence from studies applying multiple approaches to the same benefit-harm question, together with recognition of their advantages and disadvantages, would make it possible to identify approaches that are consistently superior to others and to develop recommendations for specific approaches.
The authors declare that they have no competing interests.
All authors read and approved the final manuscript.
This work was funded by a contract to AHRQ, Rockville, MD (Contract No. HHSA 290-2007-10061-I). The findings and conclusions in this document are those of the author(s), who are responsible for its content, and do not necessarily represent the views of AHRQ. No statement in this report should be construed as an official position of AHRQ or of the U.S. Department of Health and Human Services.
Dr. Boyd was supported by the Paul Beeson Career Development Award Program (NIA K23 AG032910, AFAR, The John A. Hartford Foundation, The Atlantic Philanthropies, The Starr Foundation, and an anonymous donor).