Overview and general framework for modeling biomarker utility

Decision analysis modeling is used to simulate the downstream consequences of a clinical decision^{54} and can be used to estimate the health impact of biomarker strategies. Decision analysis is often used to simulate both health and economic outcomes. When the primary outcome of a decision analytic model is health impact, we might call this “comparative effectiveness modeling”; when the primary outcome is cost-effectiveness, it is called cost-effectiveness modeling (or “cost-utility modeling” if the outcome is specifically formulated as cost/QALY) and can be used to describe the efficiency of biomarker strategies.

Decision analysis modeling is much less expensive and time-consuming than conducting a clinical trial. Unlike randomized trials, in which one “best” strategy must be chosen a priori for testing, modeling studies can systematically compare the effectiveness and efficiency of all reasonable strategies in all relevant subgroups. Modeling also allows the investigator to synthesize all available data on test characteristics, treatment efficacy, and other relevant parameters (including data on costs and long-term effects of testing and treatment), and to identify crucial areas of uncertainty in the existing data where more primary data collection is required.

Decision analysis modeling is designed to capture and weigh the tradeoffs inherent in any decision. The modeling approach described below provides a general framework for capturing the essential tradeoffs inherent in the decision of whether or not to measure a biomarker. We focus here on the use of biomarkers for making better clinical decisions (mechanism 3). In the Online Supplemental Materials, we illustrate how a published decision analysis evaluating the cost-effectiveness of C-reactive protein as a screening tool for guiding statin therapy fits into this framework (see Supplemental Figure 1, showing how the framework can be adapted for this specific analysis), and provide a brief critique touching on the methodologic points discussed below.

Defining scenarios

Each scenario in a decision analysis should be defined narrowly, such that a single treatment strategy would be clinically reasonable in the absence of the biomarker result. Making the scenario narrow allows the resulting estimate to represent a relatively homogeneous effect that is easy to translate into practice. Because the marginal cost of modeling additional scenarios is relatively low (a key benefit of modeling compared with clinical studies), multiple scenarios can be considered; results can then be presented separately for each scenario, or carefully integrated across some population of interest if an average effect is desired for policymaking. For example, the CHD Policy Model, an established decision analytic model, automatically runs thousands of scenarios in parallel that, when combined, produce estimates representative of the US population aged 35-85^{55}.

Simulating the full range of possible strategies

Just as a receiver operating characteristic (ROC) curve is always bounded by two extremes (sensitivity 100%/specificity 0% and sensitivity 0%/specificity 100%), it is useful to model the two logically extreme strategies when modeling biomarker utility: “Treat none” and “Treat all”. While one of these strategies may seem unrealistic in any given scenario, both are clinically feasible to pursue without incurring the cost or harm of the biomarker test; “Test-and-Treat” strategies must therefore compete against both extremes, and either extreme may be rationally preferred depending on society's willingness to pay. Furthermore, comparing these extremes to each other provides an estimate of *treatment* effectiveness and efficiency within the biomarker model, and a means of validating the model against prior analyses.
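The value of modeling the two extremes can be sketched numerically. The following Python sketch uses purely illustrative numbers (a 10% baseline event risk and a 30% relative risk reduction, not values from this article) to compute the expected event rate under “Treat none” and “Treat all”; comparing the two isolates the effect of treatment itself.

```python
# Illustrative sketch: expected event rate under the two logical extremes.
# All numbers are assumptions for demonstration only.

def expected_event_rate(baseline_risk: float, rrr: float,
                        treated_fraction: float) -> float:
    """Average event rate when `treated_fraction` of the scenario is treated."""
    treated = treated_fraction * baseline_risk * (1.0 - rrr)
    untreated = (1.0 - treated_fraction) * baseline_risk
    return treated + untreated

baseline_risk = 0.10   # assumed 10-year event risk in this scenario
rrr = 0.30             # assumed relative risk reduction from treatment

treat_none = expected_event_rate(baseline_risk, rrr, treated_fraction=0.0)
treat_all = expected_event_rate(baseline_risk, rrr, treated_fraction=1.0)

# The difference between the extremes is the treatment effect itself,
# before any cost or harm of biomarker testing enters the comparison.
print(round(treat_none, 4), round(treat_all, 4))
```

Any Test-and-Treat strategy simulated in the same scenario must outperform both of these bounds (net of testing costs and harms) to be preferred.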

Dividing into “sub-scenarios” with differing levels of the biomarker

We assume that any given scenario (even if narrowly defined) consists of a mix of persons with differing levels of the biomarker of interest (“sub-scenarios”). Although only the Test-and-Treat strategies (S_{2} and S_{3}) will use the biomarker measurement, the sub-scenarios based on the biomarker distribution (along with potentially differing post-test risk estimates and treatment effects) should be modeled identically in all strategies. This ensures that all intervention simulations for that scenario (i.e., S_{1}-S_{4}) are equivalent in all aspects except for the key tradeoffs related to testing and treatment. Across scenarios, however, the biomarker distribution can be very different (e.g., coronary calcium is more common in older men than in younger women), and should be modeled as such in order to simulate realistic reclassification rates.

A 3-category approach to modeling the distribution of the biomarker can be defined by critical test thresholds (T_{1} and T_{2}). How many categories and which specific thresholds are modeled depends on the particulars of the test and clinical setting. For example, the investigator might use natural thresholds when the test result is naturally categorized (e.g., “low, intermediate, or high probability” scans for ventilation/perfusion scanning for pulmonary embolism^{17}), thresholds used in prior studies (e.g., coronary calcium thresholds of 0, 100, and 300, as used in a key article^{22}), or biomarker levels that would lead to a post-test risk, in the given scenario, that exceeds some established treatment threshold (e.g., post-test 10-year CHD risk > 20% for statin treatment^{43}).
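As a concrete sketch of the threshold-based split, the function below assigns a biomarker value to one of three sub-scenarios. The thresholds of 100 and 300 merely echo the coronary calcium example; the function name and labels are ours, not from any standard library.

```python
# Hedged sketch: map a continuous biomarker value onto the three
# sub-scenarios defined by thresholds T1 and T2 (illustrative values).
import bisect

def sub_scenario(value: float, t1: float, t2: float) -> str:
    """Assign a biomarker value to the low/intermediate/high sub-scenario.
    A value exactly at a threshold falls into the higher category."""
    labels = ["low", "intermediate", "high"]
    return labels[bisect.bisect_right([t1, t2], value)]

print(sub_scenario(0, 100, 300))    # low
print(sub_scenario(150, 100, 300))  # intermediate
print(sub_scenario(500, 100, 300))  # high
```

In a full model, each person simulated in a scenario would be assigned a sub-scenario this way, identically across all strategies S_{1}-S_{4}.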

Modeling the post-test risk of events and effects of treatment

A biomarker can be useful for clinical decision-making only if the expected benefits of some treatment differ across levels of the biomarker, and this must be modeled explicitly for the different sub-scenarios. The expected benefit of treatment will be larger for persons whose biomarker results indicate a higher risk of disease (assuming the relative risk reduction from treatment is constant), and for persons whose biomarker results indicate higher treatment effectiveness (i.e., a larger relative risk reduction). Modeling the expected benefits of treatment therefore requires 1) estimating post-test disease risk, and 2) applying treatment effectiveness (relative risk reduction) for each sub-scenario.

Modification of post-test risk is a key mechanism by which biomarker measurement may provide clinical utility, but calculation of post-test risk in different sub-scenarios is not straightforward. For example, a coronary calcium score of 50 may lead to a downward revision of risk in one patient (if it is lower than expected, as in a 70-year-old man) and an upward revision of risk in another (if it is higher than expected, as in a 55-year-old woman)^{56}. Methods are available for estimating post-test risk while maintaining the average event rate by integrating evidence about the biomarker distribution (“expectation”) with the relative risk estimates associated with different levels of a biomarker^{56}. Alternatively, direct estimates of risk from follow-up studies may be available for persons who are reclassified upwards or downwards by measurement of a biomarker^{29, 30}. Either way, post-test risk estimates should be handled carefully and realistically, and should be based on data-driven biomarker performance estimates from studies that use “real-world” biomarker measurements that account for measurement variability (from biological variability and measurement error).
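One way to implement the “maintain the average event rate” constraint described above is to scale category-specific relative risks so that the prevalence-weighted post-test risks average back to the known pre-test risk. The sketch below uses made-up prevalences and relative risks; it illustrates the constraint, not the specific published method.

```python
# Hedged sketch: post-test risks per sub-scenario, normalized so their
# prevalence-weighted average equals the pre-test (average) risk.
# Prevalences and relative risks are illustrative assumptions.

def post_test_risks(pre_test_risk, prevalences, relative_risks):
    """Scale category-specific relative risks so that the prevalence-weighted
    average of the resulting post-test risks equals the pre-test risk."""
    mean_rr = sum(p * rr for p, rr in zip(prevalences, relative_risks))
    return [pre_test_risk * rr / mean_rr for rr in relative_risks]

prevalences = [0.5, 0.3, 0.2]      # fraction in low/intermediate/high category
relative_risks = [0.5, 1.0, 3.0]   # risk relative to the middle category

risks = post_test_risks(0.10, prevalences, relative_risks)
avg = sum(p * r for p, r in zip(prevalences, risks))
print([round(r, 4) for r in risks], round(avg, 6))
```

Note how the same test result can revise risk downward in one sub-scenario and upward in another, while the scenario's overall event rate is unchanged.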

Treatment effectiveness, in terms of relative risk reduction, is usually assumed to be constant across persons, but this may not always be the case. The relative risk reduction from statin therapy, for example, may be larger for persons with a high C-reactive protein level than for persons with a lower level^{32}. Similarly, the risk of adverse effects from statins may vary depending on genetic factors^{23}. Sub-scenario-specific treatment effectiveness is then applied to sub-scenario-specific disease risk to simulate outcome rates for each sub-scenario.
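The product of sub-scenario-specific risk and sub-scenario-specific relative risk reduction gives the expected absolute benefit of treatment in each sub-scenario. The numbers below are illustrative assumptions only; they show how a higher-risk sub-scenario with a larger relative risk reduction benefits disproportionately.

```python
# Hedged sketch: expected absolute benefit per sub-scenario
# (illustrative risk and effectiveness values).

def absolute_risk_reduction(post_test_risk: float, rrr: float) -> float:
    """Expected absolute benefit of treatment for one sub-scenario."""
    return post_test_risk * rrr

# Low-risk sub-scenario with modest treatment effectiveness:
arr_low = absolute_risk_reduction(0.05, 0.25)
# High-risk sub-scenario with larger treatment effectiveness
# (e.g., analogous to the high C-reactive protein example):
arr_high = absolute_risk_reduction(0.20, 0.35)

print(round(arr_low, 4), round(arr_high, 4))
```

In a full model these per-sub-scenario benefits (net of treatment harms) drive the differences among the strategies S_{1}-S_{4}.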

Modeling outcomes and estimating incremental differences between strategies

Clinical and economic outcomes are then modeled from these sub-scenario-specific risk estimates using standard decision analysis techniques. Simple event probabilities can be used to enumerate all possible combinations of relevant events, each represented by a “terminal node”. For example, a single terminal node might represent the unlucky occurrence of both the outcome of interest and an adverse effect of treatment: “statin-induced myopathy + non-fatal MI”. For each terminal node, the overall probability of occurrence is calculated, along with an overall estimate of “utility” (in QALYs or another measure of health impact) and costs (if relevant). For long-term scenarios, a standard approach is to use a Markov modeling process, which simulates cycles during which persons may transition between different clinical states (e.g., healthy, status-post myocardial infarction, dead), with QALYs or other outcomes accruing during each cycle at different rates for patients in different states^{57}. Either way, outcomes (utility, with or without costs) are then summed across all possible terminal nodes or Markov cycles/states for each strategy, weighted by their probability of occurrence, to obtain an estimate of the average expected outcomes associated with any given strategy. For presentation purposes, strategies are then compared by calculating the difference in average clinical utility between them (incremental effectiveness). For cost-effectiveness analysis, the incremental cost is also estimated, and the ratio of incremental cost to incremental effectiveness (the incremental cost-effectiveness ratio, usually in $/QALY) is presented. Excellent, practical advice on designing and implementing decision analyses is available and directly applicable to biomarker modeling^{35, 54, 57-60}.
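The terminal-node bookkeeping can be made concrete with a toy two-node tree per strategy. All probabilities, QALY values, and costs below are invented for illustration; a real model would have many more nodes (or Markov states) and discounting.

```python
# Hedged sketch: probability-weighted outcomes over terminal nodes,
# then an incremental cost-effectiveness ratio (ICER).
# Every number here is an illustrative assumption.

def expected_outcomes(terminal_nodes):
    """Probability-weighted sums of QALYs and costs over all terminal nodes."""
    qalys = sum(n["p"] * n["qalys"] for n in terminal_nodes)
    costs = sum(n["p"] * n["cost"] for n in terminal_nodes)
    return qalys, costs

treat_all = [
    {"p": 0.07, "qalys": 6.0, "cost": 40_000},  # event despite treatment
    {"p": 0.93, "qalys": 9.0, "cost": 5_000},   # event-free, drug costs only
]
treat_none = [
    {"p": 0.10, "qalys": 6.0, "cost": 35_000},  # event, no drug costs
    {"p": 0.90, "qalys": 9.0, "cost": 0},       # event-free
]

q1, c1 = expected_outcomes(treat_all)
q0, c0 = expected_outcomes(treat_none)
icer = (c1 - c0) / (q1 - q0)   # incremental $ per QALY gained
print(round(q1 - q0, 3), round(c1 - c0), round(icer))
```

The same machinery extends directly to Test-and-Treat strategies: testing adds its own cost (and any disutility) to every branch, and treatment is applied only in the sub-scenarios above the chosen threshold.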

Limitations of modeling and use of sensitivity analyses

The major limitation of modeling is that data are not always available to support the many assumed parameters required to construct a model. Typically, the modeler can find direct evidence to support estimates for some model parameters, indirect evidence for others, and must simply guess (using clinical judgment, etc.) for the rest. Even when parameter estimates are based on good scientific evidence, they carry some uncertainty (from sampling error and/or bias).

Any modeling exercise, therefore, must be accompanied by a series of well-designed sensitivity analyses to see how this uncertainty could affect the results. In the “base case”, the modeler uses best-guess estimates for all parameters and produces a set of base case results; in sensitivity analyses, one or more of the parameters are varied in order to evaluate how “sensitive” the results are to variation in those parameters. By this method, the modeler can describe which parameter assumptions are important in estimating the effectiveness or cost-effectiveness of a strategy. For example, in a cost-effectiveness analysis of statin prescribing strategies, results were relatively insensitive to reasonable variation in the rates of myopathy and hepatitis, but were sensitive to the assumed average decrement in quality of life from taking a pill every day^{47}. When results are critically sensitive to a parameter estimate that is not supported by firm evidence, a good case can be made for further empirical study. Probabilistic sensitivity analysis, in which the model is iterated many times while varying all parameters simultaneously (drawing from a theoretical distribution for each variable), can be used to estimate the global uncertainty of any model result^{61}.
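A probabilistic sensitivity analysis can be sketched in a few lines: each iteration redraws every uncertain parameter from an assumed distribution, re-runs the model (here reduced to a trivial one-line function), and the spread of the resulting outputs summarizes global uncertainty. The distributions below are illustrative assumptions, not calibrated estimates.

```python
# Hedged sketch of a probabilistic sensitivity analysis:
# vary all parameters simultaneously across many model iterations.
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

def model(baseline_risk: float, rrr: float) -> float:
    """Toy model output: absolute risk reduction from treating everyone."""
    return baseline_risk * rrr

results = []
for _ in range(10_000):
    risk = random.betavariate(10, 90)       # uncertain baseline risk (~10%)
    rrr = random.normalvariate(0.30, 0.05)  # uncertain relative risk reduction
    results.append(model(risk, rrr))

results.sort()
lo, hi = results[249], results[9749]  # approximate 95% uncertainty interval
print(round(statistics.mean(results), 4), round(lo, 4), round(hi, 4))
```

The same loop, run over a full decision analytic model, yields an uncertainty interval around incremental effectiveness or the ICER rather than a single base case number.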

The impact of model structure is more difficult to evaluate. While numerical parameters are easy to vary, the structure of a model is usually fixed, and the implications of the modeler's decisions about, for example, how to simulate the occurrence of clinical outcomes may be hard to assess. With any model, a balance must be struck between realism and simplicity; a model that is too simple may not capture the relevant effects, while a “black box” model that is too complex may be difficult to understand and troubleshoot, and may even obscure the essential tradeoffs. Structuring a model that captures the essential tradeoffs accurately, and then designing an appropriate set of sensitivity analyses that bring the important assumptions to light, are the keys to deriving useful information from decision analysis modeling.

The use of a common, interpretable health impact metric (QALYs) is a strength of modeling, but it assumes a utilitarian philosophy, and this has important implications. Net QALY impact may be positive if an intervention produces a tiny benefit in many persons even if it results in substantial harm in a small number of persons. If a disadvantaged population is disproportionately represented in the harmed minority, for example, disparities in health may widen even while average population health improves. Similarly, QALY modeling implies that saving the life of a younger person (who will subsequently accrue more QALYs) is more valuable than saving the life of an older person. Furthermore, there is no consensus about the value of a QALY (and therefore no consensus about the threshold $/QALY below which an intervention is deemed “cost-effective”), though presenting the $/QALY metric does allow readers to decide for themselves. These and other limitations must be considered any time QALYs are used as a measure of health impact^{36}.