Decision curve analysis (DCA) has been proposed as an alternative method for evaluation of diagnostic tests, prediction models, and molecular markers. However, DCA is based on expected utility theory, which has been routinely violated by decision makers. Decision-making is governed by intuition (system 1), and analytical, deliberative process (system 2), thus, rational decision-making should reflect both formal principles of rationality and intuition about good decisions. We use the cognitive emotion of regret to serve as a link between systems 1 and 2 and to reformulate DCA.
First, we analysed a classic decision tree describing three decision alternatives: treat, do not treat, and treat or no treat based on a predictive model. We then computed the expected regret for each of these alternatives as the difference between the utility of the action taken and the utility of the action that, in retrospect, should have been taken. For any pair of strategies, we measure the difference in net expected regret. Finally, we employ the concept of acceptable regret to identify the circumstances under which a potentially wrong strategy is tolerable to a decision-maker.
We developed a novel dual visual analog scale to describe the relationship between regret associated with "omissions" (e.g. failure to treat) vs. "commissions" (e.g. treating unnecessary) and decision maker's preferences as expressed in terms of threshold probability. We then proved that the Net Expected Regret Difference, first presented in this paper, is equivalent to net benefits as described in the original DCA. Based on the concept of acceptable regret we identified the circumstances under which a decision maker tolerates a potentially wrong decision and expressed it in terms of probability of disease.
We present a novel method for eliciting decision maker's preferences and an alternative derivation of DCA based on regret theory. Our approach may be intuitively more appealing to a decision-maker, particularly in those clinical situations when the best management option is the one associated with the least amount of regret (e.g. diagnosis and treatment of advanced cancer, etc).
Diagnostic and prognostic models are typically evaluated with measures of accuracy that do not address clinical consequences. Decision-analytic techniques allow assessment of clinical outcomes, but often require collection of additional information, and may be cumbersome to apply to models that yield a continuous result. We sought a method for evaluating and comparing prediction models that incorporates clinical consequences, requires only the dataset on which the models are tested, and can be applied to models that have either continuous or dichotomous results.
We describe decision curve analysis, a simple, novel method of evaluating predictive models. We start by assuming that the threshold probability of a disease or event at which a patient would opt for treatment is informative of how the patient weighs the relative harms of a false-positive and a false-negative prediction. This theoretical relationship is then used to derive the net benefit of the model across different threshold probabilities. Plotting net benefit against threshold probability yields the “decision curve”. We apply the method to models for the prediction of seminal vesicle invasion in prostate cancer patients. Decision curve analysis identified the range of threshold probabilities in which a model was of value, the magnitude of benefit, and which of several models was optimal.
Decision curve analysis is a suitable method for evaluating alternative diagnostic and prognostic strategies that has advantages over other commonly used measures and techniques.
prediction models; multivariate analysis; decision analysis
The traditional statistical approach to the evaluation of diagnostic tests, prediction models and molecular markers is to assess their accuracy, using metrics such as sensitivity, specificity and the receiver-operating-characteristic curve. However, there is no obvious association between accuracy and clinical value: it is unclear, for example, just how accurate a test needs to be in order for it to be considered "accurate enough" to warrant its use in patient care. Decision analysis aims to assess the clinical value of a test by assigning weights to each possible consequence. These methods have been historically considered unattractive to the practicing biostatistician because additional data from the literature, or subjective assessments from individual patients or clinicians, are needed in order to assign weights appropriately. Decision analytic methods are available that can reduce these additional requirements. These methods can provide insight into the consequences of using a test, model or marker in clinical practice.
Syndromic surveillance systems can potentially be used to detect a bioterrorist attack earlier than traditional surveillance, by virtue of their near real-time analysis of relevant data. Receiver operator characteristic (ROC) curve analysis using the area under the curve (AUC) as a comparison metric has been recommended as a practical evaluation tool for syndromic surveillance systems, yet traditional ROC curves do not account for timeliness of detection or subsequent time-dependent health outcomes.
Using a decision-analytic approach, we predicted outcomes, measured in lives, quality adjusted life years (QALYs), and costs, for a series of simulated bioterrorist attacks. We then evaluated seven detection algorithms applied to syndromic surveillance data using outcomes-weighted ROC curves compared to simple ROC curves and timeliness-weighted ROC curves. We performed sensitivity analyses by varying the model inputs between best and worst case scenarios and by applying different methods of AUC calculation.
The decision analytic model results indicate that if a surveillance system was successful in detecting an attack, and measures were immediately taken to deliver treatment to the population, the lives, QALYs and dollars lost could be reduced considerably. The ROC curve analysis shows that the incorporation of outcomes into the evaluation metric has an important effect on the apparent performance of the surveillance systems. The relative order of performance is also heavily dependent on the choice of AUC calculation method.
This study demonstrates the importance of accounting for mortality, morbidity and costs in the evaluation of syndromic surveillance systems. Incorporating these outcomes into the ROC curve analysis allows for more accurate identification of the optimal method for signaling a possible bioterrorist attack. In addition, the parameters used to construct an ROC curve should be given careful consideration.
The predictiveness curve is a graphical tool that characterizes the population distribution of Risk(Y) = P(D = 1|Y), where D denotes a binary outcome such as occurrence of an event within a specified time period and Y denotes predictors. A wider distribution of Risk(Y) indicates better performance of a risk model in the sense that making treatment recommendations is easier for more subjects. Decisions are more straightforward when a subject's risk is deemed to be high or low. Methods have been developed to estimate predictiveness curves from cohort studies. However early phase studies to evaluate novel risk prediction markers typically employ case-control designs. Here we present semiparametric and nonparametric methods for evaluating a continuous risk prediction marker that accommodate case-control data. Small sample properties are investigated through simulation studies. The semiparametric methods are substantially more efficient than their nonparametric counterparts under a correctly specified model. We generalize them to settings where multiple prediction markers are involved. Applications to prostate cancer risk prediction markers illustrate methods for comparing the risk prediction capacities of markers and for evaluating the increment in performance gained by adding a marker to a baseline risk model. We propose a modified Hosmer-Lemeshow test for case-control study data to assess calibration of the risk model that is a natural complement to this graphical tool.
biomarker; case-control study; classification; Hosmer-Lemeshow test; predictiveness curve; risk; ROC curve
Risk prediction models based on medical history or results of tests are increasingly common in the cancer literature. An important use of these models is to make treatment decisions on the basis of estimated risk. The relative utility curve is a simple method for evaluating risk prediction in a medical decision-making framework. Relative utility curves have three attractive features for the evaluation of risk prediction models. First, they put risk prediction into perspective because relative utility is the fraction of the expected utility of perfect prediction obtained by the risk prediction model at the optimal cut point. Second, they do not require precise specification of harms and benefits because relative utility is plotted against a summary measure of harms and benefits (ie, the risk threshold). Third, they are easy to compute from standard tables of data found in many articles on risk prediction. An important use of relative utility curves is to evaluate the addition of a risk factor to the risk prediction model. To illustrate an application of relative utility curves, an analysis was performed on previously published data involving the addition of breast density to a risk prediction model for invasive breast cancer.
Proper evaluation of new diagnostic tests is required to reduce overutilization and to limit potential negative health effects and costs related to testing. A decision analytic modelling approach may be worthwhile when a diagnostic randomized controlled trial is not feasible. We demonstrate this by assessing the cost-effectiveness of modified transesophageal echocardiography (TEE) compared with manual palpation for the detection of atherosclerosis in the ascending aorta.
Based on a previous diagnostic accuracy study, actual Dutch reimbursement data, and evidence from literature we developed a Markov decision analytic model. Cost-effectiveness of modified TEE was assessed for a life time horizon and a health care perspective. Prevalence rates of atherosclerosis were age-dependent and low as well as high rates were applied. Probabilistic sensitivity analysis was applied.
The model synthesized all available evidence on the risk of stroke in cardiac surgery patients. The modified TEE strategy consistently resulted in more adapted surgical procedures and, hence, a lower risk of stroke and a slightly higher number of life-years. With 10% prevalence of atherosclerosis the incremental cost-effectiveness ratio was €4,651 and €481 per quality-adjusted life year in 55-year-old men and women, respectively. In all patients aged 65 years or older the modified TEE strategy was cost saving and resulted in additional health benefits.
Decision analytic modelling to assess the cost-effectiveness of a new diagnostic test based on characteristics, costs and effects of the test itself and of the subsequent treatment options is both feasible and valuable. Our case study on modified TEE suggests that it may reduce the risk of stroke in cardiac surgery patients older than 55 years at acceptable cost-effectiveness levels.
Diagnostic test; Patient outcomes; Cost-effectiveness analysis; Stroke; Cardiac surgery
Decision-making in healthcare is complex. Research on coverage decision-making has focused on comparative studies for several countries, statistical analyses for single decision-makers, the decision outcome and appraisal criteria. Accounting for decision processes extends the complexity, as they are multidimensional and process elements need to be regarded as latent constructs (composites) that are not observed directly. The objective of this study was to present a practical application of partial least square path modelling (PLS-PM) to evaluate how it offers a method for empirical analysis of decision-making in healthcare.
Empirical approaches that applied PLS-PM to decision-making in healthcare were identified through a systematic literature search. PLS-PM was used as an estimation technique for a structural equation model that specified hypotheses between the components of decision processes and the reasonableness of decision-making in terms of medical, economic and other ethical criteria. The model was estimated for a sample of 55 coverage decisions on the extension of newborn screening programmes in Europe. Results were evaluated by standard reliability and validity measures for PLS-PM.
After modification by dropping two indicators that showed poor measures in the measurement models’ quality assessment and were not meaningful for newborn screening, the structural equation model estimation produced plausible results. The presence of three influences was supported: the links between both stakeholder participation or transparency and the reasonableness of decision-making; and the effect of transparency on the degree of scientific rigour of assessment. Reliable and valid measurement models were obtained to describe the composites of ‘transparency’, ‘participation’, ‘scientific rigour’ and ‘reasonableness’.
The structural equation model was among the first applications of PLS-PM to coverage decision-making. It allowed testing of hypotheses in situations where there are links between several non-observable constructs. PLS-PM was compatible in accounting for the complexity of coverage decisions to obtain a more realistic perspective for empirical analysis. The model specification can be used for hypothesis testing by using larger sample sizes and for data in the full domain of health technologies.
PLS; Structural equation modelling; Quantitative research; Feasibility study; Model evaluation; Non-parametric; Fourth hurdle; Reimbursement; Neonatal; Europe
The performance of prediction models can be assessed using a variety of different methods and metrics. Traditional measures for binary and survival outcomes include the Brier score to indicate overall model performance, the concordance (or c) statistic for discriminative ability (or area under the receiver operating characteristic (ROC) curve), and goodness-of-fit statistics for calibration.
Several new measures have recently been proposed that can be seen as refinements of discrimination measures, including variants of the c statistic for survival, reclassification tables, net reclassification improvement (NRI), and integrated discrimination improvement (IDI). Moreover, decision–analytic measures have been proposed, including decision curves to plot the net benefit achieved by making decisions based on model predictions.
We aimed to define the role of these relatively novel approaches in the evaluation of the performance of prediction models. For illustration we present a case study of predicting the presence of residual tumor versus benign tissue in patients with testicular cancer (n=544 for model development, n=273 for external validation).
We suggest that reporting discrimination and calibration will always be important for a prediction model. Decision-analytic measures should be reported if the predictive model is to be used for making clinical decisions. Other measures of performance may be warranted in specific applications, such as reclassification metrics to gain insight into the value of adding a novel predictor to an established model.
Decision curve analysis has been introduced as a method to evaluate prediction models in terms of their clinical consequences if used for a binary classification of subjects into a group who should and into a group who should not be treated. The key concept for this type of evaluation is the "net benefit", a concept borrowed from utility theory.
We recall the foundations of decision curve analysis and discuss some new aspects. First, we stress the formal distinction between the net benefit for the treated and for the untreated and define the concept of the "overall net benefit". Next, we revisit the important distinction between the concept of accuracy, as typically assessed using the Youden index and a receiver operating characteristic (ROC) analysis, and the concept of utility of a prediction model, as assessed using decision curve analysis. Finally, we provide an explicit implementation of decision curve analysis to be applied in the context of case-control studies.
We show that the overall net benefit, which combines the net benefit for the treated and the untreated, is a natural alternative to the benefit achieved by a model, being invariant with respect to the coding of the outcome, and conveying a more comprehensive picture of the situation. Further, within the framework of decision curve analysis, we illustrate the important difference between the accuracy and the utility of a model, demonstrating how poor an accurate model may be in terms of its net benefit. Eventually, we expose that the application of decision curve analysis to case-control studies, where an accurate estimate of the true prevalence of a disease cannot be obtained from the data, is achieved with a few modifications to the original calculation procedure.
We present several interrelated extensions to decision curve analysis that will both facilitate its interpretation and broaden its potential area of application.
Clinicians face an increasing volume of biomedical data. Assessing the efficacy of systems that enable accurate and timely clinical decision making merits corresponding attention. This paper discusses the multiple-reader multiple-case (MRMC) experimental design and linear mixed models as means of assessing and comparing decision accuracy and latency (time) for decision tasks in which clinician readers must interpret visual displays of data. These tools can assess and compare decision accuracy and latency (time). These experimental and statistical techniques, used extensively in radiology imaging studies, offer a number of practical and analytic advantages over more traditional quantitative methods such as percent-correct measurements and ANOVAs, and are recommended for their statistical efficiency and generalizability. An example analysis using readily available, free, and commercial statistical software is provided as an appendix. While these techniques are not appropriate for all evaluation questions, they can provide a valuable addition to the evaluative toolkit of medical informatics research.
data visualization; data display; clinical decision support systems; statistical data analysis; ROC curve; MRMC analysis; mixed models; evaluation
Computerized clinical decision support systems are information technology-based systems designed to improve clinical decision-making. As with any healthcare intervention with claims to improve process of care or patient outcomes, decision support systems should be rigorously evaluated before widespread dissemination into clinical practice. Engaging healthcare providers and managers in the review process may facilitate knowledge translation and uptake. The objective of this research was to form a partnership of healthcare providers, managers, and researchers to review randomized controlled trials assessing the effects of computerized decision support for six clinical application areas: primary preventive care, therapeutic drug monitoring and dosing, drug prescribing, chronic disease management, diagnostic test ordering and interpretation, and acute care management; and to identify study characteristics that predict benefit.
The review was undertaken by the Health Information Research Unit, McMaster University, in partnership with Hamilton Health Sciences, the Hamilton, Niagara, Haldimand, and Brant Local Health Integration Network, and pertinent healthcare service teams. Following agreement on information needs and interests with decision-makers, our earlier systematic review was updated by searching Medline, EMBASE, EBM Review databases, and Inspec, and reviewing reference lists through 6 January 2010. Data extraction items were expanded according to input from decision-makers. Authors of primary studies were contacted to confirm data and to provide additional information. Eligible trials were organized according to clinical area of application. We included randomized controlled trials that evaluated the effect on practitioner performance or patient outcomes of patient care provided with a computerized clinical decision support system compared with patient care without such a system.
Data will be summarized using descriptive summary measures, including proportions for categorical variables and means for continuous variables. Univariable and multivariable logistic regression models will be used to investigate associations between outcomes of interest and study specific covariates. When reporting results from individual studies, we will cite the measures of association and p-values reported in the studies. If appropriate for groups of studies with similar features, we will conduct meta-analyses.
A decision-maker-researcher partnership provides a model for systematic reviews that may foster knowledge translation and uptake.
Gene-expression signature-based disease classification and clinical outcome prediction has not been widely introduced in clinical medicine as initially expected, mainly due to the lack of extensive validation needed for its clinical deployment. Obstacles include variable measurement in microarray assay, inconsistent assay platform, analytical requirement for comparable pair of training and test datasets, etc. Furthermore, as medical device helping clinical decision making, the prediction needs to be made for each single patient with a measure of its reliability. To address these issues, there is a need for flexible prediction method less sensitive to difference in experimental and analytical conditions, applicable to each single patient, and providing measure of prediction confidence. The nearest template prediction (NTP) method provides a convenient way to make class prediction with assessment of prediction confidence computed in each single patient's gene-expression data using only a list of signature genes and a test dataset. We demonstrate that the method can be flexibly applied to cross-platform, cross-species, and multiclass predictions without any optimization of analysis parameters.
We present a novel machine learning approach for the classification of cancer samples using expression data. We refer to the method as “decision trunks,” since it is loosely based on decision trees, but contains several modifications designed to achieve an algorithm that: (1) produces smaller and more easily interpretable classifiers than decision trees; (2) is more robust in varying application scenarios; and (3) achieves higher classification accuracy. The decision trunk algorithm has been implemented and tested on 26 classification tasks, covering a wide range of cancer forms, experimental methods, and classification scenarios. This comprehensive evaluation indicates that the proposed algorithm performs at least as well as the current state of the art algorithms in terms of accuracy, while producing classifiers that include on average only 2–3 markers. We suggest that the resulting decision trunks have clear advantages over other classifiers due to their transparency, interpretability, and their correspondence with human decision-making and clinical testing practices.
classification; machine learning; gene expression; biomarkers
Classification methods are widely used for identifying underlying groupings within datasets and predicting the class for new data objects given a trained classifier. This study introduces a project aimed at using a combination of simulations and classification techniques to predict epidemic curves and infer underlying disease parameters for an ongoing outbreak.
Six supervised classification methods (random forest, support vector machines, nearest neighbor with three decision rules, linear and flexible discriminant analysis) were used in identifying partial epidemic curves from six agent-based stochastic simulations of influenza epidemics. The accuracy of the methods was compared using a performance metric based on the McNemar test.
The findings showed that: (1) assumptions made by the methods regarding the structure of an epidemic curve influences their performance i.e. methods with fewer assumptions perform best, (2) the performance of most methods is consistent across different individual-based networks for Seattle, Los Angeles and New York and (3) combining classifiers using a weighting approach does not guarantee better prediction.
epidemic curves; supervised learning; agent-based epidemic models; classification; random forest
Although a fully general extension of ROC analysis to classification tasks with more than two classes has yet to be developed, the potential benefits to be gained from a practical performance evaluation methodology for classification tasks with three classes have motivated a number of research groups to propose methods based on constrained or simplified observer or data models. Here we consider an ideal observer in a task with underlying data drawn from three univariate normal distributions. We investigate the behavior of the resulting ideal observer’s decision variables and ROC surface. In particular, we show that the pair of ideal observer decision variables is constrained to a parametric curve in two-dimensional likelihood ratio space, and that the decision boundary line segments used by the ideal observer can intersect this curve in at most six places. From this, we further show that the resulting ROC surface has at most four degrees of freedom at any point, and not the five that would be required, in general, for a surface in a six-dimensional space to be non-degenerate. In light of the difficulties we have previously pointed out in generalizing the well-known area under the ROC curve performance metric to tasks with three or more classes, the problem of developing a suitable and fully general performance metric for classification tasks with three or more classes remains unsolved.
ROC analysis; three-class classification; ideal observer decision rules
New diagnostic procedures and prognostic markers are continually being developed for a wide range of medical complaints. Medical institutions are therefore regularly faced with the decision as to whether to replace an existing procedure with a new one. The decision to adopt a new method is primarily based on diagnostic/predictive accuracy and cost-effectiveness, but this trade-off is not usually considered in a formal decision-theoretic way. The decision process for diagnostic procedures is complicated by the fact that diagnostic decisions are typically based on thresholding one or more continuous variables. Therefore, a formal decision process should account for uncertainty in the optimal threshold value for each diagnostic procedure. We here propose a Bayesian decision approach based on maximizing expected utility (incorporating accuracy and costs) with respect to diagnostic procedure and threshold level simultaneously. The Bayesian decision approach is illustrated via an application comparing the utility of different bone mineral density (BMD) measurements for determining the need for preventative treatment of osteoporotic hip fracture in elderly patients.
Bayesian decision analysis; decision theory; diagnostic methods
The decision environment for cancer care is becoming increasingly complex due to the discovery and development of novel genomic tests that offer information regarding therapy response, prognosis and monitoring, in addition to traditional histopathology. There is, therefore, a need for translational clinical tools based on molecular bioinformatics, particularly in current cancer care, that can acquire, analyze the data, and interpret and present information from multiple diagnostic modalities to help the clinician make effective decisions.
We present a platform for molecular signature discovery and clinical decision support that relies on genomic and epigenomic measurement modalities as well as clinical parameters such as histopathological results and survival information. Our Physician Accessible Preclinical Analytics Application (PAPAyA) integrates a powerful set of statistical and machine learning tools that leverage the connections among the different modalities. It is easily extendable and reconfigurable to support integration of existing research methods and tools into powerful data analysis and interpretation pipelines. A current configuration of PAPAyA with examples of its performance on breast cancer molecular profiles is used to present the platform in action.
PAPAyA enables analysis of data from (pre)clinical studies, formulation of new clinical hypotheses, and facilitates clinical decision support by abstracting molecular profiles for clinicians.
OBJECTIVE: To evaluate the performance of a computerized decision support system that combines two different decision support methodologies (a Bayesian network and a natural language understanding system) for the diagnosis of patients with pneumonia. DESIGN: Evaluation study using data from a prospective, clinical study. PATIENTS: All patients 18 years and older who presented to the emergency department of a tertiary care setting and whose chest x-ray report was available during the encounter. METHODS: The computerized decision support system calculated a probability of pneumonia using information provided by the two systems. Outcome measures were the area under the receiver operating characteristic curve, sensitivity, specificity, predictive values, likelihood ratios, and test effectiveness. RESULTS: During the 3-month study period there were 742 patients (45 with pneumonia). The area under the receiver operating characteristic curve was 0.881 (95% CI: 0.822, 0.925) for the Bayesian network alone and 0.916 (95% CI: 0.869, 0.949) for the Bayesian network combined with the natural language understanding system (p=0.01). CONCLUSION: Combining decision support methodologies that process information stored in different data formats can increase the performance of a computerized decision support system.
Medical decisions are rarely made under conditions of complete certainty. In the past decade there has been a rapid growth of interest in formal methods for optimizing medical decisions under uncertainty (Fineberg, 1981; Lusted, 1981). Application of decision analytic methods requires physicians to make probability estimates about clinical events for which extensive data are not available. This paper describes a computer program to train physicians to be better probability estimators — to make probability estimates that are numerically meaningful for use in formal decision analyses. It is designed to be a stand-alone application requiring about 2 hours of physician time. Use requires an IBM-PC or compatible microcomputer with graphics adaptor and monitor, and 8087 coprocessor.
Predictive models are often constructed from clinical databases with the
goal of eventually helping make better clinical decisions. Evaluating
models using decision theory is therefore natural. When constructing
a model using statistical and machine learning methods, however, we are
often uncertain about precisely how a model will be used. Thus, decision-independent
measures of classification performance, such as the
area under an ROC curve, are popular. As a complementary method of evaluation, we
investigate techniques for deriving the expected utility of
a model under uncertainty about the model's utilities. We demonstrate
an example of the application of this approach to the evaluation
of two models that diagnose coronary artery disease.
Free-response assessment of diagnostic systems continues to gain acceptance in areas related to the detection, localization and classification of one or more “abnormalities” within a subject. A Free-response Receiver Operating Characteristic (FROC) curve is a tool for characterizing the performance of a free-response system at all decision thresholds simultaneously. Although the importance of a single index summarizing the entire curve over all decision thresholds is well recognized in ROC analysis (e.g. area under the ROC curve), currently there is no widely accepted summary of a system being evaluated under the FROC paradigm. In this paper we propose a new index of the free-response performance at all decision thresholds simultaneously, and develop a nonparametric method for its analysis. Algebraically, the proposed summary index is the area under the empirical FROC curve penalized for the number of erroneous marks, rewarded for the fraction of detected abnormalities, and adjusted for the effect of the target size (or “acceptance radius”). Geometrically, the proposed index can be interpreted as a measure of average performance superiority over an artificial “guessing” free-response process and it represents an analogy to the area between the ROC curve and the “guessing” or diagonal line. We derive the ideal bootstrap estimator of the variance which can be used for a resampling-free construction of asymptotic bootstrap confidence intervals and for sample size estimation using standard expressions. The proposed procedure is free from any parametric assumptions and does not require an assumption of independence of observations within a subject. We provide an example with a dataset sampled from a diagnostic imaging study and conduct simulations which demonstrate the appropriateness of the developed procedure for the considered sample sizes and ranges of parameters.
Area under the FROC curve; Bootstrap; FROC; ROC
How do we use our memories of the past to guide decisions we've never had to make before? Although extensive work describes how the brain learns to repeat rewarded actions, decisions can also be influenced by associations between stimuli or events not directly involving reward — such as when planning routes using a cognitive map or chess moves using predicted countermoves — and these sorts of associations are critical when deciding among novel options. This process is known as model-based decision making. While the learning of environmental relations that might support model-based decisions is well studied, and separately this sort of information has been inferred to impact decisions, there is little evidence concerning the full cycle by which such associations are acquired and drive choices. Of particular interest is whether decisions are directly supported by the same mnemonic systems characterized for relational learning more generally, or instead rely on other, specialized representations. Here, building on our previous work, which isolated dual representations underlying sequential predictive learning, we directly demonstrate that one such representation, encoded by the hippocampal memory system and adjacent cortical structures, supports goal-directed decisions. Using interleaved learning and decision tasks, we monitor predictive learning directly and also trace its influence on decisions for reward. We quantitatively compare the learning processes underlying multiple behavioral and fMRI observables using computational model fits. Across both tasks, a quantitatively consistent learning process explains reaction times, choices, and both expectation- and surprise-related neural activity. The same hippocampal and ventral stream regions engaged in anticipating stimuli during learning are also engaged in proportion to the difficulty of decisions. These results support a role for predictive associations learned by the hippocampal memory system to be recalled during choice formation.
We are always learning regularities in the world around us: where things are, and in what order we might find them. Our knowledge of these contingencies can be relied upon if we later want to use them to make decisions. However, there is little agreement about the neurobiological mechanism by which learned contingencies are deployed for decision making. These are different kinds of decisions than simple habits, in which we take actions that have in the past given us reward. Neural mechanisms of habitual decisions are well-described by computational reinforcement learning approaches, but have not often been applied to ‘model-based’ decisions that depend on learned contingencies. In this article, we apply reinforcement learning to investigate model-based decisions. We tested participants on a serial reaction time task with changing sequential contingencies, and choice probes that depend on these contingencies. Fitting computational models to reaction times, we show that two sets of predictions drive simple response behavior, only one of which is used to make choices. Using fMRI, we observed learning and decision-related activity in hippocampal and ventral cortical areas that is computationally linked to the learned contingencies used to make choices. These results suggest a critical role for a hippocampal-cortical network in model-based decisions for reward.
Advances in biotechnology have raised expectations that biomarkers, including genetic profiles, will yield information to accurately predict outcomes for individuals. However, results to date have been disappointing. In addition, statistical methods to quantify the predictive information in markers have not been standardized.
We discuss statistical techniques to summarize predictive information including risk distribution curves and measures derived from them that relate to decision making. Attributes of these measures are contrasted with alternatives such as receiver operating characteristic curves, R-squared, percent reclassification and net reclassification index. Data are generated from simple models of risk conferred by genetic profiles for individuals in a population. Statistical techniques are illustrated and the risk prediction capacities of different risk models are quantified.
Risk distribution curves are most informative and relevant to clinical practice. They show proportions of subjects classified into clinically relevant risk categories. In a population in which 10% have the outcome event and subjects are categorized as high risk if their risk exceeds 20%, we found to identify as high risk more than half of those destined to have an event, either 150 genes each with odds ratio of 1.5 or 250 genes each with odds ratio of 1.25 was required when the minor allele frequencies are 10%. We show that conclusions based on ROC curves may not be the same as conclusions based on risk distribution curves.
Many highly predictive genes will be required in order to identify substantial numbers of subjects at high risk.
biomarkers; classification; discrimination; prediction; statistical methods
As more diagnostic testing options become available to physicians, it becomes more difficult to combine various types of medical information together in order to optimize the overall diagnosis. To improve diagnostic performance, here we introduce an approach to optimize a decision-fusion technique to combine heterogeneous information, such as from different modalities, feature categories, or institutions. For classifier comparison we used two performance metrics: The receiving operator characteristic (ROC) area under the curve [area under the ROC curve (AUC)] and the normalized partial area under the curve (pAUC). This study used four classifiers: Linear discriminant analysis (LDA), artificial neural network (ANN), and two variants of our decision-fusion technique, AUC-optimized (DF-A) and pAUC-optimized (DF-P) decision fusion. We applied each of these classifiers with 100-fold cross-validation to two heterogeneous breast cancer data sets: One of mass lesion features and a much more challenging one of microcalcification lesion features. For the calcification data set, DF-A outperformed the other classifiers in terms of AUC (p<0.02) and achieved AUC=0.85±0.01. The DF-P surpassed the other classifiers in terms of pAUC (p<0.01) and reached pAUC=0.38±0.02. For the mass data set, DF-A outperformed both the ANN and the LDA (p<0.04) and achieved AUC=0.94±0.01. Although for this data set there were no statistically significant differences among the classifiers' pAUC values (pAUC=0.57±0.07 to 0.67±0.05, p>0.10), the DF-P did significantly improve specificity versus the LDA at both 98% and 100% sensitivity (p<0.04). In conclusion, decision fusion directly optimized clinically significant performance measures, such as AUC and pAUC, and sometimes outperformed two wellknown machine-learning techniques when applied to two different breast cancer data sets.
decision fusion; heterogeneous data; receiver operating characteristic (ROC) curve; area under the curve (AUC); partial area under the curve (pAUC); classification; machine learning; breast cancer