Decision curve analysis (DCA) has been proposed as an alternative method for the evaluation of diagnostic tests, prediction models, and molecular markers. However, DCA is based on expected utility theory, whose axioms decision makers routinely violate. Decision-making is governed by both intuition (system 1) and an analytical, deliberative process (system 2); rational decision-making should therefore reflect both formal principles of rationality and intuitions about good decisions. We use the cognitive emotion of regret to serve as a link between systems 1 and 2 and to reformulate DCA.
First, we analysed a classic decision tree describing three decision alternatives: treat, do not treat, and treat according to a prediction model. We then computed the expected regret for each alternative as the difference between the utility of the action taken and the utility of the action that, in retrospect, should have been taken. For any pair of strategies, we measured the difference in net expected regret. Finally, we employed the concept of acceptable regret to identify the circumstances under which a potentially wrong strategy is tolerable to a decision-maker.
We developed a novel dual visual analog scale to describe the relationship between regret associated with "omissions" (e.g. failure to treat) vs. "commissions" (e.g. treating unnecessarily) and the decision maker's preferences as expressed in terms of threshold probability. We then proved that the Net Expected Regret Difference, first presented in this paper, is equivalent to net benefit as described in the original DCA. Based on the concept of acceptable regret, we identified the circumstances under which a decision maker tolerates a potentially wrong decision and expressed them in terms of the probability of disease.
We present a novel method for eliciting a decision maker's preferences and an alternative derivation of DCA based on regret theory. Our approach may be intuitively more appealing to a decision-maker, particularly in those clinical situations when the best management option is the one associated with the least amount of regret (e.g. diagnosis and treatment of advanced cancer).
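The claimed equivalence between the regret formulation and the original net benefit can be checked numerically. Below is a minimal sketch (my own illustration with hypothetical data; the function names and the regret scaling are assumptions, not the authors' code), using the threshold relation pt/(1-pt) = regret of commission / regret of omission:

```python
def confusion(y_true, y_prob, pt):
    """Classify as 'treat' when the predicted risk is at least the threshold pt."""
    tp = sum(p >= pt and y == 1 for y, p in zip(y_true, y_prob))
    fp = sum(p >= pt and y == 0 for y, p in zip(y_true, y_prob))
    fn = sum(p < pt and y == 1 for y, p in zip(y_true, y_prob))
    return tp, fp, fn

def net_benefit(y_true, y_prob, pt):
    """Net benefit as defined in the original decision curve analysis."""
    n = len(y_true)
    tp, fp, _ = confusion(y_true, y_prob, pt)
    return tp / n - fp / n * pt / (1 - pt)

def expected_regret(y_true, y_prob, pt):
    """Expected regret, scaling commission regret to pt and omission regret
    to 1 - pt so that their ratio equals pt/(1 - pt)."""
    n = len(y_true)
    _, fp, fn = confusion(y_true, y_prob, pt)
    return (fp * pt + fn * (1 - pt)) / n

# With this scaling, net benefit = prevalence - regret / (1 - pt), so ranking
# strategies by net benefit and by expected regret gives the same answer.
y = [1, 1, 1, 0, 0, 0, 0, 0]
model_a = [0.9, 0.7, 0.2, 0.6, 0.3, 0.1, 0.1, 0.4]
model_b = [0.8, 0.6, 0.5, 0.2, 0.3, 0.1, 0.2, 0.1]
pt = 0.25
delta_nb = net_benefit(y, model_a, pt) - net_benefit(y, model_b, pt)
delta_regret = expected_regret(y, model_b, pt) - expected_regret(y, model_a, pt)
assert abs(delta_nb - delta_regret / (1 - pt)) < 1e-12
```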
Diagnostic and prognostic models are typically evaluated with measures of accuracy that do not address clinical consequences. Decision-analytic techniques allow assessment of clinical outcomes, but often require collection of additional information, and may be cumbersome to apply to models that yield a continuous result. We sought a method for evaluating and comparing prediction models that incorporates clinical consequences, requires only the dataset on which the models are tested, and can be applied to models that have either continuous or dichotomous results.
We describe decision curve analysis, a simple, novel method of evaluating predictive models. We start by assuming that the threshold probability of a disease or event at which a patient would opt for treatment is informative of how the patient weighs the relative harms of a false-positive and a false-negative prediction. This theoretical relationship is then used to derive the net benefit of the model across different threshold probabilities. Plotting net benefit against threshold probability yields the “decision curve”. We apply the method to models for the prediction of seminal vesicle invasion in prostate cancer patients. Decision curve analysis identified the range of threshold probabilities in which a model was of value, the magnitude of benefit, and which of several models was optimal.
Decision curve analysis is a suitable method for evaluating alternative diagnostic and prognostic strategies that has advantages over other commonly used measures and techniques.
prediction models; multivariate analysis; decision analysis
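The net benefit calculation described above can be sketched in a few lines. This is a hypothetical illustration (the data and function names are mine, not from the paper); plotting the model's net benefit against the threshold probability would give the decision curve:

```python
def decision_curve(y_true, y_prob, thresholds):
    """Net benefit of a model, of treat-all, and of treat-none (always 0)
    across a range of threshold probabilities pt."""
    n = len(y_true)
    d = sum(y_true)  # number of events
    rows = []
    for pt in thresholds:
        tp = sum(p >= pt and y == 1 for y, p in zip(y_true, y_prob))
        fp = sum(p >= pt and y == 0 for y, p in zip(y_true, y_prob))
        nb_model = tp / n - fp / n * pt / (1 - pt)
        nb_all = d / n - (n - d) / n * pt / (1 - pt)  # treat everyone
        rows.append((pt, nb_model, nb_all, 0.0))
    return rows

# Toy data: each row is (pt, model, treat-all, treat-none).
y = [1, 0, 0, 1, 0]
probs = [0.8, 0.2, 0.4, 0.6, 0.1]
curve = decision_curve(y, probs, [0.1, 0.3, 0.5])
```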
The traditional statistical approach to the evaluation of diagnostic tests, prediction models and molecular markers is to assess their accuracy, using metrics such as sensitivity, specificity and the receiver-operating-characteristic curve. However, there is no obvious association between accuracy and clinical value: it is unclear, for example, just how accurate a test needs to be in order for it to be considered "accurate enough" to warrant its use in patient care. Decision analysis aims to assess the clinical value of a test by assigning weights to each possible consequence. These methods have been historically considered unattractive to the practicing biostatistician because additional data from the literature, or subjective assessments from individual patients or clinicians, are needed in order to assign weights appropriately. Decision analytic methods are available that can reduce these additional requirements. These methods can provide insight into the consequences of using a test, model or marker in clinical practice.
The predictiveness curve is a graphical tool that characterizes the population distribution of Risk(Y) = P(D = 1|Y), where D denotes a binary outcome such as occurrence of an event within a specified time period and Y denotes predictors. A wider distribution of Risk(Y) indicates better performance of a risk model in the sense that making treatment recommendations is easier for more subjects. Decisions are more straightforward when a subject's risk is deemed to be high or low. Methods have been developed to estimate predictiveness curves from cohort studies. However, early-phase studies to evaluate novel risk prediction markers typically employ case-control designs. Here we present semiparametric and nonparametric methods for evaluating a continuous risk prediction marker that accommodate case-control data. Small sample properties are investigated through simulation studies. The semiparametric methods are substantially more efficient than their nonparametric counterparts under a correctly specified model. We generalize them to settings where multiple prediction markers are involved. Applications to prostate cancer risk prediction markers illustrate methods for comparing the risk prediction capacities of markers and for evaluating the increment in performance gained by adding a marker to a baseline risk model. We propose a modified Hosmer-Lemeshow test for case-control study data to assess calibration of the risk model that is a natural complement to this graphical tool.
biomarker; case-control study; classification; Hosmer-Lemeshow test; predictiveness curve; risk; ROC curve
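The empirical version of the predictiveness curve is simple to construct. The sketch below (my own illustration with hypothetical risks, not the paper's estimators) plots the risk quantile R(v) against the population percentile v:

```python
def predictiveness_curve(risks):
    """Empirical predictiveness curve: the v-th percentile of the risk
    distribution, returned as (percentile v, risk quantile R(v)) pairs.
    A steeper curve means a wider risk distribution, i.e. more subjects
    fall into clearly high- or low-risk regions."""
    xs = sorted(risks)
    n = len(xs)
    return [((i + 1) / n, r) for i, r in enumerate(xs)]

risks = [0.05, 0.10, 0.02, 0.40, 0.22, 0.71, 0.15, 0.33]
curve = predictiveness_curve(risks)
```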
Syndromic surveillance systems can potentially detect a bioterrorist attack earlier than traditional surveillance by virtue of their near real-time analysis of relevant data. Receiver operating characteristic (ROC) curve analysis using the area under the curve (AUC) as a comparison metric has been recommended as a practical evaluation tool for syndromic surveillance systems, yet traditional ROC curves do not account for timeliness of detection or subsequent time-dependent health outcomes.
Using a decision-analytic approach, we predicted outcomes, measured in lives, quality adjusted life years (QALYs), and costs, for a series of simulated bioterrorist attacks. We then evaluated seven detection algorithms applied to syndromic surveillance data using outcomes-weighted ROC curves compared to simple ROC curves and timeliness-weighted ROC curves. We performed sensitivity analyses by varying the model inputs between best and worst case scenarios and by applying different methods of AUC calculation.
The decision analytic model results indicate that if a surveillance system was successful in detecting an attack, and measures were immediately taken to deliver treatment to the population, the lives, QALYs and dollars lost could be reduced considerably. The ROC curve analysis shows that the incorporation of outcomes into the evaluation metric has an important effect on the apparent performance of the surveillance systems. The relative order of performance is also heavily dependent on the choice of AUC calculation method.
This study demonstrates the importance of accounting for mortality, morbidity and costs in the evaluation of syndromic surveillance systems. Incorporating these outcomes into the ROC curve analysis allows for more accurate identification of the optimal method for signaling a possible bioterrorist attack. In addition, the parameters used to construct an ROC curve should be given careful consideration.
Risk prediction models based on medical history or results of tests are increasingly common in the cancer literature. An important use of these models is to make treatment decisions on the basis of estimated risk. The relative utility curve is a simple method for evaluating risk prediction in a medical decision-making framework. Relative utility curves have three attractive features for the evaluation of risk prediction models. First, they put risk prediction into perspective, because relative utility is the fraction of the expected utility of perfect prediction obtained by the risk prediction model at the optimal cut point. Second, they do not require precise specification of harms and benefits, because relative utility is plotted against a summary measure of harms and benefits (ie, the risk threshold). Third, they are easy to compute from standard tables of data found in many articles on risk prediction. An important use of relative utility curves is to evaluate the addition of a risk factor to the risk prediction model. To illustrate an application of relative utility curves, an analysis was performed on previously published data involving the addition of breast density to a risk prediction model for invasive breast cancer.
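As a rough sketch of the idea in code: relative utility compares the model's net benefit with that of perfect prediction, beyond the better of the two default strategies (treat all, treat none). Scaling conventions differ across papers, so treat this formulation, and the example numbers, as my assumptions rather than the article's exact definition:

```python
def relative_utility(tp, fp, n_events, n, risk_threshold):
    """Fraction of the net benefit of perfect prediction achieved by the
    model, measured beyond the better default strategy (treat all or
    treat none). One plausible formulation; conventions vary."""
    r = risk_threshold
    nb_model = tp / n - fp / n * r / (1 - r)
    prevalence = n_events / n
    nb_all = prevalence - (1 - prevalence) * r / (1 - r)  # treat everyone
    nb_default = max(nb_all, 0.0)  # treat-none has net benefit 0
    nb_perfect = prevalence  # treat exactly the cases: TP = events, FP = 0
    return (nb_model - nb_default) / (nb_perfect - nb_default)

# Hypothetical 2x2 counts at a 30% risk threshold.
ru = relative_utility(tp=15, fp=10, n_events=20, n=100, risk_threshold=0.3)
```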
Decision-making in healthcare is complex. Research on coverage decision-making has focused on comparative studies across countries, statistical analyses for single decision-makers, decision outcomes, and appraisal criteria. Accounting for decision processes adds further complexity, as they are multidimensional and process elements must be treated as latent constructs (composites) that are not observed directly. The objective of this study was to present a practical application of partial least squares path modelling (PLS-PM) and to evaluate whether it offers a method for the empirical analysis of decision-making in healthcare.
Empirical approaches that applied PLS-PM to decision-making in healthcare were identified through a systematic literature search. PLS-PM was used as an estimation technique for a structural equation model that specified hypotheses between the components of decision processes and the reasonableness of decision-making in terms of medical, economic and other ethical criteria. The model was estimated for a sample of 55 coverage decisions on the extension of newborn screening programmes in Europe. Results were evaluated by standard reliability and validity measures for PLS-PM.
After modification by dropping two indicators that showed poor measures in the measurement models’ quality assessment and were not meaningful for newborn screening, the structural equation model estimation produced plausible results. The presence of three influences was supported: the links between both stakeholder participation or transparency and the reasonableness of decision-making; and the effect of transparency on the degree of scientific rigour of assessment. Reliable and valid measurement models were obtained to describe the composites of ‘transparency’, ‘participation’, ‘scientific rigour’ and ‘reasonableness’.
The structural equation model was among the first applications of PLS-PM to coverage decision-making. It allowed testing of hypotheses in situations where there are links between several non-observable constructs. PLS-PM was compatible in accounting for the complexity of coverage decisions to obtain a more realistic perspective for empirical analysis. The model specification can be used for hypothesis testing by using larger sample sizes and for data in the full domain of health technologies.
PLS; Structural equation modelling; Quantitative research; Feasibility study; Model evaluation; Non-parametric; Fourth hurdle; Reimbursement; Neonatal; Europe
Proper evaluation of new diagnostic tests is required to reduce overutilization and to limit potential negative health effects and costs related to testing. A decision analytic modelling approach may be worthwhile when a diagnostic randomized controlled trial is not feasible. We demonstrate this by assessing the cost-effectiveness of modified transesophageal echocardiography (TEE) compared with manual palpation for the detection of atherosclerosis in the ascending aorta.
Based on a previous diagnostic accuracy study, actual Dutch reimbursement data, and evidence from the literature, we developed a Markov decision analytic model. Cost-effectiveness of modified TEE was assessed over a lifetime horizon and from a health care perspective. Prevalence rates of atherosclerosis were age-dependent, and both low and high rates were applied. Probabilistic sensitivity analysis was performed.
The model synthesized all available evidence on the risk of stroke in cardiac surgery patients. The modified TEE strategy consistently resulted in more adapted surgical procedures and, hence, a lower risk of stroke and a slightly higher number of life-years. With 10% prevalence of atherosclerosis the incremental cost-effectiveness ratio was €4,651 and €481 per quality-adjusted life year in 55-year-old men and women, respectively. In all patients aged 65 years or older the modified TEE strategy was cost saving and resulted in additional health benefits.
Decision analytic modelling to assess the cost-effectiveness of a new diagnostic test based on characteristics, costs and effects of the test itself and of the subsequent treatment options is both feasible and valuable. Our case study on modified TEE suggests that it may reduce the risk of stroke in cardiac surgery patients older than 55 years at acceptable cost-effectiveness levels.
Diagnostic test; Patient outcomes; Cost-effectiveness analysis; Stroke; Cardiac surgery
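The core of such a Markov decision analytic model can be sketched compactly. The cohort model below is a much-simplified illustration: all transition probabilities, utilities, and costs are hypothetical placeholders (not values from the study), and discounting and half-cycle correction are omitted:

```python
def run_markov(p_stroke, upfront_cost, cycles=20):
    """Three-state cohort model (well, post-stroke, dead) run in yearly
    cycles, accumulating QALYs and costs. All parameters are invented."""
    well, stroke = 1.0, 0.0          # cohort fractions; 'dead' is implicit
    p_die, p_die_stroke = 0.03, 0.10  # yearly mortality, well vs post-stroke
    q_well, q_stroke = 1.0, 0.6       # utility weights
    c_stroke_year = 5000.0            # yearly post-stroke care cost
    qalys, costs = 0.0, upfront_cost
    for _ in range(cycles):
        qalys += well * q_well + stroke * q_stroke
        costs += stroke * c_stroke_year
        well, stroke = (well * (1 - p_stroke - p_die),
                        stroke * (1 - p_die_stroke) + well * p_stroke)
    return qalys, costs

# Two strategies: screening test (lower stroke risk, upfront cost) vs none.
q_tee, c_tee = run_markov(p_stroke=0.010, upfront_cost=300.0)
q_pal, c_pal = run_markov(p_stroke=0.015, upfront_cost=0.0)
icer = (c_tee - c_pal) / (q_tee - q_pal)  # incremental cost per QALY gained
```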
A major biomedical goal associated with evaluating a candidate biomarker or developing a predictive model score for event-time outcomes is to accurately distinguish incident cases from the controls surviving beyond time t throughout the entire study period. Extensions of standard binary classification measures such as time-dependent sensitivity, specificity, and receiver operating characteristic (ROC) curves have been developed in this context (Heagerty, P. J., and others, 2000. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 56, 337–344). We propose a direct, non-parametric method to estimate the time-dependent area under the curve (AUC), which we refer to as the weighted mean rank (WMR) estimator. The proposed estimator performs well relative to the semi-parametric AUC curve estimator of Heagerty and Zheng (2005. Survival model predictive accuracy and ROC curves. Biometrics 61, 92–105). We establish the asymptotic properties of the proposed estimator and show that the accuracy of markers can be compared very simply using the difference in WMR statistics. Estimators of pointwise standard errors are provided.
AUC curve; Survival analysis; Time-dependent ROC
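In the absence of censoring, the cumulative/dynamic time-dependent AUC reduces to a simple rank statistic. The sketch below (my own illustration, ignoring the censoring weights that estimators like WMR are designed to handle) shows the basic quantity being estimated:

```python
def cumulative_auc_at(t, times, markers):
    """Cumulative/dynamic AUC at time t: cases are subjects with an event
    by t, controls those surviving beyond t. Mann-Whitney estimate with
    ties counted 1/2; valid only for uncensored data."""
    cases = [m for time, m in zip(times, markers) if time <= t]
    controls = [m for time, m in zip(times, markers) if time > t]
    pairs = concordant = 0.0
    for mc in cases:
        for mk in controls:
            pairs += 1
            if mc > mk:
                concordant += 1
            elif mc == mk:
                concordant += 0.5
    return concordant / pairs

# Hypothetical event times and marker values: higher marker = earlier event.
auc_3 = cumulative_auc_at(3, times=[1, 2, 5, 7], markers=[0.9, 0.8, 0.3, 0.2])
```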
The performance of prediction models can be assessed using a variety of different methods and metrics. Traditional measures for binary and survival outcomes include the Brier score to indicate overall model performance, the concordance (or c) statistic for discriminative ability (or area under the receiver operating characteristic (ROC) curve), and goodness-of-fit statistics for calibration.
Several new measures have recently been proposed that can be seen as refinements of discrimination measures, including variants of the c statistic for survival, reclassification tables, net reclassification improvement (NRI), and integrated discrimination improvement (IDI). Moreover, decision-analytic measures have been proposed, including decision curves to plot the net benefit achieved by making decisions based on model predictions.
We aimed to define the role of these relatively novel approaches in the evaluation of the performance of prediction models. For illustration we present a case study of predicting the presence of residual tumor versus benign tissue in patients with testicular cancer (n=544 for model development, n=273 for external validation).
We suggest that reporting discrimination and calibration will always be important for a prediction model. Decision-analytic measures should be reported if the predictive model is to be used for making clinical decisions. Other measures of performance may be warranted in specific applications, such as reclassification metrics to gain insight into the value of adding a novel predictor to an established model.
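The two traditional measures mentioned above, the Brier score and the c statistic, are easy to compute directly. A minimal sketch with hypothetical data:

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome;
    lower is better, 0 is perfect."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def c_statistic(y_true, y_prob):
    """Probability that a randomly chosen event receives a higher prediction
    than a randomly chosen non-event (ties count 1/2); equals the area
    under the ROC curve for binary outcomes."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    nonevents = [p for y, p in zip(y_true, y_prob) if y == 0]
    total = conc = 0.0
    for pe in events:
        for pn in nonevents:
            total += 1
            conc += 1.0 if pe > pn else 0.5 if pe == pn else 0.0
    return conc / total

y = [1, 1, 0, 0]
p = [0.9, 0.4, 0.4, 0.1]
bs = brier_score(y, p)
c = c_statistic(y, p)
```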
Decision curve analysis has been introduced as a method to evaluate prediction models in terms of their clinical consequences when used to classify subjects into a group that should be treated and a group that should not. The key concept for this type of evaluation is the "net benefit", a concept borrowed from utility theory.
We recall the foundations of decision curve analysis and discuss some new aspects. First, we stress the formal distinction between the net benefit for the treated and for the untreated and define the concept of the "overall net benefit". Next, we revisit the important distinction between the concept of accuracy, as typically assessed using the Youden index and a receiver operating characteristic (ROC) analysis, and the concept of utility of a prediction model, as assessed using decision curve analysis. Finally, we provide an explicit implementation of decision curve analysis to be applied in the context of case-control studies.
We show that the overall net benefit, which combines the net benefit for the treated and the untreated, is a natural alternative to the net benefit achieved by a model, being invariant with respect to the coding of the outcome and conveying a more comprehensive picture of the situation. Further, within the framework of decision curve analysis, we illustrate the important difference between the accuracy and the utility of a model, demonstrating how poor an accurate model may be in terms of its net benefit. Finally, we show that decision curve analysis can be applied to case-control studies, where an accurate estimate of the true prevalence of a disease cannot be obtained from the data, with only a few modifications to the original calculation procedure.
We present several interrelated extensions to decision curve analysis that will both facilitate its interpretation and broaden its potential area of application.
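To make the distinction concrete, here is a sketch of the two one-sided net benefits. The names and the example counts are my assumptions, and the paper's exact formula for combining them into the overall net benefit is not reproduced here:

```python
def net_benefit_treated(tp, fp, n, pt):
    """Classic net benefit, computed among patients classified as 'treat'."""
    return tp / n - fp / n * pt / (1 - pt)

def net_benefit_untreated(tn, fn, n, pt):
    """Mirror-image net benefit among patients classified as 'do not treat';
    the weight (1 - pt)/pt follows from swapping the two error types."""
    return tn / n - fn / n * (1 - pt) / pt

# Recoding the outcome (cases <-> controls, pt <-> 1 - pt) swaps the two
# quantities, which is why a combined, coding-invariant summary is attractive.
nb_t = net_benefit_treated(tp=30, fp=20, n=100, pt=0.4)
nb_u = net_benefit_untreated(tn=40, fn=10, n=100, pt=0.4)
```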
To avoid complications associated with under- or overtreatment of patients with skeletal metastases, doctors need accurate survival estimates. Unfortunately, prognostic models for patients with skeletal metastases of the extremities are lacking, and physician-based estimates are generally inaccurate.
We developed three types of prognostic models and compared them using calibration plots, receiver operating characteristic (ROC) curves, and decision curve analysis to determine which one is best suited for clinical use.
A training set consisted of 189 patients who underwent surgery for skeletal metastases. We created models designed to predict 3- and 12-month survival using three methods: an artificial neural network (ANN), a Bayesian belief network (BBN), and logistic regression. We then performed cross-validation and compared the models in three ways: calibration plots, which plot predicted against actual risk; the area under the ROC curve (AUC), which measures discrimination, that is, the probability that a patient who died received a higher predicted probability of death than a patient who did not; and decision curve analysis, which quantifies the clinical consequences of over- or undertreatment.
All models appeared to be well calibrated, with the exception of the BBN, which underestimated 3-month survival at lower probability estimates. The ANN models had the highest discrimination, with AUCs of 0.89 and 0.93 for the 3- and 12-month models, respectively. Decision curve analysis revealed that all models could be used clinically, but the ANN models consistently resulted in the highest net benefit, outperforming the BBN and logistic regression models.
Our observations suggest use of the ANN model to aid decisions about surgery would lead to better patient outcomes than other alternative approaches to decision making.
Level of Evidence
Level II, prognostic study.
Clinicians face an increasing volume of biomedical data. Assessing the efficacy of systems that enable accurate and timely clinical decision making merits corresponding attention. This paper discusses the multiple-reader multiple-case (MRMC) experimental design and linear mixed models as means of assessing and comparing decision accuracy and latency (time) for decision tasks in which clinician readers must interpret visual displays of data. These experimental and statistical techniques, used extensively in radiology imaging studies, offer a number of practical and analytic advantages over more traditional quantitative methods such as percent-correct measurements and ANOVAs, and are recommended for their statistical efficiency and generalizability. An example analysis using readily available, free, and commercial statistical software is provided as an appendix. While these techniques are not appropriate for all evaluation questions, they can provide a valuable addition to the evaluative toolkit of medical informatics research.
data visualization; data display; clinical decision support systems; statistical data analysis; ROC curve; MRMC analysis; mixed models; evaluation
Although a fully general extension of ROC analysis to classification tasks with more than two classes has yet to be developed, the potential benefits to be gained from a practical performance evaluation methodology for classification tasks with three classes have motivated a number of research groups to propose methods based on constrained or simplified observer or data models. Here we consider an ideal observer in a task with underlying data drawn from three univariate normal distributions. We investigate the behavior of the resulting ideal observer’s decision variables and ROC surface. In particular, we show that the pair of ideal observer decision variables is constrained to a parametric curve in two-dimensional likelihood ratio space, and that the decision boundary line segments used by the ideal observer can intersect this curve in at most six places. From this, we further show that the resulting ROC surface has at most four degrees of freedom at any point, and not the five that would be required, in general, for a surface in a six-dimensional space to be non-degenerate. In light of the difficulties we have previously pointed out in generalizing the well-known area under the ROC curve performance metric to tasks with three or more classes, the problem of developing a suitable and fully general performance metric for classification tasks with three or more classes remains unsolved.
ROC analysis; three-class classification; ideal observer decision rules
Predictive models are often constructed from clinical databases with the goal of eventually helping make better clinical decisions. Evaluating models using decision theory is therefore natural. When constructing a model using statistical and machine learning methods, however, we are often uncertain about precisely how a model will be used. Thus, decision-independent measures of classification performance, such as the area under an ROC curve, are popular. As a complementary method of evaluation, we investigate techniques for deriving the expected utility of a model under uncertainty about the model's utilities. We demonstrate an example of the application of this approach to the evaluation of two models that diagnose coronary artery disease.
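The idea of averaging over uncertain utilities can be sketched with a small Monte Carlo calculation. The confusion counts, the harm range, and the function names below are my assumptions for illustration, not the paper's model:

```python
import random

def expected_utility(tp, fp, fn, tn, harm_fp, harm_fn):
    """Expected utility of acting on the model: correct decisions score 0,
    and the two error types are penalized (the utility scale is assumed)."""
    n = tp + fp + fn + tn
    return -(fp * harm_fp + fn * harm_fn) / n

def utility_under_uncertainty(tp, fp, fn, tn, n_draws=10_000, seed=0):
    """Average the expected utility over random draws of the uncertain harm
    of a false negative relative to a false positive (hypothetical range)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        harm_fn = rng.uniform(2.0, 10.0)  # assumed range of relative harm
        total += expected_utility(tp, fp, fn, tn, harm_fp=1.0, harm_fn=harm_fn)
    return total / n_draws

u = utility_under_uncertainty(tp=50, fp=10, fn=5, tn=35)
```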
How do we use our memories of the past to guide decisions we've never had to make before? Although extensive work describes how the brain learns to repeat rewarded actions, decisions can also be influenced by associations between stimuli or events not directly involving reward — such as when planning routes using a cognitive map or chess moves using predicted countermoves — and these sorts of associations are critical when deciding among novel options. This process is known as model-based decision making. While the learning of environmental relations that might support model-based decisions is well studied, and separately this sort of information has been inferred to impact decisions, there is little evidence concerning the full cycle by which such associations are acquired and drive choices. Of particular interest is whether decisions are directly supported by the same mnemonic systems characterized for relational learning more generally, or instead rely on other, specialized representations. Here, building on our previous work, which isolated dual representations underlying sequential predictive learning, we directly demonstrate that one such representation, encoded by the hippocampal memory system and adjacent cortical structures, supports goal-directed decisions. Using interleaved learning and decision tasks, we monitor predictive learning directly and also trace its influence on decisions for reward. We quantitatively compare the learning processes underlying multiple behavioral and fMRI observables using computational model fits. Across both tasks, a quantitatively consistent learning process explains reaction times, choices, and both expectation- and surprise-related neural activity. The same hippocampal and ventral stream regions engaged in anticipating stimuli during learning are also engaged in proportion to the difficulty of decisions. 
These results support a role for predictive associations learned by the hippocampal memory system to be recalled during choice formation.
We are always learning regularities in the world around us: where things are, and in what order we might find them. Our knowledge of these contingencies can be relied upon if we later want to use them to make decisions. However, there is little agreement about the neurobiological mechanism by which learned contingencies are deployed for decision making. These are different kinds of decisions than simple habits, in which we take actions that have in the past given us reward. Neural mechanisms of habitual decisions are well-described by computational reinforcement learning approaches, but have not often been applied to ‘model-based’ decisions that depend on learned contingencies. In this article, we apply reinforcement learning to investigate model-based decisions. We tested participants on a serial reaction time task with changing sequential contingencies, and choice probes that depend on these contingencies. Fitting computational models to reaction times, we show that two sets of predictions drive simple response behavior, only one of which is used to make choices. Using fMRI, we observed learning and decision-related activity in hippocampal and ventral cortical areas that is computationally linked to the learned contingencies used to make choices. These results suggest a critical role for a hippocampal-cortical network in model-based decisions for reward.
Advances in biotechnology have raised expectations that biomarkers, including genetic profiles, will yield information to accurately predict outcomes for individuals. However, results to date have been disappointing. In addition, statistical methods to quantify the predictive information in markers have not been standardized.
We discuss statistical techniques to summarize predictive information including risk distribution curves and measures derived from them that relate to decision making. Attributes of these measures are contrasted with alternatives such as receiver operating characteristic curves, R-squared, percent reclassification and net reclassification index. Data are generated from simple models of risk conferred by genetic profiles for individuals in a population. Statistical techniques are illustrated and the risk prediction capacities of different risk models are quantified.
Risk distribution curves are most informative and relevant to clinical practice. They show the proportions of subjects classified into clinically relevant risk categories. In a population in which 10% have the outcome event and subjects are categorized as high risk if their risk exceeds 20%, we found that identifying more than half of those destined to have an event as high risk required either 150 genes each with an odds ratio of 1.5 or 250 genes each with an odds ratio of 1.25, when the minor allele frequencies are 10%. We show that conclusions based on ROC curves may not be the same as conclusions based on risk distribution curves.
Many highly predictive genes will be required in order to identify substantial numbers of subjects at high risk.
biomarkers; classification; discrimination; prediction; statistical methods
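A simulation of this kind can be sketched as follows. This is a simplification of the paper's setup with my own parameter names: each gene contributes independent risk alleles, each allele multiplies the odds by a fixed odds ratio, and the intercept is calibrated so the mean risk matches the prevalence:

```python
import math
import random

def simulate_capture(n_genes, odds_ratio, maf=0.10, prevalence=0.10,
                     cut=0.20, n_subjects=4000, seed=1):
    """Monte Carlo sketch: fraction of events occurring in subjects whose
    simulated genetic risk exceeds the high-risk cut-off."""
    rng = random.Random(seed)
    log_or = math.log(odds_ratio)
    # Per-subject score: risk alleles (two draws per gene) times log odds ratio.
    scores = []
    for _ in range(n_subjects):
        alleles = sum((rng.random() < maf) + (rng.random() < maf)
                      for _ in range(n_genes))
        scores.append(alleles * log_or)
    # Calibrate the intercept b by bisection so mean risk ~= prevalence.
    lo, hi = -50.0, 50.0
    for _ in range(60):
        b = (lo + hi) / 2
        mean_risk = sum(1 / (1 + math.exp(-(b + s))) for s in scores) / n_subjects
        if mean_risk < prevalence:
            lo = b
        else:
            hi = b
    risks = [1 / (1 + math.exp(-(b + s))) for s in scores]
    events = [rng.random() < r for r in risks]
    captured = sum(e and r > cut for e, r in zip(events, risks))
    return captured / max(sum(events), 1)

frac = simulate_capture(n_genes=150, odds_ratio=1.5)
```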
We provide a tutorial on the construction and evaluation of Markov decision processes (MDPs), which are powerful analytical tools used for sequential decision making under uncertainty that have been widely used in many industrial and manufacturing applications but are underutilized in medical decision making (MDM). We demonstrate the use of an MDP to solve a sequential clinical treatment problem under uncertainty. Markov decision processes generalize standard Markov models in that a decision process is embedded in the model and multiple decisions are made over time. Furthermore, they have significant advantages over standard decision analysis. We compare MDPs to standard Markov-based simulation models by solving the problem of the optimal timing of living-donor liver transplantation using both methods. Both models result in the same optimal transplantation policy and the same total life expectancies for the same patient and living donor. The computation time for solving the MDP model is significantly smaller than that for solving the Markov model. We briefly describe the growing literature of MDPs applied to medical decisions.
Markov decision processes; decision analysis; Markov processes
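To make the formalism concrete, here is a toy value-iteration sketch in the spirit of the transplant-timing example. The states, transition probabilities, and rewards are invented for illustration; the paper's actual model is far richer:

```python
def value_iteration(states, actions, P, R, gamma=0.97, tol=1e-9):
    """Solve a finite MDP by value iteration. P[s][a] lists (next_state,
    prob) pairs; R[s][a] is the immediate reward (here, one year's QALYs)."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                       for a in actions[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            policy = {s: max(actions[s], key=lambda a: R[s][a] +
                             gamma * sum(p * V[s2] for s2, p in P[s][a]))
                      for s in states}
            return V, policy

# Toy transplant-timing problem: wait or transplant in each health state.
states = ["mild", "severe", "post_tx", "dead"]
actions = {"mild": ["wait", "transplant"], "severe": ["wait", "transplant"],
           "post_tx": ["wait"], "dead": ["wait"]}
P = {"mild": {"wait": [("mild", 0.85), ("severe", 0.10), ("dead", 0.05)],
              "transplant": [("post_tx", 0.95), ("dead", 0.05)]},
     "severe": {"wait": [("severe", 0.60), ("dead", 0.40)],
                "transplant": [("post_tx", 0.85), ("dead", 0.15)]},
     "post_tx": {"wait": [("post_tx", 0.90), ("dead", 0.10)]},
     "dead": {"wait": [("dead", 1.0)]}}
R = {"mild": {"wait": 1.0, "transplant": 0.7},
     "severe": {"wait": 0.6, "transplant": 0.7},
     "post_tx": {"wait": 0.85}, "dead": {"wait": 0.0}}
V, policy = value_iteration(states, actions, P, R)
```

With these invented numbers, the optimal policy waits while disease is mild and transplants once it becomes severe, mirroring the kind of timing trade-off the MDP formulation captures.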
Class prediction using “omics” data is playing an increasing role in toxicogenomics, diagnosis/prognosis, and risk assessment. These data are usually noisy and represented by relatively few samples and a very large number of predictor variables (e.g., genes of DNA microarray data or m/z peaks of mass spectrometry data). These characteristics underscore the importance of assessing potential chance correlation and overfitting of noise for a classification model based on omics data. We present a novel classification method, decision forest (DF), for class prediction using omics data. DF combines the results of multiple heterogeneous but comparable decision tree (DT) models to produce a consensus prediction, making the method less prone to overfitting of noise and chance correlation. A DF model was developed to predict the presence of prostate cancer using a proteomic data set generated from surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS). The degree of chance correlation and the prediction confidence of the model were rigorously assessed by extensive cross-validation and randomization testing. Comparison of model predictions with imposed random correlation demonstrated the biologic relevance of the model and the reduction of overfitting in DF. Furthermore, two confidence levels (high and low confidence) were assigned to each prediction, and most misclassifications were associated with the low-confidence region. For high-confidence predictions, the model achieved 99.2% sensitivity and 98.2% specificity. The model also identified a list of significant peaks that could be useful for biomarker identification. DF should be equally applicable to other omics data such as gene expression or metabolomic data. The DF algorithm is available upon request.
bioinformatics; chance correlation; class prediction; classification; decision forest; prediction confidence; prostate cancer; proteomics; SELDI-TOF
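The core idea of a decision forest, combining heterogeneous but comparable trees into a consensus, can be approximated by fitting trees on disjoint feature subsets and averaging their class probabilities. This is a minimal sketch on synthetic data using scikit-learn; the published DF algorithm differs in how trees are constructed and combined:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for an omics dataset: few samples, many predictors.
X, y = make_classification(n_samples=300, n_features=40, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Heterogeneity: split the features into disjoint blocks so each tree
# sees a different set of predictors (e.g. different m/z peak ranges).
blocks = np.array_split(np.arange(X.shape[1]), 4)
trees = [DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr[:, b], y_tr)
         for b in blocks]

# Consensus prediction: average the trees' class-1 probabilities.
proba = np.mean([t.predict_proba(X_te[:, b])[:, 1]
                 for t, b in zip(trees, blocks)], axis=0)
pred = (proba >= 0.5).astype(int)
accuracy = (pred == y_te).mean()
```

Averaging over trees that cannot share a spurious predictor is what dampens chance correlation relative to a single tree; the distance of `proba` from 0.5 gives a natural high/low-confidence split.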
The estimate of a multivariate risk is now required in guidelines for cardiovascular prevention. Limitations of existing statistical risk models have led to the exploration of machine-learning methods. This study evaluates the implementation and performance of a decision tree (CART) and a multilayer perceptron (MLP) for predicting cardiovascular risk from real data. The study population was randomly split into a learning set (n = 10,296) and a test set (n = 5,148). CART and the MLP were tuned to their best performance on the learning set, applied to the test set, and compared to a logistic model. Implementation, explicative, and discriminative performance criteria were considered, based on ROC analysis. Areas under the ROC curves and their 95% confidence intervals are 0.78 (0.75-0.81), 0.78 (0.75-0.80), and 0.76 (0.73-0.79) for logistic regression, the MLP, and CART, respectively. Given their implementation and explicative characteristics, these methods can complement existing statistical models and contribute to the interpretation of risk.
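The study design, fitting the three model families on a learning set and comparing them by area under the ROC curve on a held-out test set, can be sketched as follows. The data here are synthetic and the hyperparameters are illustrative assumptions, not those of the study:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

# Synthetic cohort standing in for the real risk-factor data.
X, y = make_classification(n_samples=2000, n_features=12, random_state=1)
X_learn, X_test, y_learn, y_test = train_test_split(X, y, test_size=1/3,
                                                    random_state=1)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "CART": DecisionTreeClassifier(max_depth=4, random_state=1),
    "MLP": MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=1),
}

# Fit on the learning set, score discrimination (AUC) on the test set.
aucs = {name: roc_auc_score(y_test,
                            m.fit(X_learn, y_learn).predict_proba(X_test)[:, 1])
        for name, m in models.items()}
```

Confidence intervals around each AUC, as reported in the abstract, would typically be obtained by bootstrapping the test set.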
Computerized clinical decision support systems are information technology-based systems designed to improve clinical decision-making. As with any healthcare intervention claiming to improve the process of care or patient outcomes, decision support systems should be rigorously evaluated before widespread dissemination into clinical practice. Engaging healthcare providers and managers in the review process may facilitate knowledge translation and uptake. The objective of this research was to form a partnership of healthcare providers, managers, and researchers to review randomized controlled trials assessing the effects of computerized decision support for six clinical application areas: primary preventive care, therapeutic drug monitoring and dosing, drug prescribing, chronic disease management, diagnostic test ordering and interpretation, and acute care management; and to identify study characteristics that predict benefit.
The review was undertaken by the Health Information Research Unit, McMaster University, in partnership with Hamilton Health Sciences, the Hamilton, Niagara, Haldimand, and Brant Local Health Integration Network, and pertinent healthcare service teams. Following agreement on information needs and interests with decision-makers, our earlier systematic review was updated by searching Medline, EMBASE, EBM Review databases, and Inspec, and reviewing reference lists through 6 January 2010. Data extraction items were expanded according to input from decision-makers. Authors of primary studies were contacted to confirm data and to provide additional information. Eligible trials were organized according to clinical area of application. We included randomized controlled trials that evaluated the effect on practitioner performance or patient outcomes of patient care provided with a computerized clinical decision support system compared with patient care without such a system.
Data will be summarized using descriptive summary measures, including proportions for categorical variables and means for continuous variables. Univariable and multivariable logistic regression models will be used to investigate associations between outcomes of interest and study-specific covariates. When reporting results from individual studies, we will cite the measures of association and p-values reported in the studies. If appropriate for groups of studies with similar features, we will conduct meta-analyses.
A decision-maker-researcher partnership provides a model for systematic reviews that may foster knowledge translation and uptake.
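The planned association analysis, a multivariable logistic regression of a binary outcome on study-level covariates, can be sketched on synthetic data. Everything below (the covariates, the effect sizes, the "trial showed benefit" outcome) is an invented illustration of the analysis type, not data from the review:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500  # hypothetical number of trials

# Hypothetical study-specific covariates (standardized), e.g. system
# integration, automatic prompting, sample size.
covariates = rng.normal(size=(n, 3))

# Simulate a binary outcome ("trial showed benefit") with known log-odds.
logit = 0.8 * covariates[:, 0] - 0.5 * covariates[:, 1]
outcome = rng.random(n) < 1 / (1 + np.exp(-logit))

# Multivariable logistic regression: exponentiated coefficients are
# (adjusted) odds ratios for each covariate.
model = LogisticRegression().fit(covariates, outcome)
odds_ratios = np.exp(model.coef_[0])
```

A univariable analysis is the same fit repeated with one covariate at a time; p-values for the coefficients would come from a statistics package rather than scikit-learn.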
Gene-expression signature-based disease classification and clinical outcome prediction have not been introduced into clinical medicine as widely as initially expected, mainly due to the lack of the extensive validation needed for clinical deployment. Obstacles include measurement variability in microarray assays, inconsistent assay platforms, the analytical requirement for a comparable pair of training and test datasets, etc. Furthermore, as a medical device supporting clinical decision-making, the prediction needs to be made for each single patient together with a measure of its reliability. To address these issues, there is a need for a flexible prediction method that is less sensitive to differences in experimental and analytical conditions, is applicable to each single patient, and provides a measure of prediction confidence. The nearest template prediction (NTP) method provides a convenient way to make class predictions, with an assessment of prediction confidence computed from each single patient's gene-expression data, using only a list of signature genes and a test dataset. We demonstrate that the method can be flexibly applied to cross-platform, cross-species, and multiclass predictions without any optimization of analysis parameters.
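The single-patient character of NTP can be sketched in a few lines: correlate the patient's signature-gene profile with one template vector per class, assign the most similar class, and estimate confidence by permutation. The templates, expression values, and four-gene signature below are toy assumptions; the published method's exact similarity measure and confidence estimate differ in detail:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two expression vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Template per class: +1 for signature genes expected up, -1 for expected down.
templates = {"classA": np.array([1.0, 1.0, -1.0, -1.0]),
             "classB": np.array([-1.0, -1.0, 1.0, 1.0])}

# One patient's expression of the 4 signature genes (toy values).
patient = np.array([2.1, 1.4, -0.3, -1.8])

# Predict: the class whose template is nearest to the patient's profile.
scores = {c: cosine(patient, t) for c, t in templates.items()}
predicted = max(scores, key=scores.get)

# Confidence sketch: how often does a shuffled profile match this well?
rng = np.random.default_rng(0)
null = [cosine(rng.permutation(patient), templates[predicted])
        for _ in range(1000)]
p_value = float(np.mean([s >= scores[predicted] for s in null]))
```

Because only the signature gene list and the one test profile are needed, no paired training dataset or cross-platform normalization enters the prediction, which is the flexibility the abstract emphasizes.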
To describe how investigators in a multisite randomized clinical trial addressed scientific and ethical issues involved in creating risk models based on genetic testing for African American participants.
The following informed our decision whether to stratify risk assessment by ethnicity: evaluation of epidemiological data, appraisal of benefits and risks of incorporating ethnicity into calculations, and feasibility of creating ethnicity-specific risk curves. Once the decision was made, risk curves were created based on data from a large, diverse study of first-degree relatives of patients with Alzheimer disease.
Review of epidemiological data suggested notable differences in risk between African Americans and whites and that Apolipoprotein E genotype predicts risk in both groups. Discussions about the benefits and risks of stratified risk assessments reached consensus that estimates based on data from whites should not preclude enrolling African Americans, but population-specific risk curves should be created if feasible. Risk models specific to ethnicity, gender, and Apolipoprotein E genotype were subsequently developed for the randomized clinical trial that oversampled African Americans.
The Risk Evaluation and Education for Alzheimer Disease study provides an instructive example of a process to develop risk assessment protocols that are sensitive to the implications of genetic testing for multiple ethnic groups with differing levels of risk.
Alzheimer; ethnicity; genetics; risk; APOE
Classification methods are widely used for identifying underlying groupings within datasets and predicting the class for new data objects given a trained classifier. This study introduces a project aimed at using a combination of simulations and classification techniques to predict epidemic curves and infer underlying disease parameters for an ongoing outbreak.
Six supervised classification methods (random forest, support vector machines, nearest neighbor with three decision rules, and linear and flexible discriminant analysis) were used to identify partial epidemic curves from six agent-based stochastic simulations of influenza epidemics. The accuracy of the methods was compared using a performance metric based on the McNemar test.
The findings showed that: (1) the assumptions a method makes about the structure of an epidemic curve influence its performance, i.e., methods with fewer assumptions perform best; (2) the performance of most methods is consistent across different individual-based networks for Seattle, Los Angeles, and New York; and (3) combining classifiers using a weighting approach does not guarantee better prediction.
epidemic curves; supervised learning; agent-based epidemic models; classification; random forest
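A McNemar-based comparison of two classifiers looks only at the test cases on which they disagree. The predictions below are invented to illustrate the mechanics, and the exact binomial form is used since the discordant count is small; the paper's performance metric builds on this test rather than being identical to it:

```python
import numpy as np
from scipy.stats import binomtest

# Toy test-set labels and two classifiers' predictions on the same cases.
y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1, 0, 1])
pred_a = np.array([0, 1, 1, 0, 0, 0, 1, 1, 1, 1])
pred_b = np.array([0, 1, 0, 0, 0, 1, 1, 1, 1, 0])

a_right = pred_a == y_true
b_right = pred_b == y_true
n01 = int(np.sum(a_right & ~b_right))   # A correct, B wrong
n10 = int(np.sum(~a_right & b_right))   # A wrong, B correct

# Exact McNemar: under H0 (equal accuracy) the discordant outcomes are
# Binomial(n01 + n10, 0.5); cases both get right or wrong are ignored.
p = binomtest(n01, n01 + n10, 0.5).pvalue
```

A small p would indicate the two classifiers genuinely differ in accuracy on these curves rather than by chance.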
We present a novel machine learning approach for the classification of cancer samples using expression data. We refer to the method as “decision trunks,” since it is loosely based on decision trees but contains several modifications designed to achieve an algorithm that: (1) produces smaller and more easily interpretable classifiers than decision trees; (2) is more robust in varying application scenarios; and (3) achieves higher classification accuracy. The decision trunk algorithm has been implemented and tested on 26 classification tasks, covering a wide range of cancer forms, experimental methods, and classification scenarios. This comprehensive evaluation indicates that the proposed algorithm performs at least as well as current state-of-the-art algorithms in terms of accuracy, while producing classifiers that include on average only 2–3 markers. We suggest that the resulting decision trunks have clear advantages over other classifiers due to their transparency, interpretability, and their correspondence with human decision-making and clinical testing practices.
classification; machine learning; gene expression; biomarkers
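The "few markers, transparent rules" property that distinguishes decision trunks can be approximated with an ordinary severely depth-limited tree; this sketch uses a depth-2 scikit-learn tree on synthetic expression data and is not the published trunk algorithm, which modifies tree construction itself:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for an expression dataset: 50 genes, 5 informative.
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           random_state=0)

# A depth-2 tree contains at most 3 split nodes, so it can test at most
# 3 features -- mirroring the 2-3 markers per classifier in the abstract.
trunk = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Which markers the trunk actually uses (leaf nodes are coded as -2).
used_markers = np.unique(trunk.tree_.feature[trunk.tree_.feature >= 0])
```

The resulting classifier can be read off as two or three if/else tests on named markers, which is what makes this family of models correspond to clinical testing practice.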