Efforts to describe how individual treatment decisions are informed by systematic knowledge have been hindered by a standard that gauges the quality of clinical decisions by their adherence to guidelines and evidence-based practices. This paper tests a new contextual standard that gauges the incorporation of knowledge into practice and develops a model of evidence-based decision making.
Previous work found that the forecasted outcome of a treatment guideline exerts a highly significant influence on how it is used in making decisions. This study proposed that forecasted outcomes affect the recognition of a treatment scenario, and this recognition triggers distinct contextual decision strategies.
N=21 volunteers from a psychiatric residency program responded to 64 case vignettes, 16 in each of four treatment scenarios. The vignettes represented a fully balanced within-subjects design that included guideline switching criteria and patient-specific factors. For each vignette, participants indicated whether they endorsed the guideline’s recommendation.
Clinicians employed consistent contextual decision strategies in responding to clearly positive or negative forecasts. When forecasts were more ambiguous or risky, their strategies became complex and relatively inconsistent.
The results support a three-step model of evidence-based decision making, in which clinicians recognize a decision scenario, apply a simple contextual strategy, then if necessary engage a more complex strategy to resolve discrepancies between general guidelines and specific cases. The paper concludes by noting study limitations and discussing implications of the model for future research in clinical and shared decision making, training, and guideline development.
It is generally acknowledged that clinician decision making is essential to effective healthcare service delivery and plays a significant role in efforts to improve quality of care (1, 2). However, there is considerable disagreement regarding the quality of clinical judgment, in particular the capacity of clinicians to conform to treatment guidelines and evidence-based practices. Some investigators believe that biased judgment so compromises service delivery that transformative initiatives should attempt to minimize clinical discretion (3–6). In response, it has been claimed that administrative and organizational remedies misconstrue the purpose of clinical decision making and prove ineffective (7–9). The sundry perspectives are products of a long-standing controversy that will not be easily reconciled (10, 11).
This paper is an effort to address the disparity by advancing Eddy’s vision of evidence-based decision making (12) and its call for the individualized application of systematic knowledge to clinical practice. The paper begins by proposing that adherence, whether to guidelines or other representations of systematic knowledge, is a problematic standard; achieving Eddy’s vision requires a new standard that gauges the systemic incorporation of knowledge into practice. The paper presents and tests the new incorporation standard, describes how it informs a model of evidence-based decision making, and discusses implications of the model for future research.
In the services literature, particularly the emerging discipline known as implementation science, clinical decisions tend to be viewed as products of administrative policy making (13–16). Consistent with this vision, clinical decision making is a transductive practice that functions effectively when it conforms to the standards of care, norms of practice, and clinical guidelines (17–24). A wealth of studies has demonstrated lack of conformance to clinical guidelines and dosage recommendations (25–29); in turn, these tendencies have been attributed to factors such as lack of knowledge, vulnerability to the persuasive appeals of big pharma, intransigence, poor training, lack of proper monitoring, and inherent limitations of human cognitive functioning (30–33).
Conclusions about flawed clinical judgment are based almost invariably on data that report adherence to treatment guidelines or other systematic representations. Critics would point to three problems with this procedure: First, guideline developers such as the American Psychiatric Association insist that their products be construed as aids to decision making and not as standards of care. Guidelines give aggregate recommendations, whereas actual treatment decisions combine systematic knowledge with “the clinical data presented by the patient and the diagnostic and treatment options available” (34, p. 1, also see 35, 36). Second, equating failure of endorsement with bias is a circular exercise, unless the evaluation criteria can be supported by independent evidence. Lacking this support, it is impossible to distinguish sub-optimal judgments from the application of flawed or inappropriate quality of care criteria (37, 38).
Finally, critics have proposed that the adherence standard misconstrues the fundamental task of clinical judgment, which is to use systematic knowledge to meet the needs of individual patients. According to Weiner (39), the cardinal question for clinical decision making is: “What is the best next thing for this patient at this time?” (p. 281). Others have made similar observations about decision making as the application of general knowledge to specific cases (7, 40), about the need to humanize the evidence (11, 41), and about striking a balance among competing needs and taking into account idiosyncratic characteristics of patients (42, 43). Weiner goes even further, perhaps, by asserting that misconstruing guidelines as standards to be followed, rather than as knowledge to be incorporated, produces a pattern of sub-optimal judgment called “context error” (also see 44, 45).
Notwithstanding these arguments, the adherence standard likely will remain predominant among services researchers, administrators, and policy makers for two reasons: First, its endorsement test provides an efficient means of assessing the impact of transformational and other quality of care initiatives. Second, and no less important, a contextualized alternative to the adherence standard has yet to be developed (8). Our current research program is addressing the latter limitation by developing an “incorporation” standard and a companion “matching” test, whose basic principle is consistency in the use of clinical guidelines: Regardless of whether a guideline’s recommendation is endorsed in a given case, it should play a significant and consistent role in the decision making process.
Incorporation differs from adherence in three significant ways: First, the new standard enables investigators, educators, and clinicians to distinguish rejecting a guideline altogether from not endorsing its application to specific cases. Second, whereas endorsement rates or frequencies are used to gauge adherence to a guideline, incorporation requires consistency in the application of a decision strategy. Consequently, while clinicians can meet the adherence standard by automatically endorsing a guideline, mere adherence is not sufficient to demonstrate incorporation. Finally, whereas measuring endorsement percentages by clinician, program, or agency serves the interests of services and program evaluation research, the incorporation standard is better suited to educational and implementation activities, which focus primarily on the contextual application of knowledge.
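The contrast between the two standards can be illustrated with a small sketch. The data, function names, and threshold below are hypothetical illustrations, not part of the study's analysis: adherence reduces to an endorsement frequency, whereas incorporation asks whether endorsements follow a consistent mismatch-threshold strategy.

```python
# Illustrative sketch (hypothetical data): adherence vs. incorporation.

def adherence_rate(endorsements):
    """Adherence standard: fraction of cases in which the guideline
    recommendation was endorsed."""
    return sum(endorsements) / len(endorsements)

def is_incorporated(cases, threshold):
    """Incorporation standard: the clinician endorses exactly when the
    number of mismatched attributes stays below a fixed rejection
    threshold, i.e., the strategy is applied consistently."""
    return all(endorsed == (mismatches < threshold)
               for mismatches, endorsed in cases)

# Hypothetical clinician: endorses whenever fewer than 2 attributes mismatch.
# Each case is (number of mismatched attributes, endorsed?).
cases = [(0, True), (1, True), (2, False), (3, False), (1, True)]

print(adherence_rate([e for _, e in cases]))   # 0.6: modest adherence
print(is_incorporated(cases, threshold=2))     # True: consistent strategy
```

On this sketch, a clinician can score low on adherence yet fully meet the incorporation standard, which is precisely the distinction the paper draws.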
Both the current adherence standard and the proposed incorporation standard are products of basic research in behavioral decision theory. The adherence standard emanates from expected value theory, Bayesian applications, regression modeling, and cognitive heuristics. What these approaches have in common is that actual decisions are compared to logical or mathematical norms (46, 47). The problem with this tack is that a correct answer or optimal response must be specified in advance. In some instances, particularly in clinical decision making, there is no a priori ideal or best alternative to put forward, and the most effective course of action for a population may not be feasible or desirable for a specific case. In such instances, the task of decision making is more akin to creative problem solving than to targeting and adhering to an optimal solution (48–51).
It is the problem-solving thrust of clinical decision making that bears a likeness to the alternative to behavioral decision theory known as naturalistic decision making (52, 53). NDM was inspired initially by Simon’s investigations into the behavior of expert decision makers (54). As the movement coalesced, Simon’s work spurred development of the Recognition Primed Decision Model (55), Image Theory (56), and other efforts that focus on what happens at the outset of a decisional process—where the situation and its incumbent requirements are apprehended and the problem solving effort begins. “Recognition” refers to the collection of activities that occur at the outset of decision making (57). When clinicians recognize a clinical situation, they get an immediate glimpse into what is commonly known as the clinical picture. They plot a trajectory toward a treatment outcome, and develop a means, or strategy, for attaining the outcome.
Image Theory’s depiction of recognition closely resembles Weiner’s contextual account of clinical decision making (58). Knowledge is incorporated by comparing two sets of attributes, a general set and a specific set, and performing a simple, non-compensatory, test of whether the number of mismatched attributes exceeds a likeness threshold (59, 60). In clinical applications of the matching test, general attributes, such as treatment guideline criteria and recommendations, are matched to attributes of specific cases. If there is a likeness, the recommendation is endorsed; if the mismatch threshold is exceeded, the clinician rejects the recommendation. In either case, the performance of the matching test demonstrates whether the guideline has been incorporated at the point of recognition.
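As a minimal sketch of this non-compensatory matching test (the attribute names, values, and threshold are illustrative assumptions, not drawn from the YPSA guideline itself), the procedure reduces to counting mismatched attributes and comparing the count to a rejection threshold:

```python
# Sketch of Image Theory's non-compensatory matching test, as described
# above. Attribute names and the threshold are illustrative only.

def matching_test(guideline, case, mismatch_threshold):
    """Count attributes of the case that mismatch the guideline's
    criteria; endorse the recommendation only if the mismatch count
    does not exceed the likeness (rejection) threshold."""
    mismatches = sum(1 for attr, expected in guideline.items()
                     if case.get(attr) != expected)
    return mismatches <= mismatch_threshold

guideline = {"improvement": "below_threshold",
             "severity": "at_threshold",
             "adherent": True}
case = {"improvement": "below_threshold",
        "severity": "well_below",       # one mismatched attribute
        "adherent": True}

print(matching_test(guideline, case, mismatch_threshold=1))  # True: endorse
```

Because the test is non-compensatory, a strong match on one attribute cannot offset mismatches on others; only the mismatch count matters.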
Whereas the adherence standard is tested by the frequency of endorsement, the incorporation standard is tested by the consistency with which general knowledge is matched to specific cases. Evidence of matching lies in comparing the endorsement rate and the number of mismatched attributes across a range of clinicians and cases. The role of the matching test is expressed by two hypotheses: First, the endorsement rate should be inversely related to the number of mismatches. Second, the mismatch or “rejection” threshold should be consistent for individual clinicians across a range of similar cases. These hypotheses were tested (61) by using a switching guideline for schizophrenia treatment developed by Sernyak and associates (62). Their Yale Psychiatry-Sernyak Algorithm (YPSA) was disseminated to a group of public mental health clinicians, who indicated that they understood and intended to concur with its recommendations. However, a retrospective chart review showed that it was endorsed in only 1 of the 22 cases for which a treatment switch was recommended. Ironically, this low endorsement rate made the guideline well suited to comparing the two standards by examining whether clinicians incorporate the guideline even when they do not adhere to it.
Using procedures described in the following section, the study obtained an overall endorsement rate of 42%. The first hypothesis was confirmed by a significant inverse relationship between endorsements and mismatches across subjects. The second hypothesis was confirmed by significant within-subjects polynomial and linear contrast effects, with the latter accounting for 65% of the endorsement variance. Some participants exhibited a highly conservative discrepancy threshold of 1, while others had a more generous threshold of 3. The clinicians who participated in Sernyak’s original dissemination study were asked retrospectively why they elected not to endorse the guideline. They emphasized two patient-related factors: the patient’s adherence to treatment and the forecasted effect of following the guideline’s recommendation. These factors were used as patient-related attributes in the matching study. A separate analysis found that the forecasted outcome factor played a highly significant role, more so than what was revealed by a simple matching test (63). In contrast, adherence to treatment was significant only as a moderating variable.
In the Image Theory literature, the disproportionate influence of expected outcome is explained by the phenomenon of weighting (64), but this explanation assumes that all of the variables that bear on a recognition strategy are examined simultaneously. For instance, in narrowing a pool of job candidates, experience may be weighted higher than education. Lidz and Mulvey offer a somewhat different explanation. Their conditional prediction model proposes that certain factors take precedence and are examined first; they will appear to have a higher weight than factors that are examined in contingent fashion (65). The purpose of the current study is to determine whether clinicians make contextualized treatment decisions by applying a conditional recognition strategy. The guiding question is, do different forecasted outcomes lead to different ways of matching guideline to case? If so, these recognition strategies can lead to a conditional model that is responsive to Eddy’s call for evidence-based decision making.
Study participants were presented with the YPSA guideline and freely consulted it as they completed a stimulus task consisting of 64 case vignettes (with fillers) that were presented in four randomized orders. The vignettes were constructed from a fully balanced set of five factors. As described in Table 1, forecasted outcome, the expected effect of following the guideline’s recommendation, was treated as the antecedent (or primary) variable and has four levels, each representing a distinct though common clinical scenario. As summarized in Table 2 and described below, the four contingent (or secondary) variables had two levels. Sixteen vignettes in each scenario represent all combinations of a balanced 2 × 2 × 2 × 2 within-subjects design. Study participants were asked to indicate whether they endorsed the YPSA recommendation in each vignette. Data were analyzed using generalized estimating equation (GEE) modeling, the generalized linear model of choice for correlated binomial data (66).
The YPSA guideline is a five-step switching algorithm that begins with a first generation antipsychotic (FGA) such as haloperidol or perphenazine. Switches to a different antipsychotic agent at each succeeding step are as follows: a second FGA at step 2; a second generation antipsychotic (SGA), such as olanzapine or risperidone, at step 3; a second SGA at step 4; and clozapine therapy at step 5. (Although clozapine is actually an SGA, its special properties warrant a separate category.) As noted in Table 2, “guideline step” is one of the four contingent variables. Patients who have proceeded to step 2 have failed only one treatment trial, but at step 4 there have been insufficient responses in three trials, and patients who are at step 4 fall within the usual criteria for treatment resistant schizophrenia (67). Switching recommendations are based on two Clinical Global Impression Scale (CGI) ratings (68): global improvement and the patient’s current condition. Each of these two scale scores is a contingent variable, with one level established at the switching threshold and the other level considerably below the threshold. While threshold scores call for a switch, their case is tenuous, owing to the inherent risks of changing treatments (particularly at step 4) and the heterogeneous course of the illness, which may not be fully known at step 2. In guidelines such as the TMAP (69), threshold scores are treated as “partial” responses, and clinicians have greater latitude about whether to recommend a switch. In addition, the TMAP relaxes the switch criteria for treatment-resistant patients. The final contingent variable is “patient adherence,” the factor in addition to forecasted progress that was identified most frequently in the original dissemination study as a reason for not endorsing the guideline’s recommendation.
As noted in Table 2, the two levels of each contingent variable represent a match or mismatch. For each scenario, there was 1 vignette at mismatch points 0 and 4, 4 vignettes at mismatch points 1 and 3, and 6 vignettes at mismatch point 2. While the estimated values for the 5-point mismatch model are descriptive, the small sample sizes at the extremes make the population parameters difficult to estimate reliably. To compensate for the sample size problem, a restricted 2 × 2 analysis (with a 3-point mismatch count that ranged from 0 to 2) was also run on each scenario. In the 3-point models, there were 4 vignettes at mismatch points 0 and 2, and 8 vignettes at mismatch point 1. The two secondary variables in the restricted model are adherence and guideline step. The latter was chosen because it represents the best overall summary of severity and global improvement, given that patients arrive at step 4 because there have been three failures to achieve a positive response. While the algorithm calls for a second SGA at step 4, physicians may be more inclined to continue the current treatment, particularly if the patient’s response is at the threshold. As an alternative, they may follow preferred practice and jump from step 3 to step 5 by recommending clozapine (70, 71). As Table 1 indicates, step 2 of the guideline is regarded as a match and step 4 is a mismatch.
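The vignette counts at each mismatch point follow directly from the balanced design, as a short enumeration shows (the variable names below are paraphrases of the contingent factors in Table 2, coded 0 for match and 1 for mismatch):

```python
# Sketch of the balanced 2 x 2 x 2 x 2 vignette design described above.
# Enumerating all combinations and tallying them by mismatch total
# reproduces the 1/4/6/4/1 distribution over the five mismatch points.
from itertools import product
from collections import Counter

variables = ["guideline_step", "global_improvement",
             "current_condition", "patient_adherence"]
vignettes = list(product([0, 1], repeat=len(variables)))  # 16 per scenario
counts = Counter(sum(v) for v in vignettes)

print(len(vignettes))           # 16
print(sorted(counts.items()))   # [(0, 1), (1, 4), (2, 6), (3, 4), (4, 1)]
```

The sparse cells at mismatch points 0 and 4 (one vignette each) are the binomial extremes of the design, which is why the paper falls back on the restricted 3-point model for reliable estimation.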
Participants were 21 residents at one psychiatry training program, who had experience in treating patients with schizophrenia and volunteered to complete a one hour task. They were paid $100.00 for participating. This was a convenience sample, as required by the funding source and the two local Human Investigation Committees that reviewed the study, to minimize any concerns that participation could affect their status in the program. The sampling frame was limited to residents, given the evidence that trainees and experienced clinicians invoke different decisional processes (72). Of the 21 participants, 11 were third year, 5 were fourth year, and 5 were fellows. The 14 males and 7 females had a mean age of 33.4 years (sd=3.6). Fourteen of the 21 listed their race as Caucasian, 6 as Asian, and 1 as other. One male Caucasian resident identified himself as Hispanic. These sample characteristics correspond roughly to the population that comprises the training program. Because participant-level factors had no significant effect on endorsement ratings, they were not included in further data analysis.
Analysis began by examining the overall influence of forecasted outcome on endorsements. This effect was significant (Wald χ2 = 187, df=3, p<.001), and Bonferroni-corrected paired comparisons showed significant differences between every scenario except for ineffective versus high risk. These results suggest that applying guideline to case may have involved more than one conditional strategy. Table 3 reports analyses of the 5-point and 3-point contingent models for each of the four scenarios. All of the GEE models reported in this paper specified a first-order autoregressive correlation matrix structure and a logit link function, and order of presentation was treated as a within-subject effect.
As Table 3 indicates, overall endorsements ranged from 81% in the high gain scenario to 8% in the low gain scenario. For the high gain scenario, both contingent models are significant at p<.001. The endorsement rate at mismatch point 1 was a bit higher than at point 0 in the 5-point model, but the difference was non-significant. The two rejection thresholds found in previous analyses are apparent in the high gain scenario, with one threshold occurring between points 1 and 2 and the other between points 3 and 4. Pan’s QICC goodness-of-fit test, with smaller numbers indicating a better fit (73), showed almost identical results for the two contingent models and supported the use of the 3-point model to estimate clinician endorsement rates. The significant paired comparisons and linear contrast in the 3-point model give a relatively clear indication that clinicians invoked a simple conditional matching strategy in the high gain scenario.
For the low gain scenario, the 5-point model did not converge with a logit link function and the estimates are based on an identity link function. Both the 5- and 3-point models were statistically significant, as was the 0–2 paired comparison. In the 3-point model, there was also a significant difference between points 1 and 2, and the linear effect was significant at p<.001. These findings suggest that participants were invoking an inverted conditional matching strategy that featured a conservative rejection threshold. In the course of recognizing a low gain scenario, clinicians targeted a specific group of patients who were non-adherent and well along in treatment. Switching to a second SGA may offer the best prospect for improving adherence for patients in this group, because two FGAs have failed already and non-adherence militates against the introduction of clozapine, which requires close monitoring. The conservative threshold indicates that any contraindicating factor (in this case, any match) leads to rejecting the guideline recommendation. In a low gain scenario, the guideline is likely not to be followed for patients whose condition has deteriorated or who have failed to make progress during the current course of treatment. If these patients are adherent, they are good candidates for clozapine.
The other two scenarios are more difficult to interpret. The endorsement profiles at each mismatch point suggest that a simple (in the high risk scenario) or inverse (in the ineffective scenario) matching strategy may have been invoked, but only provisionally. The pattern of endorsement estimates, combined with the general lack of statistical significance (notwithstanding the overall and linear effects in the ineffective scenario’s 5-point model), suggests that basic matching was either rejected or abandoned in favor of a more complex strategy that was invoked subsequent to recognition. Whether the latter strategies constituted sound adaptive responses to the complexity of ineffective and high risk situations, or whether participants demonstrated the very sub-optimal tendencies that were documented in early studies of clinical decision making (74, 75), is a subject for future investigation.
Sub-optimal decision making has been identified as a significant impediment in transforming scientific knowledge into clinical practice (76, 77). In response, we have suggested that this claim relies on an adherence standard and conflates sub-optimal decision making with sound contextual practice. We have proposed that the quest for optimal decisions be supplanted with a problem-solving perspective that focuses on activities at the outset of decision making, and that enables researchers, educators, and policy makers to gauge how systematic knowledge is incorporated into clinical practice. Previous tests found that while clinicians used a matching test to incorporate general and patient-specific attributes into their treatment recommendations, forecasted progress appeared to exert a disproportionate influence. Drawing on Image Theory (56) and the Conditional Prediction Model (78), the current study tested a conditional matching model and found evidence of at least three strategies that were invoked at the outset of clinical decision making. Based on these results, this section proposes a model that responds to Eddy’s (12) call to link systematic knowledge with the needs of individual patients through evidence-based decision making.
The model displayed in Figure 1 depicts a three-step movement that begins with recognizing the decision situation, then applying a matching strategy, and subsequently adapting the strategy as needed. The application of a matching strategy at step 2 follows immediately from identifying the entities and attributes and establishing a trajectory at step 1. For experienced decision makers, especially clinicians who are adept at pattern-matching (79–82), steps 1 and 2 can occur simultaneously. However, there may be specific reasons to keep them separate. Training may be one instance in which apprehending the clinical picture is viewed as distinct from invoking a decision strategy. Administrative policies and practices that require clinicians to document that they have factored specific pieces of information into treatment decisions are another. As evidenced by the ability of clinicians to use the YPSA guideline consistently, a low information threshold is required to move decision making processes forward, at least in simple cases. However, informational requirements may be considerably greater when decision making is shared by clinicians and patients, or when it involves networks of stakeholders and providers.
Decision making moves to the third step if the evidence available to decision makers about treatment alternatives cannot be incorporated at the point of recognition. In the current study, simple recognition-based strategies proved insufficient when the likelihood of significant improvement was remote or when a prospective switch incurred significant risk. Other instances might include shared or distributed decision making, when the various parties recognize the situation disparately. In other cases, matching strategies may rule out a host of alternatives, but two or three viable options still remain. The broken line at the bottom of Figure 1 indicates that as situations become more complex, decisional processes adapt by becoming more deliberative. The nature and extent of the deliberation, as well as the strategy that is invoked in reaching a treatment decision, depends on the interests and expertise of the various parties, their relationships, the complexity of the case, and the adequacy of the informational base. In particular, step 3 may involve a strategy selection procedure, in which the parties to a decision select a course of action that is contingent on specific goals or progress assessment criteria (83).
Limitations of the guideline, procedure, and design have been noted in previous papers (61, 63). This paper discusses how the EBM model can facilitate future studies that address these limitations. First, as the study participants were psychiatric residents, it cannot be determined whether experienced clinicians would encounter similar difficulties in complex cases, or whether they would resolve these cases at the point of recognition — for instance, by expertise in identifying a specific piece of information and incorporating it into a simple matching strategy. Studies about the application of expertise in recognizing a decision situation, particularly in complex cases, can facilitate future guideline development and contribute to clinical training. The limited YPSA guideline can be supplanted by the more widely known and broadly disseminated TMAP (69), which features a more comprehensive and extensive progress assessment procedure and a richer set of treatment options. How trainee clinicians use the TMAP in complex cases, where the YPSA guideline has demonstrably limited value, is a subject of current work.
The value of even the most comprehensive and flexible guidelines may be limited in cases that require a combination of expertise and on-the-spot creativity (84, 85). In the complex environment of public mental health, clinicians are expected to contend with multiple psychiatric and medical co-morbidities, a host of legal and administrative complications, and competing interests among stakeholders and service providers. A question for future research is: How far should guideline development go in attempting to accommodate complex cases? For instance, should they add contingent or cascading assessment procedures, expand assessment criteria, or recommend regimens that include combinations of medication, psychosocial, and cognitive therapy (86–90)? Given the rapid development of basic knowledge and the anticipated success of implementation initiatives, how should guidelines be structured to anticipate future developments, and how detailed and flexible should they be?
A wealth of essays, studies, and policy statements have emphasized the importance of involving consumers in treatment decisions (91–93), identified impediments to implementing patient centered care (94–96), and proposed ways of overcoming them (97–101). The emerging literature raises the question, how can the desires and interests of patients—indeed, how can patients themselves—be included in treatment decision making processes that incorporate a systematic evidence base? Related questions include: Should factors related to how patients perceive and represent their conditions and preferences be incorporated directly into treatment guidelines? How can these factors be recognized in a manner that enhances quality of care? Our current research is addressing these questions by focusing on how the recognition of patients’ perceptions of illness affects clinicians’ decisions to recommend clozapine, which remains the most effective, yet under-prescribed, therapy for patients with treatment resistant schizophrenia (102).
The principle that underlies future investigations and prospective adaptations of the evidence-based decision making model presented here is that there is no “one-size fits all” explanation of what constitutes sound clinical decision making (7, 103). No overarching strategy is capable of accommodating the range and diversity of knowledge that clinical practitioners present, of addressing the wealth of clinical situations that are encountered in the routine course of treatment decision making, and of responding effectively to the sundry constraints that limit treatment effectiveness, particularly in cases of chronic conditions such as schizophrenia. Nonetheless, administrative, organizational, and policy making bodies have a legitimate interest in sanctioning treatments, gauging quality of care, improving access, and reducing cost. The question that remains to be answered is, how can evidence-based decision making assist the various parties to exert their influence in a manner that attenuates the current “evidence debate” (104) and facilitates the movement of knowledge into practice?
This study was funded by a grant from the National Institute of Mental Health to the lead author, 1R34 MH070871-01.
The authors wish to acknowledge Lee R. Beach, Ph.D., for his extensive guidance and support; Brent A. Moore, Ph.D., for his assistance with data analysis; and Robert M. Rohrbaugh, M.D., for his assistance in facilitating the project.
Paul R. Falzer, Yale School of Medicine, Department of Psychiatry, New Haven, Connecticut USA.
D. Melissa Garman, State of Connecticut, Department of Mental Health and Addiction Services, Southwest Connecticut Mental Health System, Bridgeport, Connecticut USA.