Recent evidence suggests that patients are receiving only 50% of recommended processes of care. It is important to understand physician priorities among recommended interventions and how these priorities are influenced, both intentionally and unintentionally.
A survey was mailed to all primary care physicians (PCPs) from two VA hospital networks (N= 289), one of which had participated in a broad, evidence-based guideline development effort 8 to 12 months earlier, and all endocrinologists nationwide in the VA (N= 213); response rate, 63% (n= 315). Using the method of paired comparisons, we assessed physician priorities among 11 clinical triggers for interventions in the management of an uncomplicated patient with type 2 diabetes.
Both PCPs and specialists consistently identified several high-impact clinical triggers for treatment as the highest priority interventions (hemoglobin A1c = 9.5%, diastolic blood pressure [DBP] = 95 mm Hg, low-density lipoprotein = 145 mg/dl). Several low-impact interventions that are commonly used as performance measures also received relatively high ratings. Treatments that have recently been found to be highly beneficial were often rated as being of low importance (e.g., treating when DBP = 88 mm Hg). Almost 80% of PCPs rated tight glycemic control as more important than tight DBP control, in direct contrast to clinical trial evidence. Specialists' ratings followed the same general pattern, but were more consistent with the epidemiological evidence. The PCPs at the sites that participated in the guideline intervention rated blood pressure control significantly higher.
Although several high-priority aspects of diabetes care were clearly identified, there were also notable examples of ratings that were clearly inconsistent with the epidemiological literature. Recommendations based upon more recent evidence were substantially underrated and some guidelines used as performance measures were relatively overrated. These results support the arguments that a more proactive approach is needed to facilitate rapid dissemination of new high-priority findings, and that intervention priority, and not just ease of measurement, should be considered carefully when disseminating guidelines and when selecting performance measures.
Recent studies have documented that barely half of the interventions that are recommended for patients with common conditions are actually provided.1–6 However, discussions of the “quality gap” sometimes do not acknowledge that for many common chronic diseases, such as coronary artery disease and diabetes, the number of interventions recommended by expert panels and guidelines is proliferating and threatens to overwhelm providers, not to mention potentially greatly surpassing the time and energy that patients are willing to expend on their health care. For example, Yarnall et al. determined that just to carry out the health care maintenance interventions recommended by the U.S. Preventive Services Task Force (USPSTF) for a panel of patients would take a physician close to 8 hours per day of practice, leaving no time for addressing any of the acute complaints or chronic disease management of their patient population.7 For a routine primary care visit of 10 minutes, the average patient has 15.4 risk factors and 24.5 recommendations based on the USPSTF guidelines.8 Because many people also have chronic diseases, the competing demands problem is actually much larger, even before considering acute injury and illness.9–15
Ultimately, new care models may be needed to significantly improve the number of interventions that it is possible to deliver. However, in the short run, we need to understand, and in some cases modify, the priorities that providers bring to clinical encounters so as to ensure that the most important interventions are not lost amid the blizzard of demands on patients’ and providers’ time and energy. There have been numerous attempts to develop prioritization schemes for health interventions, usually using economic cost-effectiveness models, as part of population-level resource allocation decisions,16–23 perhaps most famously the controversial Oregon Basic Health Services Act.24,25 Despite the controversy that surrounds some of these efforts, Coffield et al. argue that while all of the interventions supported by evidence might be desirable, prioritization can allow for efficient step-by-step improvement in population health by devoting quality improvement resources to those interventions that have the greatest impact and value.16 In the absence of a rational prioritization of interventions, it is argued that provider decisions are based predominantly (and suboptimally) on marketing and tradition, “adding or subtracting at the margin according to some combination of new evidence, new technologies, patient or consumer demands, inertia, and internal politics.”26 While there is an extensive literature (cited in part above) on ways to develop priorities for interventions based on literature review and experts, relatively few studies have attempted to understand how practicing physicians prioritize clinical interventions.27–29
Therefore, we designed a study to elicit the relative priority given to 11 diabetes interventions. We selected diabetes because it is predominantly cared for by primary care physicians (PCPs), is commonly used to profile health plan and provider quality (such as Health Plan Employer Data and Information Set [HEDIS]), and has multiple guideline recommendations that span the spectrum from critically important and strongly evidence based to speculative and probably of low impact. We used the method of paired comparisons, an established method of ranking choices in the psychometric and economics literature, that simplifies prioritization tasks and produces more reliable estimates.30–34 We sought to evaluate physicians’ understanding and prioritization of 11 clinical interventions for specialists and for 2 groups of PCPs (one of which had received a multifaceted intervention trying to increase awareness of treatment priorities).
We developed a questionnaire that assessed priorities for diabetes care interventions. The questionnaire was mailed to all PCPs in two VA Integrated Service Networks (VISNs; n= 289) and to all endocrinologists in the VA health care system nationwide (n= 213 identified in the national VA employment files). Each service network has about 1.5 million annual outpatient visits and consists of up to 7 hospitals and 23 outlying freestanding community clinics. The freestanding clinics are distributed over a wide geographic area. Two reminder postcards followed an initial mailing for those who had not returned the initial questionnaire. The final response rate was 63% (n= 315). The survey was conducted in early 2001.
The paired comparison method has been used in many research disciplines, including health care,28,35–42 for over 100 years and is particularly useful when the differences between alternatives are not easily quantified by a single objective dimension such as weight, temperature, or size.30 This technique has been used to examine preferences for political candidates, abstract concepts such as social stability, and sensory inputs such as color or sound quality from hearing aids.
Using this method, respondents were first provided an information scenario that defines a simple and common patient presentation. Then they were presented with a series of pairs of possible clinical triggers for interventions selected from a list of 11. For each pair they were asked to select the clinical trigger that is “more important” to intervene upon (see Fig. 1). We did not define “important” for the physicians, because we wanted them to use their own values and criteria, not one(s) determined by us. The random utility interpretation of this model posits that subjects decide between the two options by determining the underlying utilities of the two options and then selecting the option with the higher utility.33 The analysis is designed to recover the strength of the population's preferences from the consistency of their choices (arising from their underlying utilities) versus the random fluctuations introduced by the heterogeneity of the subjects and the difficulty of the decision in choosing between the alternatives.
There are a number of theoretical advantages of the pair comparison method over other rating methods such as category-rating scales (i.e., rating importance on a 1 to 6 scale). The memory demands are less, and it requires a simpler internal conceptual model.35 The method makes complex judgment tasks easier and frees the judgment process as much as possible from contextual effects caused by the presence of other items.33 Whereas category-rating scales require subjects to express a preference on an arbitrary numerical scale, something they never do in making decisions in real life, paired comparisons ask whether you think treatment A is more important than treatment B.38 This is a more natural question than rating treatment A on a 1 to 6 scale. By presenting each intervention multiple times as part of different pairings, the calculated ratings based on multiple measurements are also more reliable than a single category rating.32 From the results of the pair comparisons the analysis can recover a representation of an interval scale that produces the observed winners of each pairing.
We assembled a series of possible clinical triggers for diabetes interventions (Table 1). By clinical trigger we mean a reason to prescribe a drug, order a test, or perform some other intervention. The triggers can be arranged into 4 groups: hypertension, hyperglycemia, lipid abnormalities, and lapses in screening. Thus, for example, one trigger would be a finding, in the patient described in the scenario in Figure 1, of a systolic blood pressure (SBP) of 150 mm Hg. This trigger represents a possible reason to prescribe a different blood pressure medication.
Measures were selected to give an assortment across the spectrum of degree of importance based upon the strength of the epidemiological literature and degree of benefit in preventing major complications. Based upon the epidemiological evidence,43–48 we a priori placed the 11 clinical triggers into 4 general categories (Table 2). The first we have labeled “high priority.” These included clear and substantial elevations in blood pressure, LDL-C, and HgbA1c. We felt that anyone who knows the literature well would agree that the interventions for these clinical triggers should be at the top of the heap, because there is good evidence of a large impact on major outcomes for intervening on these triggers (see Table 2).43,45
The second group, which we labeled “overappreciated,” consists of the lapses in annual screening for retinopathy and proteinuria.45,48 While screening at some interval is of value (especially screening for retinopathy every 2 to 3 years),48 there is relatively little evidence that annual screening has much benefit, especially for the patient described in this scenario, and even if annual screening is beneficial, its impact is certainly much smaller than that of the interventions in group 1. However, interventions for these indications are easily monitored and part of most health plan “report cards.”
The third group, consisting of the single indication of a diastolic blood pressure (DBP) of 88, we labeled “recent-evidence” for high-impact; we anticipated that this measure might be underappreciated, because the evidence of a large impact comes from relatively recent clinical trials (studies occurring about 2 years before our study).49,50 At the time of our survey, the evidence supporting glycemic control to modify microvascular risks in diabetics was almost 10 years old and heavily promoted through national organizations and guidelines, whereas the importance of interventions to prevent macrovascular disease in diabetics, particularly blood pressure control, was less heavily emphasized, and a DBP of 88 had until 2 years before our study been considered “within normal limits.”43,45,49,50
The final group, consisting of mild hyperglycemia, is classified as having good evidence of a small (for later onset diabetes) to moderate (for early onset diabetes) benefit of controlling blood sugar further,46 and we label it as being of intermediate importance.
The pair comparison questions were administered by mailed questionnaire. The questionnaire asked for basic demographic information about the respondents and then presented the information scenario as outlined in Figure 1. The information scenario was the same for every subject with the exception that the age of the patient was randomly assigned to be 47 or 67 across the subjects, with age of onset of diabetes specified as 2 years earlier than the patient's age (i.e., 45 and 65 years old). Interventions related to preventing microvascular outcomes (glycemic control and screening for nephropathy and retinopathy) have much smaller absolute benefit in older onset diabetics.46,48 For example, the benefit of treating an A1c of 8% is estimated to be over 5 times greater for a 45-year-old patient than for a 65-year-old patient. Thus, if clinician ratings of “importance” consider degree of patient benefit, then clinical triggers related to microvascular disease should be rated lower in older onset diabetics.
Our design included a comparison between different physician groups. Endocrinologists were sampled under the presumption that specialists would be more likely to incorporate new evidence about effective interventions into their treatment priorities.51 A quasi-experimental design varied the exposure of the PCPs to an educational intervention. The physicians in one of the two VISNs had participated in a broad, evidence-based diabetes guideline development effort 8 to 12 months before the survey was mailed. This intervention included a 2-day conference of clinical opinion leaders from all of the clinical sites in the VISN where the evidence supporting various diabetes interventions was reviewed and a consensus was reached regarding the promulgation of treatment priorities for special attention in the care of patients within the VISN. Subsequently, over a period of several months, an organizer of the conference visited each of the clinical sites in the VISN and presented the treatment priorities and the evidence supporting the recommended interventions in a discussion format with the primary care providers at each site. While presenting precise prioritization among alternative treatments was not a feature of these guidelines, there was a distinct emphasis on prevention of macrovascular disease through diagnosis and treatment of elevated blood pressure and LDL-C.
The analysis was carried out using a hierarchical logistic regression method that explicitly models the heterogeneity between respondents to analyze the paired comparison data (see the online Appendix, available at http://www.jgim.org, for further details of the analysis). The age of the hypothetical patient in the information scenario was introduced as a covariate to test whether the priority of the different interventions expressed by the subjects varied by the age of the patient for whom they were considering the interventions. Results were modeled separately for each of the 3 groups of physicians (the specialists and the 2 groups of PCPs). The logistic regression model produces a scale value for each of the triggers relative to the lowest rated trigger. This scale value reflects the mean priority ranking (in terms of the log odds of being selected in preference to the lowest-ranked trigger) for the physicians in that group. The distance between any two triggers is the log odds of selecting the higher-ranked trigger over the lower-ranked trigger, and the probabilities for the comparison of any two triggers can thus be calculated from the scale values.
Figure 2A shows the relative preference given to the 11 indications for the entire sample of physicians. Interventions for the triggers listed at the top of the scale are preferred over those at the bottom. The distance between two triggers on the y-axis can be interpreted as a probability of selecting one trigger over the other. When choosing between two triggers to intervene upon, a clinical trigger that is 1 point higher will be selected by a provider 73% of the time over the lower clinical trigger, and one that is 2 points higher will be chosen over the other 88% of the time. Two triggers at the same level are 0 units apart and each would be selected over the other with a probability of 50%. For example, in Figure 2A, an A1c of 9.5% is 1.8 points higher than an SBP of 150 mm Hg. Thus, when choosing between the two triggers, the average physician would be expected to choose to intervene first for the elevated A1c 85% of the time over an SBP of 150 mm Hg.
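The conversion from a scale distance to a choice probability described above is the inverse-logit transformation. The following is a minimal sketch of that arithmetic; the function name `prob_prefer` is hypothetical and is not taken from the study's analysis code.

```python
import math


def prob_prefer(scale_diff: float) -> float:
    """Probability of choosing the higher-rated trigger, given the
    distance between two triggers on the log-odds (logit) scale."""
    return 1.0 / (1.0 + math.exp(-scale_diff))


# Distances of 0, 1, and 2 scale points, as discussed in the text.
for gap in (0.0, 1.0, 2.0):
    print(f"distance = {gap:.1f} -> probability = {prob_prefer(gap):.2f}")
```

A distance of 0 gives 0.50, a distance of 1 gives 0.73, and a distance of 2 gives 0.88, matching the percentages quoted for Figure 2A.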
The relative position of the clinical triggers for the entire sample of physicians (Fig. 2A) is consistent with some of our hypotheses about the general groupings of the indicators. The more severe levels of hypertension, hyperlipidemia, and hyperglycemia are at the top of the scale. An HgbA1c of 8.0% is in the middle of the scale. The remaining clinical triggers are clustered at the bottom, with an HDL of 30 having the lowest rank. Further, the DBP of 88 mm Hg is in the lower part of the scale, especially when looking at the PCP comparison group (Fig. 2B), supporting the hypothesis that this “recent evidence” trigger would be underappreciated.
Figures 2B and 2C also show the ratings for the endocrinologists and the “intervention” PCPs. There is clear evidence of significant heterogeneity in the rating scales between the 3 physician groups (likelihood ratio test χ2= 50; degrees of freedom = 20; P < .001). First, there is a greater spread in the scale for the PCPs than for the specialists, particularly for the intervention PCPs. In general, the specialists seem to classify the triggers into two groups: the more severe levels of hypertension, hyperlipidemia, and hyperglycemia, and then everything else.
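The reported likelihood ratio statistic can be checked against the chi-square distribution by hand. The sketch below uses the closed-form upper-tail probability that holds when the degrees of freedom are even, which is the case here (20). The function name `chi2_sf_even_df` is hypothetical, and the sketch only illustrates the arithmetic behind the reported threshold; it does not reproduce the study's software output.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variate.

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!,
    which is valid only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0   # k = 0 term of the series
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(50.0, 20)
print(f"p = {p_value:.5f}")
print("below .001:", p_value < 0.001)
```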
In terms of specific differences, the specialists rate the lapses in screening lower than the “control” PCPs (P < .002 for both; arrows in Fig. 2B). Interestingly, they also rate an intervention for an Hgb A1c of 8.0% much lower than the control PCPs (P= .01). The intervention PCPs are notable for their significantly higher ratings given to systolic hypertension (P= .003) and both levels of diastolic blood pressure (DBP = 88, P= .01; DBP = 95, P= .05; arrows in Fig. 2C).
Finally, the age of onset of diabetes (45 years old vs 65 years old) did not change the magnitude of any of the relative priorities given to the triggers by the 3 physician groups. This finding is quite notable given that younger patients will, on average, get dramatically more benefit from tight glycemic control.46
In this study, we found that several high-priority aspects of diabetes care were clearly and consistently identified by all 3 physician groups, including the more severe abnormalities of hyperglycemia, hyperlipidemia, and hypertension. However, many of their other priority selections were highly inconsistent with the epidemiological evidence. In particular, all 3 physician groups prioritized interventions for an Hgb A1c of 8.0 higher than an intervention for a DBP of 88 mm Hg. This is very difficult to justify given the moderate impact on microvascular outcomes achieved by intervening on an Hgb A1c of 8.0 compared with the large impact on cardiovascular outcomes and death found for aggressive treatment of hypertension.43,45,46,49,50,52 This disturbing result illustrates how slowly critically important clinical evidence can diffuse to even specialist physicians, as our study was conducted over 2 years after publication of the relevant clinical trials.49,50 In addition, based on the substantial benefits found with gemfibrozil treatment of low HDL in the VA HDL Intervention Trial study, some would argue that the trigger of an HDL = 30 mg/dl should be ranked higher.53 Even for specialists we may need to find a way to disseminate new information faster.
The relatively high priority that the PCPs gave to the eye and urine screening tests suggests another concerning phenomenon. The widespread practice of profiling providers using easily monitored, but often low-priority, aspects of clinical care may be distorting primary care provider priorities relative to those of specialists and the evidence in the epidemiological literature.45 For example, 70% of the time, the control PCPs chose to intervene for a lapse in urine screening for proteinuria over a DBP of 88 mm Hg. There are some theoretical reasons for annual proteinuria screening, but no evidence-based argument for it substantially impacting patient outcomes.43,54 In contrast, the Hypertension Optimal Treatment and UK Prospective Diabetes Study Group studies provided evidence that using 3 to 4 antihypertensive medications with a target DBP less than 80 mm Hg reduced cardiovascular mortality by 30% or more and also decreased visual loss and reduced cardiovascular events.45,49 Organizations such as the National Committee for Quality Assurance (NCQA) need to consider that selecting easily measured, but low-priority, interventions as performance measures may distort clinician understanding of treatment priorities.
A group of PCPs who underwent an educational intervention 12 months prior to the survey showed priorities that differed significantly from the control PCP group. The intervention appeared to result in a de-emphasis of the lapses in screening (although intervention PCPs still showed signs of possibly being influenced by performance measurement) and an increased emphasis on blood pressure triggers, which were among the major topics of the educational intervention. The survey materials did not refer to the educational intervention and this, along with the 12-month period between the intervention and the survey, suggests that this observation was not merely a transient effect of the intervention. However, the quasi-experimental design of the study does not allow us to draw firm conclusions about the causal relationship between the intervention and the observed differences in priorities between the 2 primary care groups.
We found no evidence that any of the physician groups gave different rankings based on the age of onset of diabetes of the patient. This is striking given that younger onset diabetes is associated with dramatically greater benefit from interventions that decrease microvascular complications (glycemic control and screening for nephropathy and retinopathy).46,48,55 Reducing a hemoglobin A1c from 9% to 7% is over 4 times more likely to prevent blindness (in terms of absolute risk reduction) in a patient with onset of diabetes at age 45 versus a patient with onset at age 65. In contrast, the absolute benefits of blood pressure reduction are substantial regardless of age of onset.45 Clearly one of the challenges of disseminating information about treatment priorities is to communicate the importance of incorporating individual patient risk and benefits into treatment priorities and not just “treat the number.”
Finally, this study demonstrates a method that can be used to elicit physician treatment priorities. Assessing physician treatment priorities could allow us to evaluate whether physicians have a general understanding of the clinical literature, especially regarding new findings, and could help identify important areas for educational interventions. Given that less than half of recommended interventions are actually provided,1–6 the priority given to different interventions is important. We would hope and expect that treatment priorities that a physician brings to an encounter would be modified by the preferences and particular circumstances of each individual patient. Yet we propose that these treatment priorities, elicited for a relatively generic patient scenario such as we presented, still form the base from which individualized decisions about interventions are made and should have a significant impact on the rate at which different interventions are employed.
This method could also be used to help prioritize guideline recommendations. Formal cost-effectiveness analyses have sometimes been used in an attempt to identify clinical priorities;16,23 however, conducting explicit cost-effectiveness analyses is time consuming and sometimes is not feasible (because of deficiencies in the available evidence). Further, when the results of cost-effectiveness analyses vary substantially with sensitivity analyses, there is no established way to simultaneously convey the importance of the estimated absolute risk reduction (if true) and our confidence that the estimate is true. In these cases, the best available standard may be for an expert panel to review the evidence and then rate treatment priorities using the type of method used in our study. Subsequently, the most critical high-consensus priorities in the clinical guidelines could be better understood and emphasized during dissemination efforts and performance measurement activities.
In conclusion, while showing that some clinical triggers used as performance measures were relatively overrated, our study also showed that even critically important triggers based on more recent evidence can be substantially underrated by both generalists and specialists. We also showed some preliminary evidence that a cooperative guideline development effort improved PCPs’ awareness of the importance of several underrated diabetes interventions. It is clear that we need a more proactive approach to facilitate rapid dissemination of new high-priority findings. But it is also important to consider that selecting performance measures based on availability of data or ease of measurement may distort provider priorities in ways that are counterproductive.
This work was supported by a grant from the VA Health Services Research and Development Quality Enhancement Research Initiative (QUERI-DM). The online appendices are freely available from the author.
This method is a two-level extension of Luce's paired comparison method known as the hierarchical paired comparison model.33 In this model the mean evaluation of the importance of an intervention for a population of subjects is estimated independently of the random effects due to the between-subject variation. The data are organized so that each of the 28 pairs submitted to a subject represents an observation clustered within subject. The dependent variable is dichotomous, reflecting which member of the pair was chosen. The data are analyzed with a hierarchical logistic regression model that can accommodate the incomplete pair design and accounts for the heterogeneity of the physician raters. A design matrix was coded so that we estimated 10 parameters representing the mean evaluation of 10 out of the 11 triggers for interventions relative to an omitted 11th trigger. The mean evaluation of a trigger is estimated on a logit scale and represents the likelihood of choosing that trigger for an intervention relative to the omitted 11th trigger. As the trigger for a low HDL was the least likely to be chosen over any of the other triggers in the simple tables of probabilities, we used that trigger as the omitted reference category, against which the likelihood of choosing the other triggers was estimated.
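One conventional way to code such a design matrix is sketched below. It assumes the usual contrast coding for Luce-type (Bradley-Terry) models: each row has +1 for the first trigger shown, -1 for the second, and the reference trigger's column is dropped, so that the log odds of choosing the first trigger equal the difference between the two scale values. The text above does not specify the exact coding used, so this is an illustration only; the function names and the zero-based trigger indices are hypothetical.

```python
from typing import List, Sequence, Tuple


def design_row(first: int, second: int, n_items: int, reference: int) -> List[int]:
    """Contrast row for one presented pair (first vs. second).

    +1 marks the first trigger and -1 the second; the reference trigger's
    column is omitted, which fixes its scale value at zero.
    """
    if first == second:
        raise ValueError("a pair must contain two different triggers")
    row = [0] * n_items
    row[first] = 1
    row[second] = -1
    return [value for idx, value in enumerate(row) if idx != reference]


def build_design(pairs: Sequence[Tuple[int, int]],
                 n_items: int = 11,
                 reference: int = 10) -> List[List[int]]:
    """Stack one contrast row per presented pair for a single respondent."""
    return [design_row(a, b, n_items, reference) for a, b in pairs]


# Hypothetical example: three pairs shown to one respondent;
# index 10 stands in for the omitted reference trigger.
example_pairs = [(0, 10), (2, 5), (10, 7)]
for row in build_design(example_pairs):
    print(row)
```

Each row has 10 columns, one per estimated parameter. A pair that includes the reference trigger has a single nonzero entry, so its coefficient is read directly as the log odds relative to the reference.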
With 11 clinical triggers there are a total of 110 possible ordered pairings (n × [n – 1]). In order to reduce respondent burden, we randomly varied the order of presenting any two triggers across subjects, rather than presenting each pair in both orders. This halved the number of possible pairs to 55. By employing an incomplete pair design we further reduced the number of pairs presented by half again, randomly selecting 28 pairs to present to each subject. Based on feedback from pre-testing, this was an acceptable number of items to complete, taking 5–7 minutes.
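The pair counts and the per-respondent sampling can be illustrated with a short sketch. It assumes that the 28 pairs are drawn uniformly without replacement from the 55 unordered pairs and that presentation order is decided by a fair coin flip. The text says only that pairs and order were chosen randomly, so both details, along with the function name `draw_questionnaire` and the seed, are assumptions.

```python
import itertools
import random
from typing import List, Tuple

N_TRIGGERS = 11

# All ordered pairings, n * (n - 1), and all unordered pairs, n * (n - 1) / 2.
ordered_pairs = list(itertools.permutations(range(N_TRIGGERS), 2))
unordered_pairs = list(itertools.combinations(range(N_TRIGGERS), 2))


def draw_questionnaire(rng: random.Random, n_pairs: int = 28) -> List[Tuple[int, int]]:
    """Pick n_pairs distinct unordered pairs and randomize the order
    in which the two triggers of each pair are presented."""
    chosen = rng.sample(unordered_pairs, n_pairs)
    return [pair if rng.random() < 0.5 else (pair[1], pair[0]) for pair in chosen]


print(len(ordered_pairs), len(unordered_pairs))
questionnaire = draw_questionnaire(random.Random(2001))  # arbitrary seed
print(len(questionnaire))
```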
While the model takes into account the random differences between physician raters, it does not directly estimate these variances for each trigger. With some assumptions allowing the use of identification constraints in a transformation of the estimated covariance matrix, it is possible to recover a covariance matrix of the 11 triggers directly.33 This matrix was examined, but the covariance pattern did not suggest any structure that contributed further to the understanding of the observed results. The models were estimated using Laplace approximations of the maximum likelihood for binary outcomes in HLM 5.0; there is some evidence that these estimates are more accurate than quasi-likelihood estimation and more computationally efficient than the simulation-based estimation used in MLwiN.33
While Figure 2 contains all of the information in the model, it can be difficult to appreciate the relative differences in distance between the 55 possible pairs of triggers for each physician group and the impact of those differences on the probability of choosing one clinical trigger over another. The entire pattern can be visualized on a more natural scale by transforming the relative distances back to a probability scale (Fig. 3).
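A back-transformation of this kind can be sketched as a table of pairwise choice probabilities computed from the scale values. The scale values below are placeholders chosen for illustration and are not the study's estimates; the trigger labels and the function name `pairwise_probabilities` are likewise hypothetical.

```python
import math
from typing import Dict, Tuple


def pairwise_probabilities(scale: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    """For every ordered pair (a, b), return the probability that a is chosen
    over b, using the inverse logit of the difference in scale values."""
    return {
        (a, b): 1.0 / (1.0 + math.exp(-(scale[a] - scale[b])))
        for a in scale
        for b in scale
        if a != b
    }


# Placeholder scale values (log odds relative to the reference trigger).
illustrative_scale = {"trigger_A": 2.0, "trigger_B": 1.0, "reference": 0.0}

probabilities = pairwise_probabilities(illustrative_scale)
for (a, b), p in sorted(probabilities.items()):
    print(f"P({a} chosen over {b}) = {p:.2f}")
```

With all 11 triggers supplied, the same function yields both orderings of each of the 55 pairs, and the two probabilities for any pair sum to 1.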