In the choice and definition of quality of care indicators, there may be an inherent tension between feasibility, generally enhanced by simplicity, and validity, generally enhanced by accounting for clinical complexity.
To study the process of developing quality indicators using an expert panel and analyze the tension between feasibility and validity.
A multidisciplinary panel of 12 expert physicians was engaged in two rounds of a modified Delphi process to refine and choose a smaller subset from 36 indicators; these had been developed by a research team studying the quality of care in ambulatory post-myocardial infarction patients with co-morbidities. We studied the correlation between the validity and feasibility ranks provided by the expert panel. We also studied, separately for validity and for feasibility, the correlation between the rank of each quality indicator and the variance of the experts' responses.
Ten of 36 indicators were ranked in both the highest validity and feasibility groups. The strength of association between validity and feasibility of indicators measured by Kendall tau-b was 0.65. In terms of validity, a strong negative correlation was observed between the ranks of indicators and the variability in expert panel responses (Spearman's rho, r = −0.85). A weak correlation was found between the ranks of feasibility and the variability of expert panel responses (Spearman's rho, r = 0.23).
There was an unexpectedly strong association between the validity and feasibility of quality indicators, with a high level of consensus among experts regarding both feasibility and validity for indicators rated highly on each of these attributes.
The use of more effective therapies, based on strong evidence, can decrease morbidity and mortality from coronary heart disease and myocardial infarction (MI). For example, if all heart attack survivors received timely beta-blocker therapy, an estimated 1500 deaths could be avoided each year. Although most physicians are aware of evidence-based clinical practice guidelines, promulgation of guidelines does not necessarily translate into their application in clinical practice [2, 3]. Several barriers to increasing adherence to clinical practice guidelines have been identified, for example, expectations about outcomes with guidelines and the inertia of previous practices. Additionally, patients with chronic disease, such as ischemic heart disease, have other co-morbidities, which further complicate the uniform application of a process of care. Also, the movement towards evidence-based medicine has resulted in an intense proliferation of clinical practice guidelines, with thousands now available.
Indicators of quality of care represent measurable aspects of care that reflect key components of a guideline. An indicator is an observable trait or variable that is assumed to point to the assessment of some other trait, usually difficult to observe directly. Increasingly, quality indicators are being developed to assess different dimensions of care, such as adherence to clinical practice guidelines, physician performance and quality of care for specific conditions like acute MI, urinary tract infections and epilepsy. Each guideline can be used to develop multiple indicators.
An ideal quality of care indicator would be valid, reliable, sensitive, specific and feasible [9, 10], among other characteristics related to the importance of the construct being measured. Validity can be understood as the measurement under consideration corresponding to the true condition of the event being measured. However, because a quality indicator reflects a minimal acceptable standard of care, validity also relates to the degree of relevance of the proposed recommendations (clinical practice guidelines). Feasibility refers to the ease with which the quality indicator can be measured accurately; for example, the frequency of prescriptions of beta-blockers for patients with MI is highly feasible if there is a computerized order entry system that allows flexible search inquiries .
Conceptually, there is an overlap between validity and feasibility. However, increasing validity may increase the complexity of measurement and thus render the indicator more difficult to measure, decreasing feasibility. Conversely, easily measurable indicators may ignore clinical complexity and raise questions regarding their validity. One study showed that feasibility rankings were not associated with the overall utility rankings of quality indicators for Parkinson's disease. Another study reported a negative correlation between feasibility and ‘room for improvement’ of quality indicators for stroke. Development of quality indicators is an arduous task, and understanding the relationships among the characteristics of quality indicators is helpful. Also, the validity and feasibility of a particular guideline have been shown to predict implementation of the guideline in clinical settings [13, 14]. Thus, understanding the tension between feasibility and validity should help in finding the reasons why some clinical practice guidelines are implemented more effectively than others in clinical practice.
MI-Plus is a group-randomized trial funded by the National Heart Lung and Blood Institute (NHLBI) that developed and tested an intervention to increase physician adherence to guidelines for complex post-MI patients with co-morbidities. As the backbone of this intervention, valid quality indicators for use in feedback and performance measurements were developed. This manuscript describes this process and examines the tension between feasibility and validity in the selection of quality of care indicators.
In 2003, before identifying appropriate quality indicators, the MI-Plus research team reviewed available guidelines for the management of ischemic heart disease as well as for the co-morbidities that are common in post-MI patients, including hypertension, hyperlipidemia, diabetes mellitus, heart failure, renal disease, stroke, peripheral vascular disease and depression. In all, 22 guidelines and position papers from recognized entities were used (Table 1). Disease-specific guidelines along with the level of evidence for these co-morbidities were then organized into a summary statement. American College of Cardiology and the American Heart Association (ACC/AHA) guidelines were the primary source for this summary, but we used other guidelines as well. For example, to address lipid management, guidelines from the National Cholesterol Education Program (NCEP) ATP III guidelines were preferred over ACC/AHA guidelines as these guidelines covered a broader aspect of management of patients with lipid disorders.
Key recommendations were selected from the previously reviewed guidelines. Based on the available literature, the level of evidence was graded for each recommendation to determine its scientific soundness. Then the research team selected processes of care that exert a large impact on the population and manifest demonstrable variations of care, thus revealing potential for improvement of quality of care of post-MI patients with co-morbidities. The indicators were derived from these processes of care that respond to recommendations that are evidence based, actionable, and under the control of the physician or health-care organization. Quality indicators were constructed using an ‘IF-THEN-BECAUSE’ format based on the methodology used in studies by Shekelle et al.  and MacLean et al. . IF refers to the clinical characteristics that describe a person's eligibility for the quality indicator; THEN indicates the actual process of care that should or should not be performed and BECAUSE refers to the expected health impact if the indicator is performed. For example, IF a patient has had an MI and is not on aspirin AND no contraindications are present, THEN aspirin should be prescribed at a level of 75–325 mg/day BECAUSE aspirin inhibits platelet aggregation and significantly reduces the risk of reinfarction and mortality in post-MI patients.
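As a minimal sketch of the ‘IF-THEN-BECAUSE’ format described above, an indicator can be held as a small record with one field per clause. The class name, field names and `render` method are illustrative assumptions, not part of the MI-Plus materials; the example text is the aspirin indicator quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityIndicator:
    """One indicator in IF-THEN-BECAUSE form (hypothetical field names)."""
    if_clause: str       # clinical characteristics that define eligibility
    then_clause: str     # process of care that should (not) be performed
    because_clause: str  # expected health impact

    def render(self) -> str:
        # Reassemble the three clauses into a single statement.
        return (f"IF {self.if_clause}, THEN {self.then_clause} "
                f"BECAUSE {self.because_clause}.")


aspirin = QualityIndicator(
    if_clause=("a patient has had an MI and is not on aspirin "
               "AND no contraindications are present"),
    then_clause="aspirin should be prescribed at a level of 75-325 mg/day",
    because_clause=("aspirin inhibits platelet aggregation and significantly "
                    "reduces the risk of reinfarction and mortality in "
                    "post-MI patients"),
)

print(aspirin.render())
```

Keeping the three clauses separate makes it easy to review eligibility, process and rationale independently.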
Thirty-six quality indicators were drafted. These indicators were presented to the expert panel as quality measures focusing on standard processes of care. Hence, the numerator included the appropriate process of care, and the denominator included only patients with clear indications and no contraindications for the treatment or recommendation. Although the research team paid attention to the proper specification of the numerator and denominator, sampling methodology, ease of interpretation, risk adjustment, and economic and logistic data, it was not possible to provide specific information about reliability, sensitivity, specificity or specific auditing techniques for some indicators. Information about the evidence rating or classification (i.e. Classes I, II, III and A, B, C) for each recommendation was provided along with pertinent references. Moreover, a summary document provided all this information as well as information about the methodology used to create each indicator. These materials are available upon request. Table 2 shows the list of abbreviated quality indicators (in terms of processes of care) that were proposed for review by the expert panel.
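The numerator and denominator logic described above can be illustrated with a short sketch. The record fields, function name and the four example patients are hypothetical and serve only to show how eligibility restricts the denominator.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PatientRecord:
    """Hypothetical chart abstraction for a single indicator."""
    has_indication: bool        # clear indication for the process of care
    has_contraindication: bool  # any documented contraindication
    received_process: bool      # process of care was performed


def adherence_rate(records: Iterable[PatientRecord]) -> Optional[float]:
    """Proportion of eligible patients who received the process of care.

    Denominator: patients with an indication and no contraindication.
    Numerator: those eligible patients who received the process.
    Returns None when no patient is eligible.
    """
    eligible = [r for r in records
                if r.has_indication and not r.has_contraindication]
    if not eligible:
        return None
    met = sum(1 for r in eligible if r.received_process)
    return met / len(eligible)


# Illustrative data only: two eligible patients, one of whom was treated.
example = [
    PatientRecord(True, False, True),
    PatientRecord(True, False, False),
    PatientRecord(True, True, False),    # excluded: contraindication
    PatientRecord(False, False, False),  # excluded: no indication
]
print(adherence_rate(example))
```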
The expert panel was composed of 12 practicing physicians. The 12-member panel included experts in cardiovascular disease, community practitioners and academic physicians from different geographical areas of the USA. The primary purpose of the expert panel was to provide external validation of the study team's selection of quality indicators for use in our MI-Plus study's performance measurement system. The expert panel had three primary tasks (i) To review a set of quality indicators; (ii) to rate the indicators on a set of validity and feasibility criteria and identify a final set of indicators; and (iii) to assist the study team in operationalizing the final panel of quality indicators for use in performance measurement.
During the summer and fall of 2003, two rounds of the modified Delphi process were used to obtain the validity and feasibility of the proposed indicators using the RAND/UCLA appropriateness method. Round 1 involved rating the indicators on validity criteria alone and also collecting information on barriers to implementation. The validity of quality indicators was rated using a 9-point scale with higher numbers indicating greater validity. The validity criteria proposed to the expert panel were defined as follows: (i) adequate scientific evidence or professional consensus supported a link between the performance of that indicator and overall positive outcome for post-MI patients; (ii) a physician with higher rates of adherence to that indicator would be considered a higher quality provider; and (iii) the physician or health plan influenced most factors that determine adherence to the indicator [11, 15, 18].
Round 2 focused on areas of disagreement about the validity of indicators, refining the indicators to improve specificity, and ranking (not rating) the indicators on feasibility criteria. We considered an indicator to be feasible if (i) information on adherence to the indicator is expected to be available and documented in the medical record; OR (ii) information on the indicator is expected to be available from patient or proxy surveys or interviews and is likely to be accurate; AND (iii) information collection is of reasonable cost and does not impose an inappropriate burden on health-care systems [11, 18, 19]. Each expert was asked to divide the 36 indicators into 3 groups (12 in each) based on their perceived feasibility in clinical practice, i.e. most feasible, moderately feasible and least feasible. The panel was then asked to further rank the quality indicators in these three groups, from highly to least feasible, thus conferring a rank between 1 and 36 to the feasibility of each indicator. The results from round 2 were summarized and fed back to the panel again.
After the round 2 feedback described above, final values of the average validity ratings and feasibility ranks provided by the expert panel were tabulated. Determining the validity score took two steps. The first step was to average the validity ratings of each indicator. The second step was to rank the average validity ratings from 1 to 36. Determining the feasibility score took only one step: determining the average rank for each indicator and the resulting rank-order list.
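The scoring steps above can be sketched as follows. This is an illustration under stated assumptions, not the study's actual procedure: rank 1 is taken to mean most valid or most feasible, tied averages share the mean of their positions (the text does not say how ties were handled), and the three-indicator, three-expert data are invented.

```python
from statistics import mean


def average_ranks(values, reverse=False):
    """1-based ranks; tied values share the mean of their positions.

    reverse=True gives rank 1 to the largest value.
    """
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=reverse)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value equals the current one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def validity_scores(ratings_by_indicator):
    """Step 1: average the 1-9 ratings. Step 2: rank the averages."""
    means = [mean(r) for r in ratings_by_indicator]
    return means, average_ranks(means, reverse=True)


def feasibility_scores(ranks_by_indicator):
    """Average each indicator's expert ranks, then order the averages."""
    means = [mean(r) for r in ranks_by_indicator]
    return means, average_ranks(means, reverse=False)


# Invented toy data: one inner list of expert responses per indicator.
_, v_rank = validity_scores([[9, 8, 9], [5, 6, 4], [7, 7, 8]])
_, f_rank = feasibility_scores([[1, 1, 2], [3, 3, 3], [2, 2, 1]])
print(v_rank, f_rank)
```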
Kendall's tau-b coefficient was used to estimate the strength of association of expert panel responses on the validity and feasibility rank scale. Since the quality indicators were ranked on the validity and feasibility scale, we used Spearman's rho to estimate the correlation between the quality indicators ranks on validity and feasibility scale and variance in expert panel responses. To understand the expert panel preferences for quality indicators, validity average rating and feasibility average ranking were individually plotted against the variance in expert panel responses for each indicator.
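For readers who want to see how the two coefficients named above are computed, the following is a plain-Python sketch using the standard textbook definitions (tau-b with its tie correction, and Spearman's rho as the Pearson correlation of ranks). The function names and toy data are assumptions for illustration; the study's own software is not described in the text.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """1-based ascending ranks; tied values share the mean position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        for k in range(pos, end + 1):
            ranks[order[k]] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks


def kendall_tau_b(x, y):
    """Kendall's tau-b: (C - D) / sqrt((C+D+Tx) * (C+D+Ty))."""
    conc = disc = tie_x = tie_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue          # tied on both: ignored
            elif dx == 0:
                tie_x += 1        # tied on x only
            elif dy == 0:
                tie_y += 1        # tied on y only
            elif dx * dy > 0:
                conc += 1         # concordant pair
            else:
                disc += 1         # discordant pair
    denom = sqrt((conc + disc + tie_x) * (conc + disc + tie_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (conc - disc) / denom


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the (average) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


# Invented ranks for five indicators (not study data).
validity_rank = [1, 2, 3, 4, 5]
feasibility_rank = [1, 3, 2, 5, 4]
tau = kendall_tau_b(validity_rank, feasibility_rank)
rho = spearman_rho(validity_rank, feasibility_rank)
print(round(tau, 2), round(rho, 2))
```

The same `spearman_rho` function could be applied to an indicator's rank and the variance of the experts' responses for that indicator, which is the second analysis described above.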
Final validity and feasibility average ratings and their corresponding ranks are shown in Table 2. Table 3 displays the cross-tabulation of final validity and feasibility categorized into tertiles of rank. The upper right-hand corner displays the 10 indicators that were ranked in both the highest validity and feasibility groups. An additional three indicators were ranked as having high validity and medium feasibility. These 13 indicators were identified as most relevant for the care of post-MI patients with multiple co-morbidities (description of indicators in Table 4).
The strength of association between validity and feasibility ranks measured by Kendall's tau-b was 0.65 (P < 0.001), suggesting a good association between overall validity and feasibility of quality indicators. In terms of validity, a strong negative correlation was observed between the ranks of quality indicators and the variability in expert panel responses (Spearman's rho, r = −0.85, P < 0.001) (Fig. 1). By contrast, the correlation between the ranks of feasibility and variability of expert panel response was weak (Spearman's rho, r = 0.23, P > 0.05). Figure 2 shows an inverted ‘U’-shaped curve, suggesting that the divergence of opinion was smallest at the ends of the feasibility spectrum and greatest in the middle.
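A cross-tabulation by tertile of rank, as in Table 3, can be sketched as below. The labels, function names and six-indicator ranks are invented for illustration, and the sketch assumes that rank 1 is best and that the number of indicators is divisible by three (as with the 36 indicators here).

```python
from collections import Counter


def tertile(rank, n_items=36):
    """Label a rank as 'high', 'medium' or 'low' (rank 1 = best)."""
    size = n_items / 3
    if rank <= size:
        return "high"
    if rank <= 2 * size:
        return "medium"
    return "low"


def cross_tabulate(validity_ranks, feasibility_ranks, n_items=36):
    """Count indicators in each (validity tertile, feasibility tertile)."""
    return Counter(
        (tertile(v, n_items), tertile(f, n_items))
        for v, f in zip(validity_ranks, feasibility_ranks)
    )


# Invented ranks for six indicators (tertiles of size two).
table = cross_tabulate([1, 2, 3, 4, 5, 6], [1, 3, 2, 4, 6, 5], n_items=6)
for cell, count in sorted(table.items()):
    print(cell, count)
```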
We describe the process of finalizing quality indicators for improving the care of post-MI patients. There was a surprisingly strong correlation between the overall validity and feasibility. Furthermore, there was an increased variance in the expert panel responses as the rating of quality indicators decreased, in terms of validity. Regarding the feasibility of indicators, there was more variance in the middle of the ranking scale, signifying more consensus in the expert panel response for the quality indicators on both ends of ranking scale.
Through a consensus method, a modified Delphi process, our expert panel identified 13 quality indicators as valid and feasible for the care of post-MI patients. The 13 quality indicators addressed different domains of post-MI care and helped prioritize quality improvement efforts. Several studies [6–8, 11–12, 15–16, 20, 21] have used a similar approach in working with expert panels to examine both validity and feasibility, but very few studies have attempted to analyze the association between the attributes of quality indicators. Holloway et al. found that the feasibility ratings of quality indicators were not associated with their overall utility rating. In a more recent study, a normative criterion was used to choose indicators. Each panel member was asked to rate each indicator on a 3-point scale: ‘do not include,’ ‘could include’ and ‘must include’; however, the relationship between this criterion and validity or feasibility was not analyzed. To the best of our knowledge, this is the first study that evaluates the association between validity and feasibility of quality indicators.
The surprisingly high level of association between validity and feasibility appears to counter the postulated tension between these two attributes. Our findings suggest that, for the members of our expert panel, the two constructs were not completely bi-dimensional or independent, even when specific instructions for validity and feasibility were offered. In fact, notwithstanding our instructions and information to the panel, there may have been confusion in panel members' minds between validity and feasibility, with a tendency to equate a certain lack of definition in the less valid indicators with less feasibility. For example, the assessment of patient adherence in uncontrolled hypertension (QI22, Table 2) had final validity and feasibility ranks of 34 and 33, respectively. Yet performing such assessments would appear to be clearly important, and thus a valid measure of the quality of care. The known difficulty in assessing patient adherence (low feasibility) might have contributed to its very low rating on the validity scale as well.
There was a high level of agreement among our experts in considering an indicator as highly valid but less agreement for indicators perceived as having lower validity. There was a great deal of consensus for validity among the panel for the quality indicators that were higher on the rating scale and considerable variability in their responses for the indicators that were lower in the rating order (Fig. 1). This relationship could be attributed to the level of scientific evidence showing the efficacy of the processes of care and recommendations used to derive the indicators. This may support the intuitive notion that validity is an attribute more dependent on clinical practice guidelines and levels of clinical evidence. As we can see in Table 2, the level of evidence for the majority of the 13 indicators that ranked highly is A; however, for the low-ranking indicators, the level of evidence was not clearly stated.
The expert panel evaluation of feasibility showed a consensus on the highly feasible and least feasible quality indicators, suggesting that agreement is easily reached on the clinical decisions that are simple to do or simple to ignore. However, the indicators ranking in the middle of the feasibility scale were widely dispersed with respect to the expert panel response, indicating greater uncertainty. Moreover, the inverted ‘U’-like shape of Fig. 2 suggests that the relationship is not monotonic. The overall correlation between the level of agreement (i.e. the variability in the expert panel answers) and feasibility was weak (Spearman's rho, r = 0.23) and was not statistically significant (P > 0.05). These findings may suggest that feasibility is a construct that is difficult to define and assess even for a panel of experts. We speculate that this might be partially due to the fact that the feasibility of quality indicators is not scientifically measured, and therefore, most of the experts' rankings may be based on views, beliefs and common sense. Furthermore, it is plausible that feasibility is an attribute more dependent on operational issues related to the specific quality indicators derived from the clinical practice guidelines and, as such, may be more context specific. This has important consequences: if the level of agreement of an expert panel assessing the feasibility of quality indicators is low, future studies must pay more attention to the feasibility construct by elaborating more detailed criteria and involving professional panels with extensive experience in collecting data and evaluating the implementation of quality indicators. Furthermore, in the future, it will be necessary to empirically test, after implementation, the feasibility of the selected indicators.
The selection of quality indicators occurred several years ago during the implementation of the MI-Plus project; new evidence has been generated since then, which could change the choices for our final set of indicators. For example, a new guideline for the detection and management of post-MI depression has been recently proposed. One of the main recommendations in this guideline indicates that MI patients should be screened for depression using a standardized depression symptom checklist at regular intervals during the post-MI period; this recommendation is based on level A scientific evidence. However, in our study, the quality indicator intended to assess the risk of depression was not ranked high enough to be considered for our final set of indicators. With the new scientific evidence mentioned above, experts could plausibly rank this indicator for depression higher, which would change our final choices for the set of indicators. Because information and scientific evidence are produced continuously, a fixed set of indicators such as ours cannot adjust well to the dynamic nature of these measures. Additionally, although the members of the expert panel in this study included not only specialists but also generalists, all members were physicians. Experts in collecting data might provide different views regarding the feasibility of the indicators.
Almost all the quality indicators selected by the research team were rate-based indicators, which can be expressed as proportions of desired events; only one indicator (use of calcium channel blockers in congestive heart failure) was of the sentinel type, that is, an indicator that identifies undesirable actions. This study cannot answer what the tension between validity and feasibility is in the case of sentinel indicators.
The methodology in this study was used to identify relevant processes of care that affect outcomes, based on scientific evidence. However, a quality indicator is not exactly synonymous with a process of care; therefore, further elaboration of each quality indicator is recommended, namely writing the measure specifications (unit of analysis, data collection specifications and mathematical definition). How this can affect an expert's understanding of feasibility is not completely addressed by this study.
Methods that use similar data regarding validity and feasibility to differentiate dominant and weak quality indicators for medical conditions other than MI need further research. It is possible to select a small group of indicators if we clearly know the validity, feasibility, utility, sensitivity, specificity and reliability of quality indicators. We could save time and resources if we knew whether the number of indicators affects validity. Finally, in addition to asking a panel of experts, more knowledge about these properties can be gained by testing the indicators in specific settings and with explicit methodologies. Empirical research to evaluate indicators can provide information regarding feasibility and can help to improve them.
The development and the use of quality indicators pose a number of interesting methodological problems. Acquiring a better understanding of the relationship among the different attributes of quality indicators is urgent because ever more quality indicators are created to assess physician performance and quality of care, the measurement of which can become very resource intensive. We demonstrated a strong association between the validity and feasibility of quality indicators, which countered our intuitive expectation of an underlying tension in achieving a perfect quality indicator that ranked equally on both the validity and feasibility scales. This study shows increased variance in the expert panel responses for quality indicators rated low on validity and for indicators in the middle of the ranking scale for feasibility. Future research should focus on further exploring the importance of the correlation between validity and feasibility of indicators and on developing indicators based not only on the validity of guidelines but also on their feasibility in translation.
This project was funded in part by grant number R01 HL70786 from the National Heart, Lung and Blood Institute. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Heart, Lung and Blood Institute or the National Institutes of Health.
The authors thank all the physicians, members of the MI-Plus expert panel, who made this study possible. We also thank Dr. Joshua S. Richman, assistant professor of Medicine and Biostatistics, Division of Preventive Medicine, University of Alabama at Birmingham, and Dr. Ellen Funkhouser, associate professor, Department of Epidemiology, University of Alabama at Birmingham, for helpful comments and suggestions on an earlier draft of our manuscript.
At the time the MI-Plus study and this analysis were conducted, Drs. Kiefe and Allison were affiliated with the Center for Outcomes and Effectiveness Research and Education and the Department of Medicine at the University of Alabama at Birmingham.