Good decisions depend on an accurate understanding of the comparative effectiveness of decision alternatives. The best way to convey the data needed to support these comparisons is unknown.
To determine how well five commonly used data presentation formats convey comparative effectiveness information.
Internet survey using a factorial design.
279 members of an online survey panel.
Study participants compared outcomes associated with three hypothetical screening test options relative to five possible outcomes with probabilities ranging from 2 per 5,000 (0.04%) to 500 per 1,000 (50%). Data presentation formats included a table, a “magnified” bar chart, a risk scale, a frequency diagram, and an icon array.
Outcomes included the number of correct ordinal judgments regarding the more likely of two outcomes, the ratio of perceived versus actual relative likelihoods of the paired outcomes, the inter-subject consistency of responses, and perceived clarity.
The mean number of correct ordinal judgments was 12 of 15 (80%), with no differences among data formats. On average, there was a 3.3-fold difference between perceived and actual likelihood ratios (95% CI: 3.0 to 3.6). Comparative judgments based on flow charts, icon arrays, and tables were all significantly more accurate and consistent than those based on risk scales and bar charts, p < 0.001. The most clearly perceived formats were the table and the flow chart. Low subjective numeracy was associated with less accurate and more variable data interpretations and with lower perceived clarity for icon displays, bar charts, and flow diagrams.
None of the data presentation formats studied can reliably provide patients, especially those with low subjective numeracy, with an accurate understanding of comparative effectiveness information.
Provision of information to patients about the possible risks and benefits of proposed medical interventions has been an integral part of medical care since the doctrine of informed consent was adopted. In recent years it has become increasingly important due to changes in the accepted model of the doctor-patient relationship that promote more active patient involvement in decisions about their care, the rise of evidence-based medical practice, and the increasing emphasis placed on disease prevention. It is likely to become even more important as sophisticated information about future health risks becomes available through work in the “new sciences” such as medical genomics, metabolic profiling, and proteomics. (1) Richard Smith, former editor of the British Medical Journal, has called risk communication “the main work of doctors”. (2)
A realistic understanding of the differences in outcomes expected to result from alternative courses of action is an essential component of good decision making. Comparative effectiveness information can be provided to decision makers verbally, numerically, graphically, or using a combination of formats. (3–5) There is evidence that communication of quantitative data can be enhanced by using a format that conveys information about both the number of people with the outcome of interest and the size of the reference population (the “part to whole” relationship), includes a graphic representation of the data, and requires little or no additional processing by the recipient. (6–9) Current recommendations for the communication of information to support comparative assessments of decision alternatives suggest the use of graphic data displays that meet these criteria, including icon arrays, bar charts, flow diagrams, and risk scales. (3, 4) Significant knowledge gaps, however, still exist regarding how to convey information about outcomes that occur less than 1% of the time, whether graphic formats are more effective than numeric ones, whether any of the currently recommended graphic formats is superior to the others, and the impact of individual recipient characteristics. (6, 7, 10–12)
The goal of this study was to address these gaps. We used a comprehensive evaluation framework to compare the abilities of five common data presentation formats to accurately convey comparative effectiveness information across a representative range of clinically important likelihoods and consequences. We hypothesized that the graphic displays meeting current formatting recommendations (bar charts and icon displays) would be the most effective formats. We also examined the relationships between communication effectiveness and recipient characteristics including age, gender, education level, literacy, and numeracy.
The study population consisted of members of an Internet survey panel who responded to a standard email invitation sent by a host company (Zoomerang, MarketTools, Inc., San Francisco CA). Panel members who completed the study received points redeemable for goods and services from the company in return for their study participation.
The study intervention was a three-part survey designed to compare the effectiveness of five data presentation formats for helping people accurately compare decision alternatives. We created a hypothetical disease screening scenario and then evaluated respondents’ abilities to compare information regarding the expected outcomes of three alternative screening tests.
The first part of the survey provided an overview of the study and introduced participants to the decision scenario. The decision scenario was described as follows: The questions in this survey refer to the risks associated with a serious disease and three screening tests for the disease. The disease and tests we refer to are not real. We are using them to test different ways of communicating information about the risks and benefits of screening. However, they are similar to the risks and benefits of screening for several real diseases.
The second part of the survey asked participants to interpret a series of outcome messages created by combining one of five screening outcomes with one of five data presentation formats. The five outcomes were: 1) the lifetime chance of developing disease; 2) the chance of dying from the disease; 3) the chance of a serious screening test side effect; 4) the chance of dying from a serious screening test side effect; and 5) the chance of a false positive screening test result. The likelihoods associated with each outcome are summarized in Table 1. They ranged from 2 per 5,000 (0.04%) to 500 per 1,000 (50%). The five presentation formats consisted of: a table, a “magnified” bar chart (13), a Paling risk scale (14), a frequency diagram, and an icon array. Examples of each format are shown in Figure 1. We tested the effectiveness of the outcome messages using a factorial design that divided the study population into five groups. As illustrated in Table 1, all groups were exposed to every data format and received the same total amount of information, but varied in the format and outcome combinations evaluated.
Every outcome message provided data about outcomes associated with three different screening options, labeled as Option A, Option B, and Option C. After reviewing the message, respondents were asked to compare outcomes associated with two of the three options and indicate if they were equally likely to occur or if one was more likely to occur. If one was judged more likely to occur, they were then asked to indicate the magnitude of the relative difference in likelihoods. To avoid biasing these judgments, we used an open-ended question format supplemented with several examples of possible answers including whole numbers, decimals, and fractions. The respondents then repeated this process for the other two pairs of outcomes. For all comparisons, the message being interpreted was displayed on the same page as the questions. This process resulted in a total of three comparisons per message and 15 comparisons for each respondent. An example comparison question is shown in Figure 2.
In part three of the questionnaire, respondents rated each format on a one to ten clarity scale ranging from “confusing, very unhelpful” to “clear, very helpful”, completed a demographic background questionnaire, answered the one-question literacy assessment developed by Chew and colleagues (15), and completed the subjective numeracy scale. (16) These measures were all chosen to make the study questionnaire suitable for a brief, 15 to 20 minute Internet survey.
The study was approved by the University of Rochester Research Subjects Review Board.
We compared the effectiveness of the data presentation formats using a modified version of criteria for evaluating risk communications proposed by Weinstein and Sandman (17):
We assessed ordinal accuracy by tabulating the number of correct judgments regarding the more likely outcome made by each respondent. Possible scores range from 0 to 15.
We assessed comparative accuracy using an adjusted comparative accuracy ratio. We first calculated the unadjusted accuracy ratio for each paired comparison by dividing the perceived difference in likelihood by the actual difference. We then took the absolute values of the unadjusted ratios in order to capture errors due to both overestimates and underestimates in a single measure. Finally, because each outcome message assessment task involved three separate comparisons, we used the mean of each respondent’s judgments for each set of comparisons to control for individual variation. Perfectly accurate judgments have an adjusted accuracy ratio equal to 1.0; inaccurate judgments have ratios greater than 1.0. To facilitate analysis and interpretation, we transformed the data using base 10 logarithms and converted the results back into the original format for presentation when appropriate.
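As a concrete illustration, the per-respondent calculation described above can be sketched as follows. This is a minimal sketch, not the authors' actual code: the function name is ours, and we assume the absolute value is applied to the log-transformed ratios so that overestimates and underestimates of the same magnitude count equally.

```python
import math

def adjusted_accuracy_ratio(perceived, actual):
    """Mean accuracy ratio for one respondent's set of paired comparisons.

    Each judgment's error is taken as |log10(perceived / actual)|, so a
    2-fold overestimate and a 2-fold underestimate contribute equally;
    the mean log error is converted back to a ratio, which is >= 1.0.
    """
    logs = [abs(math.log10(p / a)) for p, a in zip(perceived, actual)]
    return 10 ** (sum(logs) / len(logs))

# A perfectly accurate respondent scores exactly 1.0; a respondent who is
# 10-fold off on every comparison scores 10.0, whichever the direction.
print(adjusted_accuracy_ratio([2.0, 5.0, 0.5], [2.0, 5.0, 0.5]))
```

Under this reading, a respondent who says "twice as likely" when the true difference is 4-fold gets the same error score as one who says "8 times as likely."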
We assessed how consistently respondents interpreted the outcome messages using each data format by calculating the standard deviations of the mean adjusted accuracy ratios.
We used the relative percentage of inaccurate comparative accuracy judgments that were higher versus lower than the actual data to assess the relative frequencies of over versus under-estimates associated with each data presentation format.
We used the clarity scale to assess the respondents’ evaluations of each data presentation format.
We summarized the results using standard descriptive statistical methods. Because the outcome data are not normally distributed, we determined the statistical significance of outcome differences among the presentation formats using Kruskal-Wallis analysis of variance, corrected for ties, and the related between-group difference method proposed by Conover. (18)
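For example, a tie-corrected Kruskal-Wallis comparison of log accuracy ratios across formats could be run as below. The data values are invented for illustration only; scipy's `kruskal` applies the tie correction automatically. (The Conover post-hoc comparisons are not shown.)

```python
from scipy.stats import kruskal

# Hypothetical per-respondent log10 accuracy ratios for three of the
# five formats (values made up for illustration, not study data).
table_fmt  = [0.10, 0.20, 0.15, 0.30, 0.25]
bar_chart  = [0.40, 0.45, 0.50, 0.55, 0.60]
risk_scale = [0.80, 0.90, 1.00, 1.10, 1.20]

# Kruskal-Wallis H test: a nonparametric one-way ANOVA on ranks,
# appropriate because the accuracy ratios are not normally distributed.
H, p = kruskal(table_fmt, bar_chart, risk_scale)
print(f"H = {H:.2f}, p = {p:.4f}")
```

With clearly separated groups like these, the test rejects the hypothesis that all formats come from the same distribution.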
Because we were unable to transform the data to a normal distribution, we measured associations between respondent characteristics and both ordinal and comparative accuracy by first dividing the respondents into low, medium, and high accuracy groups as evenly as possible given the distribution of the results and then using ordinal logistic regression to determine significant multivariable associations. Respondent characteristics examined included age, gender, race, education, literacy, and numeracy. We excluded ethnic background because there was so little variation in the study sample.
Two hundred seventy-nine people participated in the study. As shown in Table 2, most were white, well educated, literate, and fairly numerate. Compared with the general US population, our study sample was better educated, had relatively more men, and contained smaller proportions of African-American and Hispanic respondents. (21)
The ordinal accuracy data are illustrated in Figure 3. The overall mean number of correct ordinal judgments per respondent was 12 of 15 (80%). There were no statistically significant differences in ordinal accuracy among the data presentation formats; rates of correct responses ranged from 78% to 83%, p = 0.48.
The mean rates of correct responses in the low, medium, and high accuracy groups were 5.9 (39%), 13.2 (88%), and 15 (100%) respectively. Ordinal logistic regression indicated that three of the six patient characteristics studied - female gender, white race, and formal education beyond high school - were significantly associated with higher ordinal accuracy. These data are summarized in Table 3.
The comparative accuracy results are illustrated in Figure 4. Overall, the mean adjusted accuracy ratio was 3.3 (95% confidence interval 3.0 to 3.6) indicating a more than 3-fold difference between the perceived and actual differences in outcome likelihoods.
When the results across all five screening outcomes are combined, there is a statistically significant difference in comparative accuracy among the five formats, p < 0.001. Judgments based on the flow chart (mean adjusted accuracy ratio 2.8), the icon array (mean adjusted accuracy ratio 2.8), and the table (mean adjusted accuracy ratio 3.2) are all significantly more accurate than those based on the bar chart and risk scale, mean accuracy ratios 3.5 and 4.7 respectively.
There are also statistically significant differences in comparative accuracy among the data formats for each of the five screening outcomes. These results generally reflect the overall findings except for the risk of a false positive screening test, for which the table was the worst format and the risk scale among the best. Further analysis of this result revealed a disproportionately high number of extremely discrepant values, defined as accuracy ratios > 100, for the table format: 47% versus 7% to 29% for the other formats. These results suggest that these respondents may have mistakenly reported absolute instead of relative differences. Repeating the comparative accuracy analysis with these extreme values removed yields findings consistent with the other four outcomes, with the table among the most accurate formats and the risk scale significantly less accurate than the others (data provided in supplemental file).
There were 84 respondents in the high accuracy group, 79 in the mid-accuracy group, and 82 in the low accuracy group. Mean accuracy ratios in these three groups were 1.3, 2.3, and 12.4, respectively. Ordinal logistic regression revealed that male gender (chi-square = 6.5, p = 0.01) and subjective numeracy higher than the median score of 4.5 (chi-square = 8.9, p = 0.003) were significantly associated with better comparative accuracy. These results are summarized in Table 3.
The formats with the lowest response variation were the flow diagram (mean standard deviation 5.4) and the icon array (mean standard deviation 6.1). The risk scale (mean standard deviation 71.3) was associated with the most variability in responses, p < 0.05. On multivariable analysis, two respondent characteristics were associated with better response consistency: older age (p = 0.001) and a subjective numeracy score > 4.5 (p = 0.03).
Overall, there were a total of 1,005 inaccurate comparative accuracy judgments, almost equally divided into underestimates (504) and overestimates (501). There were no significant differences among the formats in the relative proportions of over- and underestimates, chi-square = 8.7, p = 0.07. Younger respondents were more likely than older ones to overestimate the relative risk differences, chi-square = 7.2, p = 0.03.
Table 4 summarizes the respondents’ format clarity ratings. The differences among the formats are statistically significant, p < 0.001. The table was the highest rated format (mean preference score 7.4) and the icon array the lowest (mean preference score 4.6). Multivariable analysis shows that more education (p = 0.05) and higher subjective numeracy scores (p = 0.002) were associated with higher clarity ratings for the table format. Higher subjective numeracy scores were also associated with higher clarity ratings for the icon display (p = 0.008), bar chart (p = 0.005), and flow diagram (p = 0.03). Higher perceived bar chart clarity was also associated with race other than white (p = 0.03). None of the respondent characteristics studied was associated with risk scale clarity ratings.
The goal of this study was to address current gaps in our knowledge of how to convey the comparative outcome information needed to support meaningful patient involvement in decisions about their health and healthcare. The most notable finding is that the communication outcomes associated with all formats studied were suboptimal. The overall ordinal accuracy rate was 80%, indicating that one out of every five judgments incorrectly identified which of two outcomes was more likely, and did not differ among the formats. Similarly, the comparative accuracy analysis revealed a three-fold average discrepancy between participants’ perceived differences in outcome likelihoods and the actual differences in the data presented. Finally, none of the formats was judged to be particularly clear by the study participants: the average ratings on the ten-point clarity scale ranged from 4.6 to 7.4. These findings suggest that even the best conventional communication formats may have serious shortcomings in their ability to convey quantitative comparative effectiveness information to members of the general population and to most patients.
Despite their theoretical advantages, our results do not support the hypothesis that the bar charts and icon displays would be the most effective formats. Instead, across all outcome measures, the two most effective formats were the flow chart and the table.
Several previous studies have also demonstrated that tables are equal to, if not better than, graphic data presentation formats for accurately conveying comparative outcome information. (22–25) Flow charts, also called natural frequency diagrams, have been shown to successfully communicate complicated quantitative information, but, to our knowledge, their effectiveness in communicating comparative effectiveness information relative to other commonly used presentation formats has not been examined previously. (9, 26)
In several previous studies, icon arrays have been found to be very effective risk communication formats over a limited range of outcome likelihoods. (27, 28) To our knowledge, this is the first study to specifically evaluate their effectiveness in supporting clinical decisions across a broad spectrum of clinically important outcome likelihoods. Consistent with previous studies, we found the icon array to be one of the more effective communication formats for outcome probabilities between 1% and 50%. It was much less effective in the two small-risk scenarios involving likelihoods ranging from 0.04% to 0.3%. One possible explanation for this finding is the size of the array used. These small likelihoods necessitated the use of arrays consisting of 5,000 icons rather than the 1,000-icon arrays used in the other scenarios. Previous research has demonstrated that people’s interpretations of icon arrays are affected by the size of the array. (29) Icon arrays also received the lowest clarity ratings from the study participants. Previous studies have similarly found icon arrays to be disliked by patients. (30) Thus our findings provide new information about both strengths and weaknesses of icon arrays and indicate that additional work is needed to learn how to most effectively use them to support medical decision making.
Prior studies have found bar charts effective in helping people comprehend outcome information and preferred by many patients. (23, 31) Their effectiveness in this study, however, was mixed. The bar chart format was highly rated for clarity by study participants and the most consistently interpreted data format but was not as successful as several others in accurately conveying outcome information. Although we expected difficulties using the bar chart format to display the small risks (and for this reason included the magnified portion), the accuracy achieved by the bar charts was similar for low, medium, and high likelihood outcome scenarios.
Although the use of a risk scale to convey outcome information has been proposed, (4, 32–35) its usefulness for this purpose has not been thoroughly evaluated. To our knowledge, this is the first study to compare the effectiveness of risk scales in communicating comparative effectiveness information with that of other commonly used data presentation formats. Although the risk scale was ranked higher than bar charts and icon arrays in terms of clarity, it was the most inaccurately and inconsistently interpreted format. These findings probably reflect well-known difficulties with the proper interpretation of logarithmic scales. (7, 36) Alternatively, they could reflect the absence of the comparative risk information that is frequently included in such scales.
Individual characteristics associated with accurate interpretations of outcome data in this study (gender, education, and numeracy) have also been found in previous studies. (10, 11, 37–39) The inconsistent gender effects we found between ordinal and comparative accuracy are difficult to reconcile and suggest that this finding may be an artifact caused by the disproportionate amount of missing data for the men with regard to comparative accuracy. The association between objectively measured numeracy skills and both data interpretation and use in decision making is becoming increasingly recognized. (10, 11, 38, 39) We found a similar relationship using a subjective numeracy measure. This result is consistent with an earlier study that examined how people interpret risk scales. (40) These findings suggest that, in addition to demonstrated skills in working with numeric information, people’s preferences for verbal versus quantitative information and their confidence in working with numeric information also play a role in how well they interpret quantitative comparative effectiveness data.
This study is subject to several limitations. First, because we used a hypothetical decision scenario, we are unable to determine if the results would differ if they were based on a real decision being faced by the study participants. However, the scenarios we used addressed preventive screening, a situation that affects all members of the general population.
It is also subject to the limitations of Internet surveys, including sampling bias and poor data quality. (41) These limitations, however, apply equally to all data presentation formats studied. They are therefore unlikely to affect the relative differences found among them.
Other limitations include the moderate amount of missing data regarding the comparative accuracies and evidence suggesting a possible misunderstanding of survey instructions with regard to the false positive question. Given the number of comparisons studied and the uniformly high amount of inaccurate interpretations found, however, it seems unlikely that these problems significantly affect the main study results.
In conclusion, the results of this study suggest that none of the five data presentation formats studied can reliably provide patients with an accurate understanding of comparative outcome information across a wide range of clinically relevant likelihoods. They also provide additional evidence that structured numeric formats, such as a simple table or a flow chart, may work just as well as, if not better than, graphic displays for presenting small amounts of comparative outcome information, and that icon displays are not well suited for presenting information about events with likelihoods less than 1%. These findings suggest that research is needed to develop more effective ways to communicate information to support meaningful patient involvement in clinical decision making. Finally, our study provides new evidence that an individual’s perceived numeracy affects their ability to interpret comparative effectiveness information. This finding reinforces the already recognized need to develop communication methods that are appropriate for supporting involvement of people with less formal education and low numeracy skills, both actual and perceived, in decisions about their health and health care. (10, 38, 42)
This study was supported by Grant number 1R21CA131760-01A1 from the National Cancer Institute.
James G. Dolan, Department of Community and Preventive Medicine, University of Rochester, Rochester, New York.
Feng Qian, Department of Anesthesiology, University of Rochester, Rochester, New York.
Peter J. Veazie, Department of Community and Preventive Medicine, University of Rochester, Rochester, New York.