Low numeracy is pervasive and constrains informed patient choice, reduces medication compliance, limits access to treatments, impairs risk communication, and affects medical outcomes; therefore, it is incumbent upon providers to minimize its adverse effects.
We provide an overview of research on health numeracy and discuss its implications in clinical contexts.
Low numeracy cannot be reliably inferred on the basis of patients’ education, intelligence, or other observable characteristics. Objective and subjective assessments of numeracy are available in short forms and could be used to tailor health communication. Low scorers on these assessments are subject to cognitive biases, irrelevant cues (e.g., mood), and sharper temporal discounting. Because prevention of the leading causes of death (e.g., cancer and cardiovascular disease) depends on taking action now to prevent serious consequences later, those low in numeracy are likely to require more explanation of risk to engage in prevention behaviors. Visual displays can be used to make numerical relations more transparent, and different types of displays have different effects (e.g., greater risk avoidance). Ironically, superior quantitative processing seems to be achieved by focusing on qualitative gist and affective meaning, which has important implications for empowering patients to take advantage of the evidence in evidence-based medicine.
In this era of evidence-based medicine, patients and their health care providers are inundated with numbers. Whether it is direct-to-consumer advertising proclaiming that a clinical study of the latest cholesterol-lowering drug found 42% fewer deaths from heart attack, or a medication prescription instructing patients to take half a 10-mg tablet twice a day, or a newspaper article summarizing changes in cancer incidence, people cannot seem to escape having to deal with numbers in order to make informed decisions about health. Like literacy, numeracy is a necessity for daily life. It is especially critical in the health domain, where understanding or not understanding what numbers mean may have life-altering consequences. For example, numerical competence is needed to understand and weigh the risks and benefits of treatment, to decipher survival and mortality curves, and to navigate medical insurance forms and informed consent documents. Now that more and more people are seeking medical information from the Internet, numeracy is especially critical. The National Cancer Institute's 2005 Health Information National Trends Survey, which comprises a nationally representative sample of the general adult population, found that nearly 50% of all people seeking information about cancer went to the Internet as their first source of information, whereas only 23% of people initially sought information from their health care provider . On the one hand, this is encouraging news: people are becoming more informed health care consumers. On the other hand, how do we know that people understand the numerical information they see online? There are now websites that allow you to calculate your risk of developing various types of cancer, diabetes, cardiovascular disease, and osteoporosis. 
For example, if a woman wants to know her risk of developing breast cancer, she can search for “breast cancer risk,” click on the link for the National Cancer Institute Breast Cancer Risk Assessment Tool, answer six simple questions, and find out her 5-year and lifetime risk of developing breast cancer. But exactly what do these numbers mean? Is a 9.2% lifetime risk of developing breast cancer reassuring or worrisome? Herein rests the challenge of numeracy for behavioral scientists and clinicians: how to make numbers transparent and comprehensible so that individuals can make intelligent, reasoned decisions about their health.
Numeracy, defined in the broadest sense, is the ability to comprehend, use, and attach meaning to numbers. Also referred to as quantitative literacy, numeracy is often subsumed under the broader construct of literacy. Both the National Adult Literacy Survey  and the National Assessment of Adult Literacy  operationalize literacy into prose, document, and quantitative domains. Whereas prose literacy involves the ability to extract information from prose texts, both document and quantitative literacy require facility with numbers. Document literacy involves the skills needed to search for and use text and numeric information contained within documents, such as computing calories from nutrition labels or interpreting tables and graphs. Quantitative literacy involves the ability to apply arithmetic operations using numbers in printed materials. Balancing a checkbook, computing tax on an order form, and calculating the cost of an item that is reduced by 25% are all examples of quantitative literacy tasks. More specifically, health literacy refers to the “capacity to obtain, process, and understand basic health information and services needed to make appropriate health decisions .” Health literacy and numeracy encompass those cognitive skills that enable individuals to act on health information in ways that promote good health [5, 6]. More recently, researchers and clinicians have focused on distinguishing health numeracy from health literacy. 
Golbeck and colleagues  define health numeracy as “the degree to which individuals have the capacity to access, process, interpret, communicate, and act on numerical, quantitative, graphical, biostatistical, and probabilistic health information needed to make effective health decisions.” Ancker and Kaufman  suggest that we look beyond individual quantitative skills and conceive of health numeracy as the “productive use of quantitative health information.” In their conceptualization, health numeracy goes beyond the individual's basic quantitative skills (i.e., computation, estimation, statistical literacy) to encompass the ability to use numeric data contained in documents and graphics, and oral communication skills.
While these descriptions of numeracy are helpful and represent current thinking, it is important to distinguish such descriptions from empirically confirmed dimensions of cognition. In particular, components of these definitions (e.g., productive use of quantitative health information) that are attributed to “numeracy” could be due instead to education or general intelligence. However, research has demonstrated that numeracy has effects on decisions that are independent of education, verbal intelligence, and health literacy, thereby supporting the validity of numeracy as a separate construct. In this paper, we provide an overview of health numeracy and discuss its implications for providers and patients in clinical contexts.
The 2003 National Assessment of Adult Literacy, which surveyed a nationally representative sample of adults, found that 22% of adults performed at a below basic quantitative literacy skill level, 66% performed at a basic or intermediate skill level, and 13% performed at a proficient level. Individuals at the lowest skill level could not perform simple, one-step arithmetic operations. Equally dismal functional health literacy findings were reported by Williams and colleagues: 22.0% to 61.7% of patients attending two urban, public hospitals had inadequate or marginal functional health literacy. A substantial number of patients did not understand instructions on medication bottles, could not determine when their next appointment was scheduled, and could not determine their eligibility for financial assistance. In short, these patients were unable to perform the most basic tasks required to function in the health care environment. Not surprisingly, average health literacy was correlated with level of educational attainment. However, one should not presume a high degree of numeracy based on years of education or an advanced educational degree. For example, Rothman and colleagues studied comprehension of food labels among 200 generally well-educated primary care patients. Although 77% of patients had at least high school-level literacy skills (as assessed by the Rapid Estimate of Adult Literacy in Medicine (REALM)), 63% of patients lacked ninth-grade mathematics skills (as assessed by the Wide Range Achievement Test (WRAT-3)). In another study involving 357 university clinic patients, more than half of the participants who had at least some college education answered only one or no questions correctly on the three-item general numeracy assessment developed by Schwartz and colleagues [17, 18].
Studies have shown that physicians, medical students, and well-educated laypersons often have difficulty performing relatively simple arithmetic calculations and comprehending numeric risk estimates, regardless of whether they are expressed numerically (e.g., as percentages) or graphically (e.g., as survival or mortality curves). For example, Yamagishi found that undergraduates had difficulty interpreting simple mortality data: students rated cancer that killed 1,286 out of 10,000 people as riskier than cancer that killed 24.14 out of 100 people. Lipkus et al. found that well-educated individuals had difficulty answering simple questions regarding probability, percentages, and proportions: only 20% of subjects were able to convert “1 in 1,000” to a percentage. Finally, Reyna et al. showed that physicians and patients had great difficulty understanding genetic risk and were prone to similar errors, despite vast differences in medical knowledge. For example, both groups confused the probability of disease given a genetic mutation with the converse probability.
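Both of the numerical examples above come down to placing risks on a common scale before comparing them. The following is a minimal Python sketch of that conversion; the function name `risk_as_percent` and the variable names are illustrative assumptions and do not come from the studies cited.

```python
def risk_as_percent(events, population):
    """Express 'events out of population' as a percentage."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * events / population


# The two cancers from the mortality-rating example above.
cancer_a = risk_as_percent(1286, 10_000)  # 1,286 out of 10,000
cancer_b = risk_as_percent(24.14, 100)    # 24.14 out of 100

# On a common percentage scale, the larger value is the deadlier cancer.
riskier = "B" if cancer_b > cancer_a else "A"

# "1 in 1,000" expressed as a percentage.
one_in_thousand = risk_as_percent(1, 1000)

print(f"A: {cancer_a:.2f}%  B: {cancer_b:.2f}%  riskier: {riskier}")
print(f"1 in 1,000 = {one_in_thousand:.1f}%")
```

Once both mortality figures are expressed per 100, the cancer described as killing 24.14 out of 100 people is clearly the riskier one, even though 1,286 is the larger raw count.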
Particularly problematic for physicians and patients is understanding risk estimates used to summarize the risks and benefits of medical interventions: relative risk reduction, absolute risk reduction, and number needed to treat. This is worrisome because studies have shown that treatment preferences may be influenced by which of these summary statistics is used [19, 24–30]. For example, one study found that physicians were more inclined to treat patients when the results of a clinical trial were reported in terms of relative risk reduction than when the same results were reported in terms of absolute risk reduction . In this study, physicians were presented with actual clinical trial outcome data that were summarized as either absolute or relative changes in mortality. Ninety-seven of the 235 (41%) surveyed physicians indicated that they would be more inclined to treat with the study drug when results were reported as a relative change than when results were reported in terms of absolute change. Laypersons’ decisions about medical treatment appear to be similarly influenced by the format in which benefits are presented. One study found that the lay public was significantly more receptive to cancer screening tests when the benefits of screening were expressed as a relative risk reduction than when benefits were presented in terms of absolute risk reduction or number needed to screen .
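To see why the choice of summary statistic matters, it can help to compute all three from the same pair of event rates. The sketch below uses the standard definitions (absolute risk reduction is the difference in event rates, relative risk reduction is that difference divided by the control rate, and number needed to treat is the reciprocal of the absolute risk reduction). The function name and the trial figures are hypothetical and are not taken from any study cited here.

```python
import math


def risk_summaries(control_risk, treated_risk):
    """Return (ARR, RRR, NNT) for event risks given as proportions in [0, 1]."""
    if not 0 < control_risk <= 1:
        raise ValueError("control_risk must be in (0, 1]")
    if not 0 <= treated_risk <= 1:
        raise ValueError("treated_risk must be in [0, 1]")
    arr = control_risk - treated_risk          # absolute risk reduction
    rrr = arr / control_risk                   # relative risk reduction
    nnt = 1 / arr if arr > 0 else math.inf     # number needed to treat
    return arr, rrr, nnt


# Hypothetical trial: 4% mortality without the drug, 3% with it.
arr, rrr, nnt = risk_summaries(0.04, 0.03)
print(f"ARR = {arr:.1%}, RRR = {rrr:.0%}, NNT = {nnt:.0f}")
# prints "ARR = 1.0%, RRR = 25%, NNT = 100"
```

The same hypothetical result can be described as a 25% reduction in deaths, a 1 percentage point reduction, or one death prevented per 100 patients treated, which illustrates how the relative format can make a benefit appear larger.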
Low numeracy has been associated with a number of undesirable health outcomes [31, 32], including self-reported poor health , health disparities , poor health knowledge and disease self-management skills , and choosing lower-quality health options . In other studies, individuals with inadequate health literacy, as assessed by the Test of Functional Health Literacy in Adults (TOFHLA)  or the S-TOFHLA , were more likely to be hospitalized and less likely to use clinical preventive services than those with adequate health literacy [38, 39]. Low numeracy has also been associated with inferior disease management. For example, in a prospective cohort study of patients taking warfarin, low numeracy was significantly associated with poor anticoagulation control . In a study of asthma patients treated with inhaled steroids, low numeracy was associated with a history of hospitalizations and emergency room visits for asthma . Although the ability to understand and interpret nutrition labels is not a health outcome per se, it has important implications for health and disease management. Rothman and colleagues  found that comprehension of nutrition labels was significantly associated with numeracy skill—as assessed by the WRAT —and that this association held after adjusting for education and income. Similarly, medication refill adherence is not a health outcome per se but is a critical component of disease management. In a prospective study of Medicare patients with cardiovascular-related diseases, low health literacy, as assessed by the S-TOFHLA, was associated with poor medication refill adherence, although this relationship did not achieve statistical significance .
Numeracy also has important health policy implications in that utilities are used to incorporate patient values into pharmacoeconomic analyses and health policy. A utility score quantifies a patient's value or preference for a particular health state . Utility assessment techniques, which include the standard gamble and time trade-off, are inherently quantitative tasks that require comparing probabilities. While assessment techniques have been widely used to assess health-related quality of life for a variety of chronic and acute health states , their validity for low-numerate individuals is questionable. Woloshin and colleagues  asked 96 women how they valued their current state of health by using the standard gamble and time trade-off techniques. One would expect that a higher utility for current health would correlate with better health. However, they discovered that among low-numerate women, the correlation between utility for current health and self-reported health was in the wrong direction, whereas for high-numerate women, the correlation between utility for current health and self-reported health was in the expected direction. Schwartz and colleagues  asked a sample of head and neck cancer patients to complete three utility assessment exercises: the standard gamble, the time trade-off, and a rating scale. They found that correlations between the three scores were stronger for the numerate than the nonnumerate. These studies suggest that utilities obtained from low-numerate individuals may have limited validity.
Recognizing the potential medical and psychosocial harms that may result from low health literacy and numeracy specifically, it is incumbent upon clinicians and researchers to take steps to minimize adverse effects of low numeracy on health outcomes. Perhaps the first thing clinicians need to do is realize that it is not obvious which individuals are low literate and low numerate. Appearances may be deceiving, and national statistics indicate that low numeracy is pervasive . For example, clinicians often assume that people are functionally literate because of their physical appearance, level of education, or socioeconomic status, and consequently overestimate literacy skills [47–49]. A number of studies have demonstrated that educational attainment is not necessarily a reliable indicator of health literacy [50–53]; in fact, the final grade of formal schooling completed is often considerably higher than the individual's functional literacy level . We would expect similar assumptions with respect to numeracy. Furthermore, asking patients to assess their literacy skill is often nonproductive, as data from the National Adult Literacy Survey revealed that most of the adults who performed at the lowest literacy level reported that they could read “well” or “very well” . Similarly, self-reported numeracy is not necessarily a reliable predictor of objective numerical performance . Given that low numeracy is common, may not be apparent from observable characteristics, and cannot always be reliably assessed simply by asking, we offer the following recommendations for ways to recognize and remediate low numeracy in the health care setting, based on empirical evidence.
There are currently two ways to assess adult numeracy. The first method involves evaluating people's “objective” numeracy skills—that is, their ability to perform basic arithmetic operations, to solve problems involving frequency, probability, and percentages, and to interpret health information. The second method involves assessing “subjective” numeracy skills—that is, how confident and comfortable people feel about their numerical ability.
A widely used instrument for assessing objective numerical competence is an 11-item Numeracy Scale based on three questions developed by Schwartz et al.  and expanded by Lipkus et al. . The original three-item instrument is a content-free numeracy assessment that covers basic familiarity with probability, the ability to convert a percentage to a proportion, and the ability to convert a proportion to a percentage. The instrument has moderate internal consistency, with Cronbach's alpha scores ranging from 0.57 to 0.63 in three separate studies . In two studies, higher numeracy scores were associated with greater accuracy: Schwartz et al.  reported that women with higher scores were better able to assess the benefit of screening mammography than women with lower scores, and Sheridan and colleagues  found that patients with higher scores were better able to interpret treatment benefits than patients with lower numeracy scores. The expanded instrument , which is specific to health, assesses the ability to compare risk magnitude, convert percentages to proportions, convert proportions to percentages, and convert probabilities to proportions. For example, the following question assesses understanding of risk magnitude:
__ 1 in 100, __ 1 in 1,000, __ 1 in 10
The scale is easy to score: score 0 for incorrect or missing answers and score 1 for correct answers. It has adequate internal consistency, with Cronbach's alpha scores ranging from 0.70 to 0.75 in three separate studies . There are no test–retest data available for the scale. Although the numeracy scale is relatively short, it may require up to 30 minutes to complete, making it a frustrating task for some people and calling into question its feasibility in a research or clinical context.
In contrast to the Lipkus et al. Numeracy Scale , which assesses basic arithmetic and statistical skill, several instruments measure functional health numeracy—that is, the ability to understand and act on numerical health information and thereby function effectively in the health care environment. The TOFHLA  is a two-part test consisting of a 50-item reading comprehension section composed of three prose passages and a 17-item numeracy section. It can take up to 22 minutes to administer. The numeracy items pertain to practical tasks, such as understanding labeled prescription vials, interpreting the results of a blood glucose test, and understanding clinic appointment slips and financial assistance information. The number of correct numeracy responses is multiplied by 2.941 to yield a score between 0 and 50. This score is added to the reading comprehension score to yield a final score ranging from 0 to 100. The TOFHLA has excellent split-half and internal consistency reliability (0.92 and 0.98, respectively) and demonstrates good correlation with the REALM and WRAT-R (0.84 and 0.74, respectively). A shortened version of the TOFHLA, the S-TOFHLA , contains a 36-item reading comprehension section made up of two prose passages, and four numeracy items that involve reading a label on a prescription bottle and understanding how and when to take medication, interpreting blood test results, and identifying an appointment date on a clinic appointment slip. The S-TOFHLA can be administered in 12 minutes. Each correct numeracy item is awarded 7 points and each correct reading comprehension item is assigned a score of 2 points, for a total score ranging from 0 to 100. The S-TOFHLA has good internal consistency: Cronbach's alphas for the numeracy items, prose items, and all items combined are 0.68, 0.97, and 0.98, respectively. Like the TOFHLA, the S-TOFHLA is highly correlated with the REALM (0.80). 
Its reliability and validity are similar to the reliability and validity of the TOFHLA . Both the TOFHLA and S-TOFHLA are available in English and Spanish.
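The scoring rules described above for the TOFHLA and S-TOFHLA can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the instruments' official scoring procedure: the function names are hypothetical, the TOFHLA reading score is assumed to be one point per correct item (0 to 50), and no rounding is applied to the weighted numeracy score.

```python
TOFHLA_NUMERACY_ITEMS = 17
TOFHLA_READING_ITEMS = 50
TOFHLA_NUMERACY_WEIGHT = 2.941  # 17 items x 2.941 is approximately 50

S_TOFHLA_NUMERACY_ITEMS = 4
S_TOFHLA_READING_ITEMS = 36


def _check(count, maximum, label):
    """Reject item counts outside the range the test allows."""
    if not 0 <= count <= maximum:
        raise ValueError(f"{label} must be between 0 and {maximum}")


def tofhla_score(numeracy_correct, reading_correct):
    """Weighted numeracy score plus reading score (assumed 1 point per item)."""
    _check(numeracy_correct, TOFHLA_NUMERACY_ITEMS, "numeracy_correct")
    _check(reading_correct, TOFHLA_READING_ITEMS, "reading_correct")
    return numeracy_correct * TOFHLA_NUMERACY_WEIGHT + reading_correct


def s_tofhla_score(numeracy_correct, reading_correct):
    """7 points per numeracy item plus 2 points per reading item."""
    _check(numeracy_correct, S_TOFHLA_NUMERACY_ITEMS, "numeracy_correct")
    _check(reading_correct, S_TOFHLA_READING_ITEMS, "reading_correct")
    return 7 * numeracy_correct + 2 * reading_correct


print(s_tofhla_score(4, 36))              # maximum S-TOFHLA score
print(round(tofhla_score(17, 50), 3))     # maximum TOFHLA score, before rounding
```

Under these assumptions, a perfect S-TOFHLA yields 4 × 7 + 36 × 2 = 100 points, while a perfect TOFHLA yields 17 × 2.941 + 50 = 99.997, which is effectively 100.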
The Medical Data Interpretation Test is an 18-item measure of the ability to interpret medical statistics and understand concepts related to disease risk and risk reduction. Like the TOFHLA, the Medical Data Interpretation Test includes questions that reflect the type of health information that people ordinarily encounter in their daily lives. For instance, two questions pertain to assessing the benefit associated with a hypothetical drug designed to prevent strokes. Items for the test were developed based on reviews of the medical literature and input from experts in statistics, education, cognitive psychology, and evidence-based medicine. The Medical Data Interpretation Test has good test–retest repeatability (Pearson r=0.67) and good internal consistency (Cronbach's alpha=0.71).
Another functional health numeracy screening measure is The Newest Vital Sign . This test requires a person to answer six questions based on data contained in a nutrition label from a pint of ice cream. For example, the nutrition label states that the total carbohydrate content in a half-cup serving of ice cream is 30 g. People are asked how much ice cream they could eat if they were permitted to consume 60 g of carbohydrates. The Newest Vital Sign has good reliability and validity, for both the English version (Cronbach's alpha=0.76) and Spanish version (Cronbach's alpha=0.69), and requires only 3 minutes to administer. Like The Newest Vital Sign, the Nutrition Label Survey  is a 26-question instrument that assesses the ability to interpret actual food labels. Half of the survey involves interpreting food labels (e.g., compute the number of grams of dietary fiber in a serving of candy) and half of the survey involves selecting which of two foods has more or less of a particular ingredient (e.g., choose the product which contains fewer calories from fat: one slice of Arnold Hearty Classics 100% Whole Wheat Bread or 4 oz of Regular Mueller's Spaghetti). Although the Nutrition Label Survey was designed as a survey and not a scale, it has good internal reliability (Kuder-Richardson-20=0.87; personal communication, Russell Rothman). Other researchers have developed numeracy assessment instruments for specific diseases. For example, Estrada and colleagues [40, 58] added items specific to warfarin therapy to the three-item numeracy measure developed by Schwartz et al.  in order to assess the numeracy of patients on anticoagulation therapy. Other numeracy measures have been developed for patients with asthma  and diabetes .
The second method for gauging numeracy involves assessing one's subjective numerical ability. The Subjective Numeracy Scale (SNS) developed by Fagerlin and colleagues [60, 61] was conceived as a less stressful alternative to objective numeracy measures. Because many people find mathematical aptitude tests aversive, the SNS was designed to predict level of numeracy based on self-assessment of quantitative ability. The SNS contains four items that measure people's perception of their numerical ability and four items that pertain to preferences for numeric information presentation. An example of a question that addresses preference for display of numeric information is as follows:
The scale demonstrates good reliability (Cronbach's alpha=0.82) and is significantly correlated with the Lipkus et al. objective numeracy scale (r=0.63–0.68) [60, 61]. In a head-to-head comparison, the SNS took significantly less time to complete and was viewed as significantly less stressful and frustrating than the Lipkus et al. objective measure . Test–retest data are not available for the SNS. The major weakness of the SNS is that it measures perception of ability rather than actual ability.
The STAT-Confidence scale developed by Woloshin and colleagues  is a three-item subjective measure that assesses one's confidence in the ability to understand medical statistics. The scale consists of the following statements: (1) In general, how easy or hard do you find it to understand medical statistics?; (2) I am confident that I can make sense of medical statistics; and (3) I feel like I do not know how to interpret medical statistics. The scale has good psychometric properties (test–retest repeatability r=0.62; Cronbach's alpha=0.78). However, unlike Fagerlin et al.'s SNS, the STAT-Confidence scale showed only a weak correlation (r=0.15; p=0.04) with an objective measure of numeracy, the Medical Data Interpretation Test .
Both the objective and subjective numeracy scales have advantages and disadvantages. For example, while subjective numeracy assessments may be less threatening than objective measures that require arithmetic calculations or data interpretation, an objective numeracy assessment may be more likely to yield a truer estimate of a person's numerical competence. Any of these brief instruments could be administered to patients prior to an office visit in order to focus the clinical conversation at an appropriate level. Additionally, as patient decision aids become more commonly used in clinical practice, it will be important to assess numeracy to know whether a particular decision aid is appropriate for a given patient to use.
Even for low-numerate individuals it may be possible to make clinically relevant numbers more accessible. That is, it may be possible to improve how accurately people understand risk, while at the same time reduce cognitive burden by presenting numeric information in the most efficient and comprehensible manner. Research has shown that including risk statistics in a graphic format can improve understanding  and significantly affect decision making . Graphs and other visual displays, such as cartoons and films, are now frequently recommended and used in clinical settings as adjuncts to enhance comprehension of statistical information for purposes of risk communication and medical decision making [65, 66]. Graphic displays can be valuable in helping individuals detect data patterns (e.g., linear trends), perform rudimentary arithmetic operations (e.g., comparisons), and see whole-to-part relationships, such as those that involve conditional probabilities, ratios, and proportions—numerical concepts that are difficult for many to grasp [66–68]. As such, graphic displays can enhance the meaning—or gist—of statistical data [68–70].
Some general recommendations can be made with respect to the types of graphic displays that may be best for communicating risk information, particularly for those who are least numerate. Several excellent review papers have reported which graphic formats appear to be most effective for communicating numerical information for different clinical purposes [65, 66, 71]. For example, a line graph is typically the best choice to illustrate trends (e.g., the effectiveness of a drug over time) whereas a bar graph is most effective at showing how rates of adverse events for different medical treatments compare. Pie charts are useful for judging relative proportions . In general, people intuitively grasp the use of height in visual displays such as bar graphs and risk ladders to signify a quantity (e.g., level of risk). People seem to readily comprehend that events located higher on a risk ladder or vertical graph convey a greater quantity than events located toward the bottom [71, 73, 74]. Pictographs represented by figure icons showing the number in a population affected by some event can be used to illustrate magnitude and convey the notion of randomness.
Although using graphs to communicate risk information can improve patients’ knowledge and understanding, the precise formatting of the graph can have a tremendous impact on people's ability to interpret information. For instance, when line graphs are used to present survival and mortality rates, people often fail to adjust their risk perception to account for the time frame represented. Zikmund-Fisher and colleagues  found that people tend to perceive greater risk and larger differences in treatment effectiveness when a longer period of time (15 years) is represented than when the same mortality risk is represented over a shorter period of time (5 years). Some people have difficulty interpreting pie graphs because they are unable to determine the exact proportion being represented. When using pictographs, it is best to display figures in a systematic, rather than random, fashion. Random displays tend to decrease the precision of risk estimates because relative magnitudes are not easily judged .
And finally, when using visual displays to represent risk, it is important to consider that visual displays that only emphasize the numerator (e.g., showing only affected individuals) tend to increase risk avoidant behaviors, while those that highlight both the numerator and denominator tend to decrease risk avoidant behavior [68, 77, 78]. Therefore, when using graphs, the appropriate format should be selected in order to avoid bias and improve comprehension.
How should clinicians and health care researchers choose a format for presenting risk information? One could select the visual display that has been shown empirically to promote the greatest understanding, or one could ask the patient whether he or she has a preference for a particular type of format. In a qualitative study of risk communication formats, Schapira and colleagues found that people expressed definite preferences for both the graphic format and time frame used to present risk information. Differences in preferences were noted for better-educated individuals compared to less-educated individuals, with the less-educated group preferring simpler visual displays. Whereas some people prefer bar graphs, others may prefer line graphs or pie charts. Therefore, one option might be to present an array of graphic formats and ask people which they prefer (commonly available computer programs readily convert data into different graphic formats). But what if an individual prefers a graphic format that research suggests is not the optimal way to communicate numerical information? For example, it has been demonstrated that although people are better at interpreting two-dimensional bar graphs, they prefer three-dimensional bar graphs [80, 81]. Fagerlin and colleagues tested how well people understood five types of graphs (bar graphs, pictographs, modified pictographs, pie graphs, and modified pie graphs) and found that while bar graphs were preferred over the other formats, they consistently yielded lower scores on gist and verbatim comprehension [69, 82]. Thus, some people may prefer graphic formats that hinder their ability to interpret data accurately. Nevertheless, there may be situations in which it is advantageous to allow individuals to select their preferred format. For example, people might be more inclined to use patient decision aids if graphic formats were tailored to their preferences for numeric presentation.
Research unequivocally supports the conclusion that information presentation formats should not be based on the intuition of clinicians or researchers because doing so may not maximize comprehension of numeric information. A commonly held misconception is that there is no such thing as too much information . People reason that if some information is good, then more information must be better. But simply providing all available information does not necessarily enhance comprehension and decision making. In fact, providing more complete information without close attention to content and format can result in poorer comprehension and inferior decision making [69, 83]. This may be true even when decision makers express a preference for additional information, since people frequently fail to distinguish relevant from irrelevant information and often end up pursuing and using irrelevant information to make decisions they otherwise would not have made . Information presentation should be based on a scientific understanding of the impact of different formats on comprehension and decision making among those with differing levels of numeracy. Below, we review two additional issues that should be considered when presenting numeric information to individuals with different degrees of numeracy and suggest ways in which clinicians can present information in order to minimize the adverse effects of low numeracy.
Although both low- and high-numerate people are susceptible to framing effects, the less numerate appear to be more susceptible to these effects . “Framing” refers to how a decision problem is presented, and a framing effect occurs when a different but equivalent description of the same decision problem yields different preferences. For example, describing an operation as having a 90% survival rate is equivalent to saying that the operation has a 10% mortality rate, yet surgery seems more attractive to decision makers when presented in terms of the gain frame of survival as opposed to the loss frame of mortality. This framing effect was demonstrated in a lung cancer treatment study in which both patients and physicians found surgery preferable to radiation therapy when the choice problem was framed as the probability of survival rather than the probability of dying . In another framing study , participants were asked to rate students based on their exam scores, which were described as either percent correct (the positive frame) or percent incorrect (the negative frame). As predicted, the framing effect was more pronounced among the less-numerate participants, presumably because the more-numerate participants were able to transform numbers from one format (36% incorrect) to the normatively equivalent frame (64% correct). Research has shown that framing effects remain when numbers are eliminated (e.g., replacing “600 people die” with a verbal description, such as “some people die”) [69, 86]. Thus, it appears to be the qualitative interpretation of numbers, rather than the numbers themselves, that determines choices. (Qualitative interpretations of numbers are the default mode of processing, and are not the same as qualitative expressions of numbers using phrases, such as “often” or “infrequent.”) Taken together, these studies suggest that highly numerate people are better able to extract meaningful qualitative relations from numerical information. 
The clinical implication of these findings is that clinicians should not assume that wording choices are benign, especially for low-numerate individuals. Although equivalent expressions of risk (e.g., 95% of patients survive surgery vs. 5% of patients die during surgery) should yield the same decision regardless of whether information is presented in the “survival” or “mortality” frame, research has shown that low-numerate individuals often make different decisions depending upon how risk is framed. Therefore, it may be prudent for clinicians to frame important medical information in different but equivalent ways.
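The recommendation to present both frames can be illustrated with a short sketch. It is only an illustration: the function name and the sentence wording are hypothetical choices, and the 95% figure is the example used in the text rather than a clinical estimate.

```python
def both_frames(survival_percent: int) -> tuple[str, str]:
    """Return the same risk stated in the gain frame and in the loss frame.

    Hypothetical helper for illustration; the wording is an assumption.
    """
    if not 0 <= survival_percent <= 100:
        raise ValueError("survival_percent must be between 0 and 100")
    # The two frames are complements of each other by definition.
    mortality_percent = 100 - survival_percent
    gain_frame = f"{survival_percent}% of patients survive surgery"
    loss_frame = f"{mortality_percent}% of patients die during surgery"
    return gain_frame, loss_frame


gain, loss = both_frames(95)
print(gain)
print(loss)
```

Showing the two statements side by side makes their equivalence explicit instead of relying on the patient to perform the subtraction.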
Low- and high-numerate individuals appear to access and use different sources of information when making decisions that have short- and long-term consequences. People are often asked to incur concrete costs in the present (e.g., take a medication, experience anxiety about a medical test) in order to reap long-term (but abstract and probabilistic) rewards later. Research in economics suggests that compared to the more numerate, less-numerate individuals tend to overweight short-term costs and benefits at the expense of long-term benefits [87, 88]. They also tend to base judgments and decisions on things they know for certain rather than on things that might happen in the future. Presumably, this is because low-numerate individuals have a poor understanding of probabilistic information and are therefore less likely to attend to this kind of information. The implication of this for health is that low-numerate patients may tend to overweight events that occur in the near future and underweight or ignore events that might happen in the distant future. For example, consider the short-term costs and potential long-term rewards of statin medications used to lower cholesterol. There is the present cost and inconvenience of taking medicine on a daily basis versus the long-term benefit of lowered cholesterol, which, on an individual level, may or may not help to prevent a cardiovascular event. Because lowered cholesterol does not guarantee that a person will not experience a heart attack (it simply reduces one of many risk factors for heart attack), a low-numerate individual may not understand the value of taking medication for something that is not a “sure thing.” Maintaining motivation and intention to adhere to medical advice in these kinds of situations may be difficult for low-numerate individuals.
Remarkable progress has been made in research on numeracy. Measures have been developed that correlate with medical outcomes and with psychological phenomena such as framing effects and temporal discounting, providing insights into how numeracy affects processing of quantitative information. Further insights are provided by identifying key mediators, such as math phobia, that are known to compromise quantitative performance  and other key factors that are correlated with numeracy, such as literacy, education, age (older people are more likely to be low in numeracy), and race or ethnicity (Hispanics tend to be lower in numeracy) . However, much of this research is descriptive and unmotivated by theory. In other words, these correlations do not identify causal mechanisms that lead to specific predictions other than “predicting” what has previously been found. For example, although researchers have identified graphic formats that seem to reduce certain types of errors, these effects have been obtained largely through trial and error, rather than being motivated by theory.
Nevertheless, theoretical frameworks exist that could be used to generate predictions about numeracy, and recent research has begun to exploit this possibility . There are four major theoretical approaches that are relevant to numeracy: the traditional information-processing or computational approach that stresses precision, analysis, and elaboration to achieve accuracy ; evolutionary approaches that stress natural quantitative processing as illustrated by frequency effects [91, 92]; dual-process approaches that contrast intuitive (or affective) and analytical processing in which errors are due mainly to intuitive processing; and fuzzy-trace theory, another dual-process theory, but one that stresses gist-based intuition as an advanced mode of processing and contrasts it with verbatim-based analytical processing . Space does not permit an exhaustive review of these theories regarding numeracy (but see ). We can, however, briefly outline these approaches and their implications for numeracy and point to avenues for future research.
According to the information-processing approach, working memory limitations interfere with numerical (and other) processing. Therefore, efforts to reduce the burden on working memory are predicted to improve performance. In this view, health care providers can make numbers more understandable and usable to patients by reducing the cognitive effort needed to process information, for example, by reducing the impact of irrelevant sources of information on judgments (Peters E, Dieckmann N, Vastfjall D, et al. Bringing meaning to numbers: the function of affect in choice. Unpublished manuscript). Also, processing information more precisely and more elaborately ought to improve performance because number “crunching” or computation is the ideal. Consistent with this approach, asking people to actively process information by enumerating reasons for their preferences or by indicating the size of a risk on a bar chart may enhance the use and comprehension of numbers and reduce reliance on other sources of information (e.g., anecdotes and testimonials; Mazzocco et al. How priming analytical versus emotional thinking influences choice processes. Unpublished manuscript). Further, Mazzocco et al. (How priming analytical versus emotional thinking influences choice processes. Unpublished manuscript) found that asking decision makers to state reasons for their choices encouraged greater weighting of numerical information over less relevant, nonnumerical sources of information, such as emotion and anecdotes. Many public health programs emphasize this kind of precise and elaborate processing of numerical information. Modern dual-process theories, discussed below, have subsumed this computational approach into their assumptions about the analytical side of processing.
Evolutionary approaches suggest that there are two modes of quantitative processing: one that is evolutionarily ancient and involves frequencies  or crude quantitative distinctions  and another that is more recent and involves formal, numerical knowledge (e.g., knowledge of probabilities). Some promising work is being conducted comparing numerical processing in primates and young children to that of adult humans , but this research has not developed to the point where specific predictions can be made about numeracy. Research on so-called natural frequencies, although once quite promising, has not panned out empirically; a growing body of evidence challenges the assumption that frequencies are any easier to understand than probabilities (for reviews, see [68, 95]). Indeed, the latter claim has been tested repeatedly in the context of health decision making and been disconfirmed. Studies that initially reported a difference between frequencies and probabilities have been criticized on methodological grounds [68, 95].
Most modern theorists take a dual-process approach in order to account for conflicting patterns of performance in reasoning and decision making . Epstein  developed a series of measures of intuitive or “experiential” thinking versus analytical or rational thinking, which have been extended by Slovic and colleagues . The basic assumption of this approach is that intuitive thinking is the source of biases and errors in numerical processing, accounting for framing effects and ratio biases. Analytical thinking is believed to be the source of accurate and objective numerical processing (these dual-process claims about accuracy are central in the articles we discuss, although it should be acknowledged that intuition in these approaches is not assumed to inevitably lead to errors). Affect, defined as either mood or valence (i.e., good–bad), is thought to be an aspect of intuition. According to Peters and colleagues, high- and low-numerate people appear to access different sources of information when making decisions. The low numerate, who are less likely to attend to and understand numbers, appear to be informed more by nonnumeric, sometimes extraneous, sources of information, such as mood or affect. This effect was demonstrated in a study that examined how people made judgments about hospital quality. In this study, a community sample of 152 adults evaluated hypothetical hospital data. Low-numerate subjects were more likely to base their rating of hospital quality on their mood rather than numerical quality indicators, while high-numerate subjects showed no effect of mood; rather, the high numerate were able to use a numerical quality indicator to rate the hospitals (Peters E, Dieckmann N, Vastfjall D, et al. Bringing meaning to numbers: the function of affect in choice. Unpublished manuscript). In another experiment, intuition was pitted against rational analysis by making an objectively worse choice more tempting . 
Subjects were offered a prize if they drew a colored jelly bean from a bowl. Bowl A contained nine colored and 91 white beans, and Bowl B contained one colored and nine white beans. Consequently, the chance of drawing a colored jelly bean was objectively better from Bowl B (10% chance of winning) than from Bowl A (9% chance of winning). Despite this, 33% of low-numerate subjects and 5% of high-numerate subjects picked from Bowl A, which was clearly not the rational choice if one wanted to win a prize. High-numerate subjects tended to select Bowl B because they perceived the probability of winning more clearly than low-numerate subjects did. The less numerate were influenced more by the number of winning beans than by the number of losing beans (a ratio-bias effect). Although dual-process or rational-affective theories motivated this experiment, it is unclear whether affect per se or alternative mechanisms involving class relations are the source of these differences in processing. Nevertheless, these studies on mood and ratio bias suggest that compared to high-numerate patients, the preferences expressed by low-numerate patients are likely to be more labile and influenced by extraneous cues, and less influenced by objective probabilities.
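The arithmetic behind the two bowls can be checked with a brief sketch. The bean counts come from the experiment described above; the function names and the "choose by numerator" rule are hypothetical simplifications used here to mimic the ratio-bias pattern, not the authors' model of the effect.

```python
from fractions import Fraction

# (colored beans, white beans) for each bowl, as described in the text.
BOWLS = {"A": (9, 91), "B": (1, 9)}


def win_probability(colored: int, white: int) -> Fraction:
    """Exact chance of drawing a colored bean from one bowl."""
    return Fraction(colored, colored + white)


def choose_by_probability(bowls: dict) -> str:
    """Pick the bowl with the highest chance of winning."""
    return max(bowls, key=lambda name: win_probability(*bowls[name]))


def choose_by_numerator(bowls: dict) -> str:
    """Pick the bowl with the most winning beans, ignoring the total."""
    return max(bowls, key=lambda name: bowls[name][0])


for name, (colored, white) in BOWLS.items():
    print(name, float(win_probability(colored, white)))

print(choose_by_probability(BOWLS))  # B
print(choose_by_numerator(BOWLS))    # A
```

Attending only to the count of winning beans favors Bowl A (nine versus one), whereas comparing the ratios favors Bowl B (1/10 versus 9/100).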
Finally, fuzzy-trace theory was developed to explain challenges to several of the assumptions discussed above, notably that working memory capacity has been found to be unrelated to reasoning performance in most paradigms (for a review, see ). Fuzzy-trace theory explains framing effects [68, 86] as a function of a fuzzy-processing preference; that is, people make decisions on the basis of bottom-line gist. Moreover, gist-based processing increases with experience and knowledge. Thus, framing effects were predicted and found to increase from childhood to adulthood ; other heuristics and biases show a similar, counterintuitive trend (see Table 3 in ). In adulthood, experts have been found to base their decisions more on simple gist, compared to novices with less experience and knowledge .
Numerical processing in particular has been a longstanding focus of research on fuzzy-trace theory [76, 99]. For example, the ratio-bias effect (preferring options with smaller probabilities but larger numerators) described earlier is an example of a class-inclusion illusion that involves retrieval and processing assumptions as well as representational assumptions. Any ratio concept, including probability, is inherently confusing because the referents of classes overlap. People focus on the target classes in numerators (e.g., the nine colored jelly beans in Bowl A) and neglect the classes in the denominator (e.g., the 100 total jelly beans in Bowl A), producing the ratio-bias effect (also called the numerosity effect). Just as reminders about nonequivalence across frames can reduce framing effects, reminders about denominators can reduce the ratio-bias effect. Class-inclusion errors persist for advanced reasoners. For example, physicians and high school students performed equally poorly on a base-rate neglect problem, which is another type of class-inclusion problem. In this study, students and physicians were asked to estimate the probability of disease for a patient with a positive test result, given the base rate of disease and accuracy of the test. In sum, although traditional dual-process approaches generally equate accuracy with analysis, fuzzy-trace theory predicts that more experienced, knowledgeable reasoners (e.g., the more highly numerate) should rely mainly on the simple, qualitative gist of numbers but should not be less subject to class-inclusion illusions. Indeed, enumerating reasons for choices and focusing on fine-grained numerical distinctions as opposed to stark qualitative contrasts or gist (e.g., more patients die in Hospital A than Hospital B) leads to lower levels of performance in reasoning and decision making [69, 102].
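The kind of base-rate problem described above can be worked through with Bayes' rule. The sketch below is illustrative only: the 10% base rate, 80% sensitivity, and 20% false-positive rate are assumed values chosen for the example, not figures from the study, and the function name is hypothetical.

```python
from fractions import Fraction


def probability_of_disease_given_positive(
    base_rate: Fraction,
    sensitivity: Fraction,
    false_positive_rate: Fraction,
) -> Fraction:
    """Bayes' rule: P(disease | positive test)."""
    # Patients who have the disease and test positive.
    true_positives = base_rate * sensitivity
    # Patients who do not have the disease but still test positive.
    false_positives = (1 - base_rate) * false_positive_rate
    # The denominator is everyone who tests positive, sick or not.
    return true_positives / (true_positives + false_positives)


# Assumed, illustrative inputs (not taken from the cited study).
posterior = probability_of_disease_given_positive(
    base_rate=Fraction(10, 100),
    sensitivity=Fraction(80, 100),
    false_positive_rate=Fraction(20, 100),
)
print(posterior, float(posterior))  # 4/13, about 0.31
```

With these assumed inputs the answer is about 31%, well below the 80% sensitivity. The gap arises because the class of positive tests includes many healthy patients, which is the overlapping-classes difficulty the text describes.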
Going with your gut (or gist) may be especially beneficial for older people whose verbatim memories are less robust: age differences in choice quality between younger and older adults are reduced when decisions are based on gut feelings (Mikels JA, Loeckenhoff C, Maglio S, et al. Going with your gut may pay off for older people: age differences in choice quality are reduced when decisions are based on feelings. Unpublished manuscript). Recently, findings on affect have been integrated with the principles of fuzzy-trace theory; such findings indicate that affect is frequently an essential part of the gist experience. Although extant research on numeracy is consistent with fuzzy-trace theory, research has not critically compared the predictions of alternative dual-process approaches.
We suggest that future research in health numeracy address conceptual, measurement, and communication issues. At a conceptual level, there is a need for a consensus regarding a definition of health numeracy that is empirically and theoretically derived. By empirically, we mean a construct whose components are justified by empirical evidence, such as results of psychometric analysis. By theoretically, we mean a comprehensive theoretical account of health numeracy that can be used to make specific predictions.
A theoretical framework is also needed in order to develop measures that fully operationalize the construct of health numeracy. Presently, the most widely used measures to assess health numeracy tap different dimensions of numeracy. For example, the TOFHLA, which is perhaps the most widely used of all the health numeracy instruments, does not test basic arithmetic computation skills but rather simple, functional applied skills (such as understanding directions for taking medication). The TOFHLA differs considerably from the often-cited Schwartz et al. general numeracy assessment and the Lipkus et al. numeracy scale, both of which primarily test arithmetic computation skills, such as converting a percentage to a proportion and converting a proportion to a percentage. The Medical Data Interpretation Test, on the other hand, does not focus on arithmetic skills per se, but rather on interpreting numerical information and deriving higher-level inferences from it. A novel approach to assessing health numeracy (the Numeracy Understanding in Medicine Instrument) is being developed by Schapira and colleagues. Based on an empirically derived health numeracy framework, Schapira is using Item Response Theory to develop a numeracy measure that will be cross-culturally equivalent across racial and ethnic groups. One of the advantages of this technique is that the final instrument will be appropriate for individuals at all skill levels, thereby making it potentially usable for screening (in the clinical setting) or research purposes. The instrument will also be able to identify an individual's strengths and weaknesses, which will in turn allow health care providers to tailor communication and intervention strategies. The need for a more comprehensive measure of health numeracy was underscored in a study by Donelle and colleagues, which elicited different levels of performance using different numeracy assessments.
It is also important to examine whether subjective numeracy measures, such as the SNS and the STAT-Confidence scale, tap the same underlying dimensions as measures of objective numeracy. Limited and conflicting data make it difficult to evaluate how well subjective measures correlate with objective measures. For example, in one study involving 357 patients attending a university internal medicine clinic, 70% of subjects reported that they considered themselves “good with numbers,” yet only 2% of subjects answered all three items on the Schwartz et al. objective numeracy measure correctly. In contrast, Fagerlin and colleagues reported that the SNS was significantly correlated with the Lipkus et al. objective numeracy scale (r=0.63–0.68). Clearly, further testing of subjective numeracy scales is needed. If these scales are found to be poor measures of numeracy, future research will need to develop and validate new subjective measures or refine existing measures that could be used in the clinical setting. This would enable providers to screen for numeracy in a minimally burdensome way. Although screening for numeracy in a clinical setting is not currently practiced, health literacy screening in a primary care setting has been found to be acceptable to physicians.
Much work needs to be done in the area of communicating risk through visual and verbal means. As described in this paper, a number of researchers have examined the effects of graphic data displays on understanding health risks. Excellent reviews of this literature have been provided by Lipkus and Hollands  and Ancker and colleagues , and recommended practices have been proposed by Lipkus . Researchers in the health domain have studied how best to convey quantitative health risks by examining how manipulating graphic formats can improve accuracy, reduce response time, affect risk perception and behavior, and affect perceived credibility of the data [63, 70, 71, 78, 110–112]. However, research in this area has been largely atheoretical . Future work should focus on developing theory-based models of the perceptual and cognitive processes involved in understanding graphic displays of health risk. For example, such work in nonhealth domains [113, 114] has been informed by theories from cognitive psychology, educational psychology, and gestalt psychology, but there has not been a concerted effort to coordinate and apply contemporary theories to the health domain. It will also be important to examine the mechanisms by which level of numeracy interacts with graphic format to affect comprehension.
Clear communication between patients and physicians, and among medical professionals, regarding the potential risks and benefits of treatment is essential for informed decision making. However, whether it is most effective to communicate risk using numbers (e.g., a 0.001% chance of experiencing a side effect), qualitative expressions of likelihood (e.g., a rare side effect), or both modes is an important research question . It has been suggested that qualitative expressions of probability may be preferred because they seem to be natural and easy to use, while quantitative expressions of probability may be preferred because they are more precise . A number of studies have demonstrated that there is considerable variation in how patients and physicians interpret verbal probability expressions [117–119]. Patients and the lay public differ as to their preference for quantitative or qualitative risk communication [120–123]. In one study, physicians preferred to use verbal expressions of probability, while laypersons found numbers more helpful . Some researchers have suggested that qualitative expressions of probability could be systematically codified for use in the clinical encounter [125, 126], while others are emphatic that they should be eliminated entirely in the medical context . Mapes  observed that physicians’ interpretations of probability expressions depended on the therapeutic context. In his study, a “rare” side effect was judged to be much more severe when it was associated with a beta-blocker than when it was associated with an antihistamine. Whether one mode of communicating risk leads to more effective communication remains an empirical question. In addition, it will be important to examine how interpretation of verbal and numeric expressions of probability is influenced by level of numeracy, context, and experience.
Low numeracy is pervasive and constrains informed patient choice, reduces medication compliance, limits access to treatments, impairs risk communication, and affects medical outcomes. Numeracy explains unique variance in medical decision making beyond that explained by such factors as education or intelligence, and cannot be reliably inferred from observable patient characteristics. Well-validated objective numeracy measures provide the most accurate assessment of basic numerical skill. Subjective self-report measures, which are correlated with objective numeracy measures, may be useful tools in the clinical context because patients find them less burdensome and less intimidating than objective measures. Graphic displays are of general benefit, depending on the relation to be conveyed (e.g., lines to convey trends, heights of bar graphs to convey magnitude), and these visual displays may be especially helpful to those who are least numerate. Making numbers more transparent should offset the use of irrelevant cues, such as mood, in medical decision making, which are relied on more by the low numerate. The low numerate are also subject to sharper temporal discounting (weighing immediate rewards much more than temporally distant rewards), presumably because they are less able to understand probabilistic risk. Because prevention of the leading causes of death (e.g., cancer and cardiovascular disease) depends in large part on taking action now to prevent serious consequences later, the low-numerate patient is likely to require more extensive explanations of risk in order to engage in prevention behaviors. Although the highly numerate have been less subject to some biases (such as preferring lower probability options with larger, enticing numerators), they may achieve this superior numerical processing by, ironically, focusing on qualitative gist (rather than quantitative details).
The role of affect in guiding intuitions, even among the highly numerate, is a promising avenue for future research, and one that clinicians may someday use to empower patients to take advantage of the evidence in evidence-based medicine.
This work was based on a National Cancer Institute-sponsored symposium—Numeracy: A Critical (and Often Overlooked) Competence for Health Decision Making—presented at the Society of Behavioral Medicine Annual Meeting, Washington, D.C., USA, March 22, 2007. Dr. Reyna is supported by grants from the National Cancer Institute (R13CA126359) and the National Institute of Mental Health (MH-061211). Dr. Fagerlin is supported by an MREP early career award from the US Department of Veterans Affairs. Dr. Lipkus is supported by The Foundation for Informed Medical Decision Making. Dr. Peters is supported by a grant from the National Science Foundation (SES-0517770). We thank Nathan Dieckmann for his helpful literature review.
Wendy Nelson, Basic and Biobehavioral Research Branch, DCCPS, National Cancer Institute, 6130 Executive Blvd., Bethesda, MD 20892, USA; Email: nelsonw@mail.nih.gov.
Valerie F. Reyna, Cornell University, Ithaca, NY, USA. National Cancer Institute, Bethesda, MD, USA.
Angela Fagerlin, Ann Arbor VA HSR&D Center for Excellence Division of General Internal Medicine, University of Michigan, Ann Arbor, MI, USA. Center for Behavioral and Decision Sciences in Medicine, University of Michigan, Ann Arbor, MI, USA.
Isaac Lipkus, Duke University, Durham, NC, USA.
Ellen Peters, Decision Research, Eugene, OR, USA. University of Oregon, Eugene, OR, USA.