Both the IOM’s report [5] and Healthy People 2010 [27] conclude that improving HL is a national priority and may be critical for reducing disease and health disparities. Having valid and reliable HL assessments is vital for understanding and reducing the negative effects of poor HL. The objective of this study was to compare prevalence estimates derived from the two most common HL assessments and to determine the effect of non-response on those estimates. While other measures of HL exist [20–22,28], the REALM and S-TOFHLA are the most commonly used in clinical and community settings, yet few studies have compared them [7,8]. In the literature, however, they are often treated as equivalent. A recent systematic review [12], for example, reports similar prevalence rates between the set of studies using either the TOFHLA or S-TOFHLA and those using the REALM. Our study, which directly compares the two, calls this equivalence into question.
We draw three important conclusions from our results. First, the prevalence of low HL varies by the assessment used. This finding has important implications for health systems and researchers as they take up the IOM recommendations [5] to conduct HL assessments locally and nationally in order to determine the magnitude of poor HL, monitor how it changes over time, and find innovative ways to improve it. In the two prior studies directly comparing the REALM and S-TOFHLA [7,8], estimates were strongly correlated (r 0.80), and agreement was strongest for those with the highest and lowest skills, but the two tests differed significantly in their middle ranges. In our study, nearly twice as many participants were categorized as having inadequate skills by the S-TOFHLA as by the REALM, and three times as many were categorized as having marginal skills (7th–8th grade) by the REALM as by the S-TOFHLA. Differences across assessments could indicate that one assessment is less accurate than the other, especially at certain thresholds, or that its parameters are less stable across demographic groups. Another explanation, however, is that each instrument measures different components of an individual’s capacity for understanding health-related information and that the two are therefore not comparable. Baker has suggested that print literacy is related to two constructs: reading fluency (prose, quantitative, and document fluency) and prior knowledge (vocabulary and conceptual knowledge of health and health care) [29]. It is possible that the S-TOFHLA measures reading fluency more accurately, whereas the REALM measures prior knowledge more accurately. Understanding the conceptual differences between these assessments may help researchers and practitioners determine which assessment or assessments are most appropriate for measuring an intervention’s progress. Because the correlation between the two assessments is strong, any program designed to improve health literacy as measured by one assessment would likely show benefits on the other. However, existing programs may want to use both assessments in developing and evaluating HL interventions to ensure that both reading fluency and knowledge skills are developed, thereby maximizing the impact on overall HL skills.
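To illustrate how two instruments can agree for most respondents yet yield different prevalence estimates, the following minimal Python sketch compares category prevalence, raw agreement, and unweighted Cohen’s kappa for paired classifications. The ten paired classifications, the names `stofhla` and `realm`, and the three-level coding are hypothetical, chosen only to mimic the direction of the pattern described above; they are not data from this study. Kappa is an assumed choice of agreement statistic for illustration, not one reported here.

```python
from collections import Counter

LEVELS = ("inadequate", "marginal", "adequate")

def prevalence(labels):
    """Share of respondents placed in each HL category."""
    counts = Counter(labels)
    return {level: counts[level] / len(labels) for level in LEVELS}

def percent_agreement(a, b):
    """Fraction of respondents given the same category by both instruments."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohen_kappa(a, b):
    """Chance-corrected agreement (unweighted Cohen's kappa)."""
    n = len(a)
    observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected by chance, from each instrument's marginal counts.
    expected = sum(count_a[lv] * count_b[lv] for lv in LEVELS) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical paired results for the same ten respondents (NOT study data).
stofhla = ["inadequate"] * 2 + ["marginal"] * 1 + ["adequate"] * 7
realm = ["inadequate"] * 1 + ["marginal"] * 3 + ["adequate"] * 6

print("S-TOFHLA prevalence:", prevalence(stofhla))
print("REALM prevalence:   ", prevalence(realm))
print("Raw agreement:", percent_agreement(stofhla, realm))
print("Cohen's kappa:", round(cohen_kappa(stofhla, realm), 3))
```

In this toy example the two instruments assign the same category to most respondents, yet one classifies twice as many as inadequate and the other three times as many as marginal, which is why agreement and prevalence need to be examined separately.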
Our second conclusion is that non-response bias affects prevalence estimates and that estimates based on the REALM are more affected by non-response bias than those based on the S-TOFHLA. Most HL studies to date have relied on convenience samples, which can compromise the validity of findings by inflating estimates of poor HL. Random sampling can help to improve the accuracy of estimates in large populations, but because disenfranchised or stigmatized groups are often less likely to participate in research [13,16,30–33], these estimates may also be biased. Two prior HL studies have assessed non-response bias: a large study of Medicare enrollees in a national managed care organization, which used the S-TOFHLA [11], and the National Assessment of Adult Literacy (NAAL), which used an instrument developed specifically for that study [22]. Both used census-tract data and found that non-responders were not more likely to come from disenfranchised areas or areas with high concentrations of people at risk for poor HL, but instead from areas with higher incomes and educational attainment and lower concentrations of Black or other minority residents. While the NAAL data were weighted to adjust for non-response bias, the Medicare enrollee study was not, and thus, estimates for poor HL may have been deflated due to bias. These results and our own findings suggest that non-response bias is an important consideration for all HL studies. Well-designed qualitative studies may be particularly helpful for assessing the validity of the assumptions used in our non-response analyses, and further quantitative studies using random samples of patients are needed to understand how bias differs by assessment and demographic subgroup.
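As a rough illustration of why weighting for non-response can shift a prevalence estimate, the sketch below applies a simple weighting-class adjustment, in which each respondent is weighted by the inverse of the response rate in his or her stratum. The two strata, their counts, and the function names are hypothetical and are not drawn from this study, the NAAL, or the Medicare enrollee study; this generic adjustment is shown only to make the arithmetic concrete and is not the procedure used in any of them.

```python
def unweighted_prevalence(strata):
    """Prevalence of poor HL among respondents, ignoring non-response."""
    cases = sum(s["low_hl"] for s in strata)
    respondents = sum(s["responded"] for s in strata)
    return cases / respondents

def weighted_prevalence(strata):
    """Prevalence after weighting respondents by invited / responded per stratum."""
    weighted_cases = 0.0
    weighted_n = 0.0
    for s in strata:
        weight = s["invited"] / s["responded"]  # inverse of the response rate
        weighted_cases += weight * s["low_hl"]
        weighted_n += weight * s["responded"]
    return weighted_cases / weighted_n

# Hypothetical strata (NOT study data): the stratum with more poor HL
# responds at a lower rate, so it is under-represented among respondents.
strata = [
    {"name": "lower-education tracts", "invited": 500, "responded": 100, "low_hl": 20},
    {"name": "higher-education tracts", "invited": 500, "responded": 250, "low_hl": 25},
]

print("Unweighted:", round(unweighted_prevalence(strata), 4))
print("Weighted:  ", round(weighted_prevalence(strata), 4))
```

In this toy example, weighting raises the estimate because the under-represented stratum has more poor HL. If the higher-education stratum were the one under-represented, the same adjustment would lower it, so the direction of the correction depends entirely on who fails to respond.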
Finally, the prevalence of the poorest HL skills in this study is lower than in other large studies, although a notable proportion of participants (4–9%) still have only the most rudimentary skills. Our estimates of poor HL differ from those in a convenience sample of other primary care veteran patients, in which 10–15% had skills at ≤6th grade [34], but are similar to those in other veteran samples, including veterans undergoing cancer screening (36% had skills at ≤8th grade) [35] and those in a preoperative clinic (4.5% had inadequate and 7.5% had marginal skills) [20]. Our estimates of poor HL were also lower than those in studies of non-veterans. In the NAAL, the most comprehensive, nationally representative study of English-language literacy and HL in the US, 14% had below basic skills [22], and in a recent systematic review of HL studies, 26% had inadequate skills [12].
A number of factors could explain why our estimates of poor HL are lower than those from prior studies. First, veterans using VHA may have different health care experiences, including patient education, than non-veterans. Second, variation may reflect differences across studies in the assessment tools used. The NAAL, for example, did not use the REALM or S-TOFHLA, making comparisons difficult. Third, study methodology may lead to differences in estimates; many prior studies were based on smaller, single-site convenience samples. Fourth, population characteristics may differ across studies. Veterans’ level of English fluency and cognitive capacity to function in English, for example, may be higher than those of non-veterans. Although we did not assess whether English was participants’ primary language as part of this study, veterans are required to demonstrate the ability to speak and write English at a functional level as a condition of entering the armed forces, and they are screened for physical or mental conditions that might inhibit their ability to serve [36,37]. Education is also strongly linked to literacy [22,38], and veterans in this sample have relatively high levels of education compared with participants in other studies. In the systematic review of HL studies, for example, the majority of studies reported that 35–55% of their samples had less than a high school education [12], while we found just under 10%. Variation in prevalence estimates, therefore, may be due to both methodological and population differences. Future research efforts, especially analyses of the NAAL data that include veteran status (but not specifically veterans using VHA), could be used to further assess demographic differences in HL and non-response bias.
Our findings are tempered by a number of limitations. Accepted definitions of HL emphasize the dynamic communication and negotiation between an individual and the health care environment. Our assessments, however, were conducted privately, free of interruptions or distractions, using instruments with a narrower focus than the broader, accepted definition. These estimates, therefore, may overestimate how well people function in ordinary health encounters, since encounters often include complex information in different formats (i.e., written, oral, non-verbal) and distractions that can inhibit comprehension and recall. In addition, our overall response rate was low compared with other studies that used convenience sampling, but relatively consistent with studies that used random sampling. We did, however, address possible bias due to non-response. It is also important to note that our study and the literature cited were conducted in the US, and our findings may not generalize to other English-speaking countries where different factors may affect HL and non-response. Finally, in spite of state-of-the-art imputation methods, imputation is never perfect, and therefore, estimates adjusted for bias may actually underestimate the true prevalence of poor HL.