This paper compares estimates of poor health literacy using two widely used assessment tools and assesses the effect of non-response on these estimates.
A total of 4,868 veterans receiving care at four VA medical facilities between 2004 and 2005 were stratified by age and facility and randomly selected for recruitment. Interviewers collected demographic information and conducted assessments of health literacy (both REALM and S-TOFHLA) from 1,796 participants. Prevalence estimates for each assessment were computed. Non-respondents received a brief proxy questionnaire with demographic and self-report literacy questions to assess non-response bias. Available administrative data for non-participants were also used to assess non-response bias.
Among the 1,796 patients assessed using the S-TOFHLA, 8% had inadequate and 7% had marginal skills. For the REALM, 4% were categorized with ≤6th grade skills and 17% with 7–8th grade skills. Adjusting for non-response bias increased the S-TOFHLA prevalence estimates for inadequate and marginal skills to 9.3% and 11.8%, respectively, and the REALM estimates for ≤6th and 7–8th grade skills to 5.4% and 33.8%, respectively.
Estimates of poor health literacy varied by the assessment used, especially after adjusting for non-response bias. Researchers and clinicians should consider the possible limitations of each assessment when selecting the most suitable tool for their purposes.
Health literacy (HL) is commonly defined as “the degree to which individuals have the capacity to obtain, process and understand basic health information and services needed to make appropriate health decisions.”1 Poor HL is now considered a risk factor for ill health and mortality.2–6 The Institute of Medicine (IOM), in “A Prescription to End Confusion,”5 recommends widespread assessment of HL in order to monitor and reduce the negative health effects of poor HL.
To date, the most commonly used HL assessments measure the individual capacity for either reading fluency or recognition of medical vocabulary. The Test of Functional Health Literacy in Adults (TOFHLA)7 and its short form, the S-TOFHLA,8 are common HL assessments used in research and clinical practice to assess reading fluency of health materials. The Rapid Estimate of Adult Literacy in Medicine (REALM),9 the most widely used assessment, tests for recognition of medical vocabulary. Few studies have compared prevalence estimates based on these measures,7 leaving researchers, clinicians and health educators with little data to evaluate which instrument best fits their needs. Moreover, with some notable exceptions,10,11 most HL studies using these assessments have relied on convenience sampling; examined clinically or geographically specific populations, often in locations or sites where evidence suggests rates of poor HL would be high12; and, in spite of the stigma attached to poor literacy,13 have not examined the effect of non-response on findings. In addition to limiting the generalizability of study conclusions, these methodological limitations may mask biases in these assessments, especially across sub-groups.
In this study of US veterans, we administered two common HL assessments, the REALM9 and the S-TOFHLA,8 to determine whether estimates varied by measurement approach. We computed overall estimates of inadequate, marginal and adequate HL for each measure and, using administrative and survey data, evaluated whether non-response biased our findings.
Face-to-face interviews were conducted with 1,796 US veterans who received primary care services at the Minneapolis, West Los Angeles, Durham, or Portland VHA Medical Centers. The Institutional Review Board at each site approved the study protocol. The four study sites were deliberately chosen because they had a high volume of primary care patients and provided demographic and regional variation.
Study population Eligible patients included those who were scheduled to have at least one primary care visit during the study’s 12-month recruitment period and did not suffer from a severe cognitive disorder (e.g., dementia, schizophrenia), as determined by medical records review and a cognitive impairment screening test administered prior to the interview. Since the HL assessments require individuals to read written material, blind patients or those with severely impaired vision were also excluded.

Studies have shown that age is one of the strongest correlates of HL.5,14 To ensure enough variability to detect differences in HL, eligible patients were stratified by age (<50, 50–75, >75) and facility. In total, 4,868 eligible patients were randomly selected for recruitment.

Recruitment Invitations were mailed to randomly selected patients at each site. Ten days later, study recruiters telephoned each potential participant to invite them to participate. Up to six contact attempts were made at different times of day. Patients willing to participate scheduled a 1-hour research appointment and received $25 at the interview for participating.
Study sample Of the 4,868 veterans selected for recruitment, we were unable to contact 21% (Fig. 1). Of the 3,850 we were able to contact, 23% refused to participate because of scheduling or transportation difficulties, 9% for other reasons, and 4% were ineligible. Of those contacted, 64% agreed to an interview. Of those, 9% did not show up for their appointment or could not be rescheduled after missing their interview; 8% showed up, but did not complete the interview due to ineligibility or participant decision not to continue. We completed interviews with 53% (n=1,796) of the eligible participants.
Non-response sample To assess non-response bias, demographic characteristics (age, marital status, and self-reported race/ethnicity) and history of comorbid conditions were extracted from administrative and medical record data for all 4,384 patients who were randomly selected for recruitment and not found to be ineligible. Non-respondents included all eligible patients who could not be reached (n=1,018), did not attend their scheduled research appointment (n=325), initiated but did not finish the interview (n=22), “soft refusers” (e.g., those who reported scheduling or transportation difficulties or were too busy; n=1,223), and “hard refusers” (e.g., those not interested in the research). All non-responders except for hard refusers were also mailed a proxy survey packet, described below.
Face-to-face interviews Prior to administering each survey, interviewers screened patients for cognitive and visual impairments to verify eligibility. Cognitive impairment was assessed using the Mini-Cog, a validated brief screening test for dementia.15 Those testing positive for dementia (Mini-Cog score <3) were ineligible. Visual acuity was assessed using standard vision charts. Patients with corrected visual acuity of 20/100 or worse were ineligible because poor eyesight might confound the HL assessments. After visual and cognitive screening, interviewers administered the study’s survey to all eligible participants.
Non-response proxy survey Surveys were sent after all face-to-face interviews were completed. The survey packet included a small cash incentive, cover letter, self-addressed stamped envelope and a ten-item questionnaire containing four self-report HL questions (described below) and six demographic questions (marital status, race, ethnicity, education, employment and income).16 Reminder postcards were mailed 1 week after the first mailing. A second questionnaire was mailed to anyone who did not return a blank or completed survey within 3–4 weeks of the first mailing. A total of 1,435 (64%) of the 2,237 participants who did not complete a face-to-face interview completed this proxy questionnaire.
Health literacy assessments The REALM and S-TOFHLA were used to assess HL skills. The REALM measures HL by assessing the correct pronunciation of 66 common medical terms. It is strongly correlated with other standardized reading assessments and has excellent intra-subject reliability.9,17,18 Scores are most often categorized into grade levels (≤3rd grade, 4–6th grade, 7–8th grade, high school). Given the distribution of scores and their correlation to S-TOFHLA scores, we recoded REALM scores into three categories: ≤6th grade (0–44 words pronounced correctly), 7–8th grade (45–60) and high school (61–66).

The S-TOFHLA includes 4 numeracy items followed by 36 reading comprehension items. The reading comprehension passages have good internal consistency.7,8 To assess numeracy, participants were given props and asked questions that required them to make numerical calculations, interpret test results and recognize an appointment time. The reading comprehension assessment includes two reading passages of varying difficulty, with every third to fourth word of the passage removed and replaced with a list of four possible words from which to choose to complete the sentence. Participants are timed and given a total of 7 min (the standard protocol for the S-TOFHLA)19 to complete the assessment. Total scores range from 0 to 100 and are then categorized into inadequate, marginal and adequate HL skills using established cutoff scores (inadequate, 0–53; marginal, 54–66; adequate, 67–100),19 making categories comparable to those used for REALM. During the interview, the REALM was administered first, followed by the numeracy and then the timed reading comprehension sections of the S-TOFHLA.

Four self-report HL questions were also administered during the interviews, preceding the administration of the REALM and S-TOFHLA.
These same questions were also included in the proxy survey. Three questions were adapted from items developed by Chew.20 We added another question: “How often do you have problems learning about your medical condition?” Responses to the self-report HL questions were scored on a 5-point Likert scale. These questions have been shown to be adequately sensitive and specific in predicting HL by level of REALM and S-TOFHLA.21
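The cutoff scores described above can be expressed as a small scoring routine. The following is a minimal sketch, not the study's own code: the function names and category labels are hypothetical, and only the numeric cutoffs (REALM 0–44, 45–60, 61–66; S-TOFHLA 0–53, 54–66, 67–100) come from the text.

```python
def categorize_realm(words_correct: int) -> str:
    """Map a REALM raw score (words pronounced correctly, 0-66)
    to the three categories used in this study."""
    if not 0 <= words_correct <= 66:
        raise ValueError("REALM score must be between 0 and 66")
    if words_correct <= 44:
        return "<=6th grade"
    if words_correct <= 60:
        return "7-8th grade"
    return "high school"


def categorize_stofhla(total_score: int) -> str:
    """Map an S-TOFHLA total score (0-100) to inadequate,
    marginal or adequate HL using the cutoffs in the text."""
    if not 0 <= total_score <= 100:
        raise ValueError("S-TOFHLA score must be between 0 and 100")
    if total_score <= 53:
        return "inadequate"
    if total_score <= 66:
        return "marginal"
    return "adequate"
```

Under these cutoffs, a participant who pronounces 58 REALM words correctly and scores 70 on the S-TOFHLA would fall in the 7–8th grade REALM category but the adequate S-TOFHLA category, which illustrates how the middle ranges of the two tests can disagree.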
Covariates Prior studies have described a common set of factors associated with both inadequate literacy and HL skills, most of which are tied to poverty status.5,22 Therefore, demographic, socioeconomic and health characteristics were collected during the interview and from administrative data files. Because a high burden of morbidity could confound estimates of adequate HL, chronic disease and mental health history for each patient was also extracted from medical records. Chronic diseases were summarized using a Charlson comorbidity score.23 Mental health diagnoses were categorized into three groups: (1) no mental health diagnoses, (2) single psychiatric or substance abuse related diagnosis, or (3) dual diagnosis (psychiatric and substance abuse).
Constructing estimates of health literacy To account for our complex sampling design, we used stratified weighted estimation methods to estimate HL levels.24 For each assessment, stratified estimates were calculated using all participants who completed the test during interviews.
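As an illustration of stratified weighted estimation, the sketch below computes a prevalence estimate by weighting each stratum's sample proportion by that stratum's share of the eligible population. It is a textbook stratified estimator under assumed inputs, not necessarily the exact procedure used in the study; the `Stratum` class, the function name and all counts in the usage note are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Stratum:
    population_size: int  # eligible patients in the stratum (hypothetical)
    sample_size: int      # participants assessed in the stratum
    cases: int            # participants with the HL level of interest


def stratified_prevalence(strata):
    """Return (estimate, standard error) for a stratified proportion.

    Each stratum proportion is weighted by population share; the
    variance includes a finite population correction.
    """
    total = sum(s.population_size for s in strata)
    estimate = 0.0
    variance = 0.0
    for s in strata:
        weight = s.population_size / total
        p = s.cases / s.sample_size
        fpc = 1 - s.sample_size / s.population_size
        estimate += weight * p
        variance += weight ** 2 * fpc * p * (1 - p) / (s.sample_size - 1)
    return estimate, sqrt(variance)
```

For example, with three made-up age strata `Stratum(1000, 100, 5)`, `Stratum(3000, 150, 15)` and `Stratum(1000, 50, 10)`, the stratum proportions are 5%, 10% and 20% with weights 0.2, 0.6 and 0.2, so the weighted estimate is 11% rather than the unweighted 10% (30 of 300).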
Adjusting for non-response bias Multiple imputation (MI) was used to adjust for non-response bias. The MI procedure in SAS 9.1 and a logistic regression model were used to compute and assign a probability to each of the possible values that could be assumed by a variable whose value is missing for any given case. The probabilities assigned to the replacement values depended on the values assumed by all other covariates in the model for the case.25 A number randomly chosen between 0 and 1 from a uniform probability distribution was compared to the assigned probabilities to select the imputed value. Covariates in the model included administrative data (e.g., age and gender) and self-report HL items. Imputation was done iteratively to maintain a monotone missing pattern. The initial covariates with no missing values were used to impute replacement values for the item with some but fewer missing values than any of the remaining items. This item was then added to the set of covariates in the logistic model, which was used to impute values for the item with the second fewest number of missing values, and so on. The missing S-TOFHLA and REALM scores for non-respondents were imputed last using a linear regression model with all the initial and imputed items as covariate predictors. Once the missing S-TOFHLA and REALM scores had been replaced, they were classified as inadequate, marginal or adequate.

This imputation process was repeated five times, creating five different but complete data sets. The prevalence of each HL level was computed for each of the five data sets, and these results were combined to create one overall set of non-response-adjusted HL prevalence estimates, accounting for data variability due to imputation and sampling.26
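Two steps of the procedure above lend themselves to a short sketch: selecting an imputed category by comparing a uniform random number to the model's assigned probabilities, and combining the per-data-set prevalence estimates. The study used the SAS MI procedure; the Python below is only an illustration, and it assumes the combination step follows Rubin's rules, the standard way to pool multiply imputed estimates (the text does not name the method). All function names and inputs are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, variance


def draw_category(probabilities, rng):
    """Pick an imputed value by comparing a uniform(0, 1) draw
    to the cumulative assigned probabilities."""
    if not probabilities:
        raise ValueError("at least one category is required")
    u = rng.random()
    cumulative = 0.0
    for category, p in probabilities.items():
        cumulative += p
        if u < cumulative:
            return category
    # Guard against floating-point round-off in the probabilities.
    return category


def pool_estimates(estimates, variances):
    """Combine estimates from m imputed data sets (Rubin's rules).

    Returns the pooled estimate and a standard error reflecting both
    sampling (within) and imputation (between) variability.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need matching lists from at least two data sets")
    pooled = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)
```

In use, `draw_category({"yes": 0.3, "no": 0.7}, random.Random(1))` would return one of the two labels in proportion to the assigned probabilities, and `pool_estimates` would be called once per HL level with the five prevalence estimates and their squared standard errors.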
Sample characteristics Characteristics of the study sample are described in Table 1. Compared to interview participants, proxy survey responders and non-responders had lower levels of education and household income. Non-responders were also more likely to be younger, never married, African American, live in urban areas and have a mental health diagnosis than interview participants.
Prevalence of HL The variation across groups in Table 1 suggests our estimates might be influenced by response bias; therefore, unadjusted and adjusted estimates (corrected for response bias) are shown in Table 2. Because we had survey data for proxy survey responders, but not non-responders, proxy survey responders had more observed data to use for imputation and substantially fewer missing values for variable items typically associated with adequate HL. Proxy survey responders, therefore, had relatively more accurate imputed S-TOFHLA and REALM scores, and more accurate adjusted prevalence estimates. Among the 1,789 respondents who completed the REALM, 3.9% scored at or below the 6th grade reading level and 17.3% scored at 7th to 8th grade levels (Table 2). Using just proxy survey responders to adjust for bias, estimates increased to 5.2% for ≤6th grade and 30.5% for 7th to 8th grade. After adjusting for all non-respondents (proxy and non-responders), estimates increased to 5.4% and 33.8%, respectively. Among patients who completed the S-TOFHLA, 8.1% were classified as having inadequate HL, and 7.4% had marginal skills (Table 2). These estimates increased when proxy survey data were used to adjust for bias: 9.1% were classified as having inadequate skills, and 10.7% had marginal skills. Further adjustment using all non-responders further increased estimates, to 9.3% and 11.8%, respectively.
Both the IOM’s report5 and Healthy People 201027 conclude that improving HL is a national priority and may be critical for reducing disease and health disparities. Having valid and reliable HL assessments is vital for understanding and reducing the negative effects of poor HL. The objective of this study was to compare prevalence estimates derived from the two most common HL assessments and determine the effect of non-response on those estimates. While other measures of HL exist,20–22,28 the REALM and S-TOFHLA are the most commonly used in clinical and community settings, yet few studies have compared them7,8. In the literature, however, they are often considered equivalent. A recent systematic review12, for example, reports similar prevalence rates between the set of studies using either the TOFHLA or S-TOFHLA and those using REALM. Our study, which directly compares the two, brings this into question.
We draw three important conclusions from our results. First, the prevalence of low HL varies by the assessment used. This finding has important implications for health systems and researchers as they take up the IOM recommendations5 to conduct HL assessments locally and nationally in order to determine the magnitude of poor HL, monitor how it changes over time and find innovative ways to improve it. In the two prior studies directly comparing the REALM and S-TOFHLA,7,8 estimates were strongly correlated (r=0.80), and agreement was strongest for those with the highest and lowest skills, but differed significantly in the middle ranges of the tests. In our study, nearly twice as many participants were categorized with inadequate skills by the S-TOFHLA as by the REALM, and three times as many were categorized with marginal (7–8th grade) skills by the REALM as by the S-TOFHLA. Differences across assessments could indicate that one assessment is less accurate than the other, especially for certain thresholds, or that parameters are less stable across different demographic groups. Another explanation, however, is that each instrument measures different components of individual capacity for understanding health-related information and, thus, they are not comparable instruments. Baker has suggested that print literacy is related to two constructs: reading fluency (prose, quantitative and document fluency) and prior knowledge (vocabulary and conceptual knowledge of health and health care).29 It is possible that the S-TOFHLA measures reading fluency more accurately, whereas the REALM measures prior knowledge more accurately. Understanding the conceptual differences in these assessments may be helpful to researchers and practitioners who are trying to determine which assessment(s) is most appropriate for measuring an intervention’s progress. 
Because the correlation between the two assessments is strong, any program designed to improve health literacy as measured by one assessment tool would likely show benefits in the other. However, existing programs may want to use both assessments in the development and evaluation of HL interventions in order to assure that both reading fluency and knowledge skills are developed, thereby maximizing the impact on overall HL skills.
Our second conclusion is that non-response bias affects prevalence estimates and that estimates based on REALM are more affected by response bias than those based on S-TOFHLA. Most HL studies to date have relied on convenience samples, which can compromise the validity of findings by inflating estimates of poor HL. Random sampling can help to improve the accuracy of estimates in large populations, but because disenfranchised or stigmatized groups are often less likely to participate in research13,16,30–33, these estimates may also be biased. Two prior HL studies, one a large study of Medicare enrollees in a national managed care organization that used the S-TOFHLA11 and the other, the National Assessment of Adult Literacy (NAAL), which used an instrument developed specifically for that study,22 have assessed non-response bias. Both used census-tract data and found that non-responders were not more likely to come from disenfranchised areas or areas with high concentrations of people at risk for poor HL, but instead from areas with higher incomes and educational attainment and lower concentrations of Black or other minority residents. While the NAAL data were weighted to adjust for non-response bias, the Medicare enrollee study was not, and thus, estimates for poor HL may have been deflated due to bias. These results and our own findings suggest that non-response bias is an important consideration for all HL studies. Well-designed qualitative studies may be particularly helpful for researchers to assess the validity of the assumptions used in our non-response analyses, and further quantitative studies using randomized samples of patients are needed to understand how bias differs by assessment and demographic sub-groups.
Finally, the prevalence of the poorest HL skills in this study is lower than in other large studies, although a notable proportion of participants (4–9%) still have only the most rudimentary skills. Our estimates of poor HL differ from those in a convenience sample of other primary care veteran patients, where 10–15% had skills at ≤6th grade34, but are similar to those in other veteran samples, including those undergoing cancer screening (where 36% had skills at ≤8th grade)35 and those in a preoperative clinic (4.5% had inadequate; 7.5% had marginal skills).20 Our estimates of poor HL were also lower than those in studies of non-veterans. In the NAAL, the most comprehensive, nationally representative study of English-language literacy and HL in the US, 14% had below basic skills,22 and in a recent systematic review of HL studies, 26% had inadequate skills.12 A number of factors could explain why our estimates of poor HL are lower than those from prior studies. First, veterans using VHA may have different health care experiences, including patient education, than non-veterans. Second, variation may reflect the differences across studies, including the assessment tools used. The NAAL, for example, did not use the REALM or S-TOFHLA, making comparisons difficult. Third, study methodology may lead to differences in estimates. Many prior studies were based on smaller, single-site convenience samples. Fourth, population characteristics may differ across studies. Veterans’ level of English fluency and cognitive capacity to function in English, for example, may be higher than that of non-veterans. 
Although we did not assess whether English was their primary language as part of this study, veterans are required to demonstrate the ability to speak and write English at a functional level as a requirement for entering the armed forces, and are screened for physical or mental conditions that might inhibit their ability to serve.36,37 Education is also strongly linked to literacy22,38, and veterans in this sample have relatively high levels of education compared to other studies. In the systematic review of HL studies, for example, the majority of studies reported 35–55% of their samples having less than a high school education, while we found just under 10%.12 Variation in prevalence estimates, therefore, may be due to both methodological and population differences. Future research efforts, especially analyses of the NAAL data that include veteran status (but not specifically veterans using VHA), could be used to further assess demographic differences in HL and non-response bias.
Our findings are tempered by a number of limitations. Accepted definitions of HL emphasize the dynamic communication and negotiation strategy between an individual and the health care environment. Our assessments, however, were conducted privately, free of interruptions or distractions, using assessments with a narrower focus than the broader, accepted definition. These estimates, therefore, may overestimate how well people function in ordinary health encounters, since encounters often include complex information in different formats (i.e., written, oral, non-verbal) and distractions that can inhibit comprehension and recall. In addition, our overall response rate was low compared to other studies that used convenience sampling, but relatively consistent with studies that used random sampling. Possible bias due to non-response, however, was addressed. It is also important to note that our study and the literature cited were conducted in the US and our findings may not generalize to other English-speaking countries where different factors may affect HL and non-response. Finally, in spite of state-of-the-art imputation methods, imputation is never perfect, and therefore, estimates adjusted for bias may actually underestimate the true prevalence of poor HL.
Estimates of poor health literacy based on the REALM and S-TOFHLA differ, especially among those with the equivalent of marginal skills and after adjusting for non-response bias. Researchers and clinicians who direct health systems efforts to improve HL are advised to consider the possible limitations of each assessment when selecting the most suitable tool for their purposes.
This research was supported by the Department of Veterans Affairs, including a grant from VA Health Services Research and Development Service (CRI-03-151-1). Dr. Griffin also received support as a VA Merit Review Entry Program (MREP) awardee. Dr. Gralnek was supported by a VA HSR&D Advanced Research Career Development Award and IIS 01-191-1. The views expressed in this article are those of the author(s) and do not necessarily represent the views of the Department of Veterans Affairs or the United States Government. Part of this work was presented at the 24th Annual VA Health Services Research Meeting, Crystal City, VA, February 17, 2006. The authors wish to thank all the veteran participants for their time and the study interviewers for their commitment to this project.
Conflict of Interest None disclosed.