D. H. Odierna developed the concept for the article, carried out the statistical analysis, and wrote versions of the article. L. A. Schmidt contributed substantially to the overall themes, writing, and editing at each stage; she also originated the Welfare Client Longitudinal Study (WCLS) and supervised all aspects of its implementation.
We sought to determine whether failure to locate hard-to-reach respondents in longitudinal studies causes biased and inaccurate study results.
We performed a nonresponse simulation in a survey of 498 low-income women who received cash aid in a California county. Our simulation was based on a previously published analysis that found that women without children who applied for General Assistance experienced more violence than did women with children who applied for Temporary Assistance to Needy Families. We compared hard-to-reach respondents whom we reinterviewed only after extended follow-up effort 12 months after baseline with other respondents. We then removed these hard-to-reach respondents from our analysis.
Other than having a greater prevalence of substance dependence (14% vs 6%), there were no significant differences between hard- and easy-to-reach respondents. However, excluding the hard to reach would have decreased response rates from 89% to 71% and nullified the findings, a result that did not stem primarily from reduced statistical power.
The effects of failure to retain hard-to-reach respondents are not predictable based on respondent characteristics. Retention of these respondents should be a priority in public health research.
Respondents who participate in all phases of longitudinal studies are likely to differ from those who are lost to follow-up.1–5 Differential attrition, or nonrandom loss of respondents, can lead to bias in a study's findings by changing the composition of the sample so that it no longer represents the study population, especially when response rates are low and there are large differences between responders and nonresponders.6 Attrition also reduces sample sizes, contributing to the risk of type 2 error by decreasing statistical power to detect effects.7
Although much is written about respondents who are hard to reach and about ways to increase response rates in hard-to-reach populations,2,8–19 an examination of the literature uncovered few precise descriptions of follow-up protocols in longitudinal studies.20 Descriptions were often imprecise, and there was a good deal of variation in the procedures and amount of effort used to track populations considered difficult to reach.10,21–23 Nonetheless, the literature provides recommendations for improving the retention of respondents, including limiting sample sizes to allow for adequate follow-up, offering incentives to study participants, implementing tracking procedures with multiple contact methods and flexible protocols, permitting file sharing among interviewers, engaging the target population at the outset of the project, and allowing an open-ended number of contact attempts.3,5,6,16,19,24
However, many of these methods are costly and time consuming; researchers in a range of US survey centers consistently report that budgetary limitations, more than anything else, determine how much effort is put into tracking respondents.20 Moskowitz suggested that attrition bias is endemic in public health research and may be related to competition for funding and publication.25 Funding agencies are often reluctant to pay for costly extended follow-up efforts, and investigators may contribute to the problem by minimizing study weaknesses that result from attrition.25
Groves26 wrote that although nonresponse bias is a problem in survey research, low response rates do not necessarily introduce bias. He found that different estimates within surveys may be subject to greater bias-related variation than are estimates in surveys with different response rates. Moreover, he warned that “blind pursuit” of high response rates may introduce measurement error and bias, in part because reluctance to participate might affect the validity of respondents’ answers. An analysis that examined this idea showed that answers from reluctant respondents were generally consistent with data found in records. Although some measurement error was apparent, overall bias was lower when hard-to-reach and reluctant respondents were included in the sample.27 Methods of increasing response rates without increasing error rates are still being developed,28 and it is unclear whether findings on nonresponse bias in probability sampling can be applied with confidence to attrition bias in longitudinal research.
We empirically examined the effects on attrition bias of retaining hard-to-reach respondents in the Welfare Client Longitudinal Study (WCLS),29,30 a longitudinal survey that used particularly intensive tracking procedures. We compared the characteristics of hard-to-reach respondents with those who were more easily found and examined the effects on response rates and on study results of failure to retain the hard to reach. To this end, we developed systematic procedures for empirically identifying hard-to-reach respondents within the WCLS sample.
To determine if the failure to locate hard-to-reach respondents would have affected study conclusions, we performed a nonresponse simulation based on the 2006 study by Lown et al.31 on violent victimization among women in the WCLS. That study found that female General Assistance applicants, mostly single and without children, were more likely to be victims of violence than were female Temporary Assistance to Needy Families (TANF) applicants, who were mostly young single mothers. These findings led Lown et al. to conclude that it is misguided to provide domestic violence prevention programs nearly exclusively for women receiving TANF and thus ignore the even more disenfranchised women in poverty who are without children and therefore not eligible for TANF.
The WCLS provides an unusual opportunity to examine the effects of extended case-tracking efforts on attrition in a marginalized, low-income population. The study population contained many individuals who are generally considered to be difficult to retain in longitudinal research, including the homeless and unstably housed and individuals with extremely low incomes, chronic unemployment, and substance use disorders.3–6,10,32,33
We used extended tracking efforts and successfully located 439 of the 498 women for follow-up interviews; this yielded a weighted response rate of 89% at 1 year postbaseline. The site is a large, relatively affluent California county with wide economic, geographic, and ethnic diversity.34 Cash aid in the county is provided via TANF, the federally funded welfare program for poor families, and General Assistance, the county's locally funded aid program of last resort for qualified adults who do not have custody of minor children. Comprehensive descriptions and findings of the WCLS may be found elsewhere.29,29–36 This study used data from 2 waves of the WCLS. Our analysis focused on variables from the baseline interview and was supplemented by additional information from wave 2, 12 months later, that we used to identify respondents as “easily found,” “lost,” or “hard to reach.”
In the summer of 2001, WCLS researchers approached 1786 male and female applicants for cash aid and successfully interviewed a representative sample of adults (n = 1510), yielding a baseline response rate of 85%. For the main survey cohort, the researchers randomly selected applicants who received aid, intentionally oversampling problem drinkers and frequent drug users. They reinterviewed respondents at the 1-year anniversary of the baseline survey. Following Lown et al., we focused on female WCLS respondents. The women in our sample (N = 498) included 108 General Assistance recipients and 390 TANF recipients. One woman died before the second wave of data collection was complete. She was dropped from the analysis.
WCLS contracted a fieldwork agency to interview respondents, with the expectation that all cases would remain open until the respondents were found. This agency returned cases with no remaining viable leads to WCLS scientific staff for further tracking. A survey tracker with private investigator training was hired to locate the most difficult cases, a strategy that is costly and unusual in public health research.19 Tracking consisted of multiple mailings, telephone calls to respondents and contacts, searches of public databases, home visits, and visits to parks, street corners, shelters, check cashing stores, soup kitchens, jails, drug treatment programs, and other promising sites. Interviewers shared files; at times, more than 1 interviewer simultaneously tracked a respondent. We did not place any predetermined limits on the number of contact attempts. As a result, respondents received up to 12 letters, 57 calls, and 28 visits before either we found them or we ended the search. Cases remained open for an average of 55 days. Continuing respondents received cash incentives of $40 to $50 at each wave of data collection.
To empirically identify individuals who were hard to reach, we coded and analyzed field note files maintained by the study team throughout tracking for the 1-year follow-up. These files contained complete logs of all contact attempts for every respondent in the sample. Multiple raters noted the details of each contact attempt. We entered the data into a spreadsheet program (Microsoft Excel 2000; Microsoft Corp, Redmond, WA), cleaned the data, and concatenated them with the survey data files, matching by respondent identification number. To assess interrater reliability, a subsample of 94 files was test-rated by 4 different raters. The raters agreed on the final classification of hard-to-reach status in 93 of the 94 cases.
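The agreement figure above can be expressed as a simple percentage. The following is a minimal sketch of that arithmetic only; the function name `percent_agreement` is an illustrative assumption, and the source does not state which reliability statistic (raw agreement or a chance-corrected measure) the authors computed.

```python
def percent_agreement(n_agree: int, n_total: int) -> float:
    """Raw percent agreement: share of cases on which raters concurred.

    Not chance-corrected; shown only to make the 93-of-94 figure concrete.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_agree <= n_total:
        raise ValueError("n_agree must be between 0 and n_total")
    return 100.0 * n_agree / n_total


# Counts taken from the text: agreement in 93 of the 94 test-rated files.
agreement = percent_agreement(93, 94)
print(f"{agreement:.1f}%")  # 98.9%
```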
Interviews with fieldwork experts in US survey research centers20 and descriptions of extended effort in the literature1,3,16,19,37 informed our definition of “hard to reach.” Because interviewers shared files and there was no strict order for the methods used to track respondents, we did not attempt to evaluate each tracking method separately. Instead, we used a summary measure. Respondents were classified as hard to reach if extended effort was needed to contact them, that is, if any of the following conditions were met: (1) more than 14 telephone calls were made to the respondent or her designated contacts, (2) more than 3 residential visits or 1 or more nonresidential visits were made, (3) more than 5 letters were mailed to the respondent and alternative contacts, (4) the duration of the search lasted over 60 days, or (5) the fieldwork agency turned the case back to the WCLS scientific staff. We classified respondents who were contacted without extended effort as easily found.
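The summary measure described above amounts to a rule that flags a respondent as hard to reach when any one of the five conditions holds. A minimal sketch of that rule follows. The `TrackingLog` record and its field names are hypothetical, introduced only for illustration; the thresholds are the ones stated in the text, and the `located` flag is an assumption used to separate lost respondents from those who were found.

```python
from dataclasses import dataclass


@dataclass
class TrackingLog:
    """Hypothetical per-respondent summary of the coded field notes."""
    phone_calls: int             # calls to respondent or designated contacts
    residential_visits: int
    nonresidential_visits: int   # parks, shelters, jails, and similar sites
    letters: int                 # letters to respondent and alternative contacts
    search_days: int             # duration of the search
    returned_by_agency: bool     # case turned back to WCLS scientific staff
    located: bool                # respondent was found at follow-up


def classify(log: TrackingLog) -> str:
    """Return 'lost', 'hard to reach', or 'easily found'."""
    if not log.located:
        return "lost"
    extended_effort = (
        log.phone_calls > 14                 # condition 1
        or log.residential_visits > 3        # condition 2, residential
        or log.nonresidential_visits >= 1    # condition 2, nonresidential
        or log.letters > 5                   # condition 3
        or log.search_days > 60              # condition 4
        or log.returned_by_agency            # condition 5
    )
    return "hard to reach" if extended_effort else "easily found"


# A respondent found after 20 calls meets condition 1.
print(classify(TrackingLog(20, 0, 0, 2, 30, False, True)))  # hard to reach
```

Because the conditions are joined with a logical "or", a respondent sitting exactly at every threshold (14 calls, 3 residential visits, 5 letters, 60 days) is still classified as easily found.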
The outcome variable was participation status 12 months postbaseline (i.e., easily found, hard to reach, or lost). Of the 498 respondents in this analysis, we classified 339 as easily found and 100 as hard to reach. We classified the 59 respondents the interviewers were unable to reach as lost. Because we were interested in researchers’ ability to find respondents rather than in respondents’ willingness to be interviewed once found, we considered 4 respondents who were contacted but explicitly refused to be interviewed to be successfully located for the purposes of our study. Of these, 3 were easily found, and 1 was hard to reach.
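As a rough check on these counts, the crude (unweighted) proportions located with and without the hard-to-reach group can be computed directly. This is only a sketch of the arithmetic: the response rates reported in this article are weighted, so the unweighted figures below are not expected to match them exactly.

```python
# Counts taken from the text; variable names are illustrative.
counts = {"easily found": 339, "hard to reach": 100, "lost": 59}

total = sum(counts.values())  # 498 respondents in this analysis
located = counts["easily found"] + counts["hard to reach"]

# Crude proportion located when the hard to reach are retained.
rate_with_hard_to_reach = located / total

# Crude proportion located if tracking had stopped at the easily found.
rate_without_hard_to_reach = counts["easily found"] / total

print(f"{rate_with_hard_to_reach:.1%}")     # 88.2%
print(f"{rate_without_hard_to_reach:.1%}")  # 68.1%
```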
Independent variables included race/ethnicity, age, education, marital or cohabitation status, social isolation (no close friends or family members), current disability, episodes of homelessness in the past year, long-term (>1 year) unemployment, and family income under $10 000. Other measures included program type (General Assistance or TANF), self-reported health status (4 categories collapsed into 2: poor or fair, and good or excellent), and previously used self-reported measures35,38–40 of past-year heavy drug use, problem drinking, and substance dependence. In addition, the nonresponse simulation comprised 8 summary measures of violent victimization identical to those developed by Lown et al. for their study.31
We began by using the χ2 test of association to compare the baseline characteristics of hard-to-reach and easily found respondents. Next, we examined crude response rates in the full sample and various subgroups, including and excluding the hard-to-reach respondents. We then performed a simulation based on the study on women and violence of Lown et al. In this simulation, we sequentially examined how findings based on the cohort's baseline responses changed when we excluded first the respondents who were lost at 12 months and then also those who were hard to reach at 12 months.
In the simulation, we analyzed exposure to violence among General Assistance and TANF recipients through nested comparisons of (1) the full baseline cohort sample of all aid recipients, (2) the baseline cohort sample excluding those lost at 12 months, and (3) the baseline cohort sample excluding those lost and those defined as hard to reach at 12 months. In keeping with Lown et al., we used the χ2 test to compare General Assistance and TANF recipients in terms of baseline variables. The final steps in the simulation replicated the logistic regression analyses of Lown et al. in the 3 nested comparisons.
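The logic of the 3 nested comparisons can be illustrated with a small sketch that computes an unadjusted odds ratio for victimization (General Assistance vs TANF) in each nested sample. Everything in it is an assumption made for illustration: the records are invented, the field names are hypothetical, and the calculation is unweighted and unadjusted, whereas the actual analysis used weighted survey data and logistic regression.

```python
from collections import Counter


def make(program, victim, status, n):
    """Build n identical illustrative records (invented data)."""
    return [{"program": program, "victim": victim, "status": status}] * n


def odds_ratio(records):
    """Unadjusted odds ratio of victimization, GA relative to TANF."""
    c = Counter((r["program"], r["victim"]) for r in records)
    ga_yes, ga_no = c[("GA", True)], c[("GA", False)]
    tanf_yes, tanf_no = c[("TANF", True)], c[("TANF", False)]
    if ga_no == 0 or tanf_yes == 0:
        raise ValueError("odds ratio undefined: empty cell in denominator")
    return (ga_yes * tanf_no) / (ga_no * tanf_yes)


def nested_samples(records):
    """Return the 3 nested samples described in the text."""
    full = list(records)
    found = [r for r in full if r["status"] != "lost"]
    easy = [r for r in found if r["status"] != "hard to reach"]
    return {"full cohort": full, "found": found, "easily found only": easy}


# Invented counts, chosen so that victimized GA recipients are
# overrepresented among the hard to reach.
records = (
    make("GA", True, "easily found", 4) + make("GA", True, "hard to reach", 4)
    + make("GA", True, "lost", 2) + make("GA", False, "easily found", 8)
    + make("GA", False, "hard to reach", 1) + make("GA", False, "lost", 1)
    + make("TANF", True, "easily found", 10) + make("TANF", True, "hard to reach", 1)
    + make("TANF", True, "lost", 1) + make("TANF", False, "easily found", 30)
    + make("TANF", False, "hard to reach", 6) + make("TANF", False, "lost", 4)
)

results = {name: odds_ratio(s) for name, s in nested_samples(records).items()}
for name, value in results.items():
    print(f"{name}: OR = {value:.2f}")
```

With these invented counts, the odds ratio shrinks as each group is removed, which is the kind of pattern the simulation was designed to detect. The numbers carry no information about the actual WCLS estimates.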
Lown et al. examined associations among variables in the WCLS applicant sample. Because we made no attempt to track applicants, our simulation was limited to those applicants who became aid recipients, participated in the longitudinal study, and were tracked over time. We analyzed baseline survey responses of aid recipients in the cohort study for 2 reasons. First, we wanted to include data from lost respondents in our simulation; we interviewed these respondents at baseline but not at 12 months. Second, Lown et al. used baseline data, and we wanted our simulation to follow their methods as closely as possible to approximate the effect of an extended follow-up effort on actual study results.
We applied poststratification weights to correct for sampling design (i.e., oversampling of problem drinkers and drug users in TANF and time spent recruiting participants in different programs, locations, and offices) and baseline nonresponse. Because our analysis focuses on response status at wave 2, we did not weight for nonresponse at 12 months. We considered results significant if P was less than .05. We conducted statistical analyses with the survey commands in Stata versions 8 and 9 (StataCorp LP, College Station, TX). These commands implement a design-based F statistic to assess associations between categorical variables.
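To show why weighted and crude rates can differ when some groups are oversampled, here is a minimal sketch of a weighted proportion. The weights and outcomes are invented for illustration, and the function is a generic weighted mean, not a reproduction of the Stata survey commands or the poststratification scheme used in the study.

```python
def weighted_rate(records):
    """Weighted proportion located: sum of weights for located cases
    divided by the sum of all weights.

    records: iterable of (weight, located) pairs.
    """
    records = list(records)
    total_weight = sum(w for w, _ in records)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(w for w, located in records if located) / total_weight


# Invented example: an oversampled group (weight 0.5) that is harder to
# locate, and an undersampled group (weight 1.5) that is easier to locate.
sample = (
    [(0.5, True)] * 2 + [(0.5, False)] * 2   # oversampled: 2 of 4 located
    + [(1.5, True)] * 6                      # undersampled: 6 of 6 located
)

crude = sum(1 for _, located in sample if located) / len(sample)
weighted = weighted_rate(sample)

print(f"crude: {crude:.3f}")        # crude: 0.800
print(f"weighted: {weighted:.3f}")  # weighted: 0.909
```

Down-weighting the oversampled group moves the estimate toward the rate in the group that stands in for more of the population, which is why a crude count-based rate and a weighted rate need not agree.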
We expected to find that retaining hard-to-reach respondents would substantially improve response rates. Table 1 shows that excluding the hard-to-reach group would have lowered the overall 1-year follow-up response rate from 89% to 71% and decreased response rates in some subgroups to well under 70%. Without the extended tracking, it appears that the WCLS would have failed to retain substantial numbers of respondents who were African American, in poor or fair health, homeless, or heavy substance users or who had very low incomes or were victims of violence.
We hypothesized that bivariate comparisons would reveal many significant differences between respondents who were easily found and those who were found but hard to reach at 12 months. However, as shown in Table 2, the only characteristic that was significantly associated with being hard to reach was substance dependence. There were no significant differences in race/ethnicity, age, education, marital status, parental status, income, employment, disability, housing stability, program type, health status, problem drinking, frequent drug use, or violent victimization.
To simulate the effects of excluding hard-to-reach respondents on the findings of Lown et al. (i.e., women who applied for General Assistance had more significant risk factors for violent victimization—problem drinking, divorce or separation, poverty, or homelessness—than did women who applied for TANF),31 we compared sociodemographic, health, and behavioral characteristics of the General Assistance and TANF subsamples with and without the hard-to-reach respondents. Because, as in the study by Lown et al., significant differences between the General Assistance and TANF groups remained even when the hard-to-reach women were excluded from the analysis (data not shown), this initial bivariate comparison suggested that excluding the hard to reach would not have a measurable impact on overall results. However, as Table 3 illustrates, biasing effects emerged when we compared violent victimization among General Assistance and TANF recipients.
Odds ratios and prevalence estimates in all 8 categories of violence in the General Assistance and TANF groups were similar for the full baseline cohort and for all recipients who were found at the 12-month follow-up. However, excluding the hard-to-reach women from the analysis produced substantially different point estimates; odds ratios were biased downward in 7 of 8 categories. In 2 of the categories, confidence intervals widened, but differences remained significant; in 6 of the categories, between-group differences decreased below the point of statistical significance; and in 5 of the categories, confidence intervals narrowed. This suggests that, in the absence of the extended tracking efforts, we would have seen associations between violent victimization and program type in only 2, rather than all 8, categories of violence and in no category of partner violence.
In a 1996 analysis of longitudinal data on victimized low-income women, Sullivan et al. concluded that their study's commitment to sample retention and extended tracking efforts was a key factor in achieving high response rates and that it was worth the extra time and money spent.1 Our study here provides empirical support for this conjecture by demonstrating that extended tracking effort can have a significant impact on increasing response rates and reducing bias in findings on violent victimization among low-income women.
Our investigations revealed that retaining hard-to-reach respondents increased response rates in the overall sample from 71% to 89% and substantially raised response rates for important subgroups. More importantly, excluding hard-to-reach respondents would have biased our estimates, leading to erroneous conclusions. Although attrition-related reductions in sample sizes can increase the likelihood of type 2 errors, we found that this did not have a substantial effect on our findings. Overall, excluding the lost and hard-to-reach individuals decreased the prevalence of violence in the General Assistance group and increased it in the TANF group. As a result, in most of the categories of violence we looked at, the difference expressed in odds ratios was substantially smaller.
We would have missed the evidence that violence prevention services for the women without children receiving General Assistance are needed even more than are those currently available to women with children receiving TANF. This provocative finding suggests that failure to retain hard-to-reach respondents can bias study results even when simple bivariate comparisons yield few significant differences between easy- and hard-to-reach respondents. The lack of consistency in results across 2 different phases of our analysis suggests that comparing respondents to nonrespondents on basic sociodemographic and health-related criteria may be insufficient for dismissing concerns about potential bias resulting from the attrition of hard-to-reach respondents.
Although we found few differences between women who were hard to reach and those who were easily found, substance dependence at baseline was significantly associated with being hard to reach at 12 months. Among poor and marginalized women, it is likely that substance dependence adds to an already heavy burden of disadvantage and stigma41–43 and may contribute to both higher rates of violence and difficulties in locating substance-dependent women for reinterview. Thus, substance dependence may provide an example of a type of variable that contributes to attrition bias by increasing the propensity for nonresponse while affecting outcomes of interest.26
Our access to fieldwork files allowed us to systematically document the extent of tracking efforts for all study participants. Respondents are usually defined a priori as “hard to reach” by virtue of their sociodemographic and lifestyle characteristics. In this study, we used interviewers’ field notes to empirically examine a posteriori the characteristics of demonstrably hard-to-reach respondents. However, the files were working documents that WCLS scientific staff and several interviewers used, sometimes simultaneously, to track respondents. The resulting complexity puts meaningful cost estimations beyond the scope of this study. Future studies of tracking procedures and level of effort could be designed to provide data that support cost effectiveness analyses.
Other limitations apply. The relatively narrow socioeconomic distribution of the WCLS sample provided us with an opportunity for detailed examination of a population that includes a large proportion of hard-to-reach study participants. Yet findings for the exclusively female welfare sample are not necessarily generalizable to other poor populations in other locales or to men living in poverty. We included only baseline survey responses in our simulation analysis, which sorted respondents by their participation status in the second wave of data collection 12 months later. Differences that exist at baseline may increase or decrease over time. In addition, because we used only self-reported responses and did not compare them with medical or administrative records, we are unable to address concerns about the quality and validity of data provided by respondents who were hard to reach.26
Future research should examine the effects of retaining hard-to-reach respondents in general population studies and in changes across multiple waves of longitudinal data collection. Finally, we looked at only the likely effects of failure to retain hard-to-reach respondents on results for violent victimization. Estimates for some variables may be more vulnerable to nonresponse bias than others in the same survey.26,27
Attrition of hard-to-reach respondents can be described as a form of exclusion, a separation of study respondents from research itself, from the attention of service providers and policymakers who use the results of research to inform their decision-making processes, and from social benefits that may arise as research is translated into policy and practice.
Our findings underscore the importance of retaining hard-to-reach respondents for accuracy and reliability in public health research. Some investigators suggest the risk of attrition bias may be reduced by oversampling respondents with characteristics linked to being hard to reach,4 by making postsurvey adjustments based on carefully chosen auxiliary variables,26 or by including survey questions that make it possible to identify respondents who are likely to be hard to reach and then plan for the extended tracking of these respondents.2 However, these techniques may be of questionable value in studies in which few measurable characteristics differentiate hard-to-reach respondents from others in the population.
The findings of this study suggest that even when demographic and health-related differences appear to be small, the failure to retain hard-to-reach respondents can substantially affect study findings. Therefore, at the outset of data collection, it may be difficult, perhaps impossible, to predict if such failure would introduce bias and affect the accuracy of the final results. We have also shown that extended follow-up efforts and protocols tailored for the study population as a whole can be an important factor in increasing response rates and limiting bias in studies of poor populations such as this. Groves warns that the risk of producing biased and inaccurate results increases as the differences between responders and non-responders become larger6; in studies of more economically and socially diverse populations, the effects of retaining hard-to-reach respondents may prove to be even more pronounced than those reported here.
The results of this study suggest that attaining high response rates should be a high priority for public health researchers and their funding agencies. Moreover, the findings provide justification for developing standard procedures for extended follow-up efforts to retain hard-to-reach respondents in low-income and marginalized populations. To the extent that future research can empirically identify hard-to-reach groups in a wider range of study populations, researchers will be able to cultivate more-efficient strategies for tracking these groups and for studying the extent to which extended tracking effort affects response rates and bias in results. Ultimately, such studies will allow researchers to develop realistic standards for careful, extended follow-up in key target populations and in general population studies. Only then will we be in a position to advocate sufficient resources for including and retaining members of hard-to-reach groups in all public health research.
We give special thanks to E. Anne Lown for substantial contributions to this study and for access to the summary variables she created for her study. The authors are grateful to Laurie Jacobs, Denise M. Zabkiewicz, James Wiley, and the WCLS team for substantial assistance with development of the article, to Lisa R. Hirsch for developmental and copy editing, and to Kim L. Serkes and Matthew Zivot for research assistance. D. H. Odierna also wishes to thank the chair of her dissertation committee, William A. Satariano, and extends special thanks to Maureen Lahiff for statistical and conceptual assistance and to S. Leonard Syme for guidance through all phases of the project. Both authors extend sincere thanks to their colleagues in the Philip R. Lee Institute for Health Policy Studies writing seminar at University of California, San Francisco.
Human Participant Protection
The survey design, survey instrument, and consent documents were approved by the institutional review boards at the University of California, San Francisco, and the Public Health Institute. Participants were protected by a federal Certificate of Confidentiality from the US Department of Health and Human Services. This secondary analysis was conducted after appropriate review and exemption by the institutional review board at the University of California, Berkeley.