This study examined the reliability of self-reported smoking history measures. The key measures of interest were time since completely quitting smoking among former smokers; age at which fairly regular smoking was initiated among former and current smokers; the number of cigarettes smoked per day and the number of years of daily smoking among former smokers; and prior smoking status among respondents who later reported never smoking. Another goal was to examine sociodemographic factors and interview method as potential predictors of the odds of strict agreement in responses.
Data from the 2002–2003 Tobacco Use Supplement to the Current Population Survey were examined. Descriptive analysis was performed to detect discrepant data patterns, and intraclass and Pearson correlations and kappa coefficients were used to assess reporting consistency over the 12-month interval. Multiple logistic regression models with replicate weights were built and fitted to identify factors influencing the logit of agreement for each measure of interest.
All measures revealed at least moderate levels of overall agreement. However, upon closer examination, a few measures also showed some considerable differences in absolute value. The highest percentage of these differences was observed for former smokers’ reports of the number of years smoking every day.
Overall, the data suggest that self-reported smoking history characteristics are reliable. The logit of agreement over a 12-month period is shown to depend on a few sociodemographic characteristics as well as their interactions with each other and with interview method.
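Intraclass and Pearson correlations, used here to assess consistency of continuous reports over the 12-month interval, can diverge: Pearson's r ignores a systematic shift between waves, whereas an intraclass correlation penalizes it. The sketch below illustrates this with hypothetical ages at initiation. The paper does not state which ICC form was used; this sketch assumes the one-way random-effects ICC(1,1), and both functions are illustrative, unweighted, and not the study's actual code.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between paired 2002 and 2003 reports."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def icc_oneway(x, y):
    """One-way random-effects ICC(1,1) for two reports per respondent (assumed form).

    Unlike Pearson's r, it is lowered by a systematic shift between waves.
    """
    n, k = len(x), 2
    subject_means = [(a + b) / 2 for a, b in zip(x, y)]
    grand_mean = sum(subject_means) / n
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum((a - m) ** 2 + (b - m) ** 2
                    for a, b, m in zip(x, y, subject_means)) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical ages at initiation: every 2003 report is exactly 1 year higher.
ages_2002 = [14, 16, 18, 20]
ages_2003 = [15, 17, 19, 21]
print(round(pearson_r(ages_2002, ages_2003), 3))   # 1.0
print(round(icc_oneway(ages_2002, ages_2003), 3))  # 0.928
```

A perfectly linear but shifted pattern gives r = 1 while the ICC falls below 1, which is why the two statistics are complementary rather than interchangeable.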
When investigators assess current smoking prevalence and smoking history in the U.S. population, they commonly take for granted the high quality of national survey data. Indeed, data from the National Health Interview Survey (Centers for Disease Control and Prevention, 2009) and the Tobacco Use Supplement to the Current Population Survey (TUS-CPS; U.S. Department of Commerce, Census Bureau, 2007a) are used in numerous studies to determine tobacco use patterns in the U.S. adult population (Backinger et al., 2008; Osypuk & Acevedo-Garcia, 2010; Soulakova, Davis, Hartman, & Gibson, 2009; Tindle, Shiffman, Hartman, & Bost, 2009; U.S. Department of Commerce, Census Bureau, 2007b, 2007c). Nonetheless, the difficulty of correctly answering some items and the likely sensitive nature of smoking-related questions could lead to data ambiguity. Therefore, inaccurate results can arise even when researchers use cutting-edge statistical methodology to analyze survey data.
To improve the design, administration, and data quality of national surveys, one should take into account respondents’ cognitive and motivational processes when they answer smoking-related questions. Cognitive processing generally includes four stages: interpretation of the meaning of a question, memory search of all related information, integration of all related information, and report of the summary of this information (Tourangeau, Rips, & Rasinski, 2000, pp. 1–22). Difficulties can occur at any of these stages: respondents may interpret a term or a question incorrectly, may have difficulty retrieving relevant information from memory, or may produce a response that does not match the expected response category. Furthermore, motivational factors may influence responses. If respondents carry out these processes carefully and comprehensively, they optimize; otherwise, they may make limited effort and satisfice (Krosnick, 1991, 1999; Krosnick et al., 2002).
Inaccurate responses also can be a direct result of a social desirability bias, in which respondents provide answers that comply with social norms. As a result, respondents might underreport undesirable behaviors, for example, smoking, drinking, and illicit drug use (Johnson & Mott, 2001; Kreuter, Presser, & Tourangeau, 2009; Sillett, Wilson, Malcolm, & Ball, 1978; Tourangeau & Yan, 2007; Velicer, Prochaska, Rossi, & Snow, 1992).
Several research papers have addressed the impact of survey mode on respondents’ satisficing and social desirability bias (Kreuter et al., 2009; Holbrook, Green, & Krosnick, 2003) and on overall data quality (Jäckle, Roberts, & Lynn, 2006). Respondents have been shown to be less likely to satisfice during a personal interview than during a telephone interview (Holbrook et al., 2003).
Comparing self-reported current smoking habits with the results of biochemical assessments has been used to assess the validity of self-reports for several decades. Based on evidence collected in two clinical trials conducted in the 1970s, it was concluded that a relatively large proportion of people who had been advised to quit smoking provided deceptive answers about having quit (Sillett et al., 1978). A further study used data from the Hispanic Health and Nutrition Examination Survey to investigate underreporting of cigarette consumption among Mexican-American smokers (Pérez-Stable, Marín, Marín, Brody, & Benowitz, 1990). Underreporting was less common among moderate (10–19 cigarettes/day) and heavy (20 or more cigarettes/day) smokers than among light (fewer than 10 cigarettes/day) smokers. However, several other validation studies suggest that self-reports yield a valid estimate of smoking status in the population (Caraballo, Giovino, Pechacek, & Mowery, 2001; Patrick et al., 1994; Pierce, Aldrich, Hanratty, Dwyer, & Hill, 1987; Fortmann, Rogers, Vranizan, Haskell, Solomon, & Farquhar, 1984).
Test–retest reliability measures are commonly used to assess the degree of consistency or stability in response over repeated administrations (Carmines & Zeller, 1979; Groves et al., 2004). For a response to a question to be valid, it must by definition also be reliable. Hence, assessing reliability is a necessary step in measuring an item’s validity. Several studies have addressed the reliability of self-reports by following up respondents, asking about current tobacco use behaviors, and comparing their answers with their prior reports. One such study investigated the reliability of questions concerning current smoking status and smoking initiation age in young adults recruited to mandatory military service in Israel (Huerta, Chodick, Balicer, Davidovitch, & Grotto, 2005). It was concluded that females were more likely than males to provide reliable answers with respect to smoking initiation age. In addition, among people who claimed at recruitment that they currently smoked or had smoked in the past, about 8% claimed to be never-smokers at the time of discharge. A study using data from the National Longitudinal Survey of Youth found that test–retest reports were most consistent among adults (compared with adolescents) and for questions about recent (past 2 years) events (Johnson & Mott, 2001). Self-reported age of tobacco use initiation was also judged to be sufficiently reliable for epidemiological applications.
The aim of our research was to explore the reliability of reports concerning several specific self-reported smoking history attributes. The survey design consisted of twice administering the TUS-CPS to the same respondents. A subset of questions used the same wording, enabling us to assess test–retest reliability for those measures. In particular, we examined reliability with respect to (a) time since completely quitting smoking (as reported by former smokers), (b) the age at which fairly regular smoking was initiated (for current and former smokers), and (c) everyday smoking habits (reported by former smokers). In addition, the consistency of previously reported smoking habits with current smoking behaviors was assessed. Specifically, we selected a subsample of respondents who reported in the second wave that they were “never-smokers” and examined their smoking status as reported in the first wave.
We chose these particular measures because they fulfilled two criteria: (a) they satisfied the requirement of test–retest investigation that the true score cannot change over the period of observation, and (b) they are important measures for evaluating the success of tobacco control initiatives and/or important in epidemiologic and other studies of smoking dose and duration. To the extent possible, we also investigated the effects of sociodemographic characteristics, and data collection method, on exact agreement with respect to each measure.
The TUS-CPS is a periodic National Cancer Institute supplement to the Current Population Survey (CPS). The CPS is sponsored by the U.S. Bureau of Labor Statistics and fielded by the U.S. Census Bureau. The TUS-CPS has generally been administered every 3 years since 1992–1993, to more than 240,000 persons in each cycle. TUS-CPS data are commonly used to assess national and state prevalence and patterns of use of tobacco products in the U.S. adult population and to collect information relevant to tobacco control policy (U.S. Department of Commerce, Census Bureau, 2007a, 2007b, 2007c). We analyzed a sample of TUS-CPS respondents who self-responded in February 2002 and in February 2003, with a time difference between surveys of 1 year ± 2 weeks (Davis et al., 2007).
The total sample size was 15,770. Respondents were, on average, aged 48 years (SD = 17); 8.0% were aged 15–24, 36.7% were aged 25–44, 36.0% were aged 45–64, and 19.2% were 65 years or older. The sample was 58.9% female and 41.1% male; 6.6% were Hispanic, 8.2% non-Hispanic Black, 81.9% non-Hispanic White, 2.2% non-Hispanic Asian and Pacific Islander, and 0.9% non-Hispanic American Indian and Alaska Native. In addition, 20.7% of the respondents resided in the Northeast, 27.2% in the Midwest, 28.8% in the South, and 23.2% in the West; 71.6% resided in a “metropolitan” area, 28.0% in a “nonmetropolitan” area, and for 0.4% of respondents, metropolitan status was not identified.
TUS-CPS surveys were administered using a combination of in-person and telephone interviews: 56.0% of respondents responded via telephone both times, 23.7% had in-person interviews both times, 5.8% had a phone interview in 2002 followed by an in-person interview in 2003, and 14.5% had an in-person interview in 2002 and a phone interview in 2003.
The sample was analyzed with respect to several cigarette smoking history attributes, described below.
Measure 1: Time (in years) since completely quitting smoking, which was assessed for former smokers who did not relapse between the two surveys. This measure is based on the survey question “About how long has it been since you completely quit smoking cigarettes?”
Measure 2: Age at which fairly regular smoking was initiated, which was assessed for current and former smokers. The corresponding survey question is “How old were you when you first started smoking cigarettes fairly regularly?”
Measures 3.1–3.2: These measures concern attributes of everyday smoking reported by former smokers and were ascertained for respondents who indicated that they did not smoke between the surveys and who reported being former smokers at both the 2002 and 2003 assessments. Measure 3.1 concerned the number of cigarettes smoked per day when last smoked every day, based on the question “When you last smoked every day, on average how many cigarettes did you smoke each day?” Measure 3.2 involved the total number of years smoked every day as determined by the item: “Altogether, about how many years did you smoke every day? Do not include any time you stayed off cigarettes for 6 months or longer.”
Measure 4: Prior smoking status for “never”-smokers. This measure is based on the survey question “Have you EVER smoked at least 100 cigarettes in your entire life?” We determined the frequency of respondents who reported being never-smokers in 2003 (a response of “No”) and had also reported being never-smokers in 2002. In addition to the 100-cigarette item, the subsequent question, “Do you now smoke cigarettes every day, some days, or not at all?” determined current smoking (the “every day” and “some days” choices) and former smoking (the “not at all” choice).
For continuous measures (Measures 1–3), we first assessed summary statistics for the difference between 2002 and 2003 responses. This difference was defined as the 2003 value minus the 2002 value (for Measure 1, further reduced by 365 days, i.e., 1 year, to account for the interval between surveys). Because converting data recorded in days, months, and years into years could introduce rounding error for Measure 1 (the time since completely quitting smoking), an exact match was defined as an absolute difference in responses of no more than 6 months. For Measures 2 (age when first smoked fairly regularly) and 3 (the number of cigarettes smoked per day and the total number of years smoked every day), a match was assigned when the 2002 and 2003 answers were identical. For Measure 4 (prior smoking status among those reporting never smoking at a later time), a match was assigned to those reporting never smoking in both 2002 and 2003, and a nonmatch to those reporting never smoking in 2003 who had reported any other smoking status in 2002.
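The matching rules above can be written as small predicates. This is a minimal sketch for illustration only: the helper `to_years`, the 365-day year it assumes, and the status labels `"never"`, `"former"`, etc. are hypothetical names, not TUS-CPS variable codes.

```python
DAYS_PER_YEAR = 365  # assumption: the conversion used for the 1-year correction

def to_years(years=0, months=0, days=0):
    """Hypothetical helper: convert a mixed years/months/days report to years."""
    return years + months / 12 + days / DAYS_PER_YEAR

def measure1_match(quit_years_2002, quit_years_2003, tolerance=0.5):
    """Time since quitting: remove the 1-year survey interval, allow +/- 6 months."""
    diff = quit_years_2003 - quit_years_2002 - 1.0
    return abs(diff) <= tolerance

def exact_match(value_2002, value_2003):
    """Measures 2, 3.1, and 3.2: identical answers in both waves."""
    return value_2002 == value_2003

def measure4_match(status_2002, status_2003):
    """Prior status among 2003 never-smokers: match only if 'never' in 2002 too."""
    if status_2003 != "never":
        raise ValueError("Measure 4 is defined only for 2003 never-smokers")
    return status_2002 == "never"

print(measure1_match(to_years(years=2), to_years(years=3, months=6)))  # True
print(exact_match(17, 18))                                             # False
print(measure4_match("former", "never"))                               # False
```

Keeping the 6-month tolerance as a parameter makes it easy to see how sensitive the Measure 1 agreement rate is to the rounding allowance.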
The odds ratios (ORs) of strict agreement between 2002 and 2003 responses were examined through multiple logistic regressions, which are commonly used in addition to descriptive reliability measures (Belli, Traugott, Young, & McGonagle, 1999; Cowling, Johnson, & Holbrook, 2003; Gillum, 2005; Huerta et al., 2005; Johnson & Mott, 2001). Factors of interest included age, sex, race/ethnicity, region, and metropolitan status, recorded in 2002. In addition, 2002 and 2003 interview methods were considered.
The significance of all two-way interactions, and of three-way interactions involving the 2002 and 2003 survey methods and other variables, was explored while controlling for all main effects mentioned above. Analysis was done using SAS and SUDAAN (Research Triangle Institute, 2008). All models incorporated adjustment for the complex sample design using replicate weights (Current Population Survey, 2006, chap. 14; Davis et al., 2007). To adjust for multiplicity, we used the Bonferroni method to control the overall error rate at 5%.
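The sketch below shows how a replicate-weight standard error and a Bonferroni adjustment combine into an adjusted confidence interval for an odds ratio. It is not the SUDAAN procedure used in the study. The `factor=4` default (with R = 160 replicates) reflects successive difference replication as we understand it for the CPS and should be treated as an assumption; the interval uses a normal approximation, whereas the study's software may use design-based degrees of freedom.

```python
import math
from statistics import NormalDist

def replicate_se(estimate, replicate_estimates, factor=4.0):
    """Replicate-weight SE: sqrt(factor / R * sum((theta_r - theta)^2)).

    factor=4 is an assumption (successive difference replication).
    """
    r = len(replicate_estimates)
    ss = sum((t - estimate) ** 2 for t in replicate_estimates)
    return math.sqrt(factor * ss / r)

def bonferroni_or_ci(log_odds_ratio, se, n_comparisons, alpha=0.05):
    """OR with a Bonferroni-adjusted normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(1 - alpha / (2 * n_comparisons))
    return (math.exp(log_odds_ratio),
            math.exp(log_odds_ratio - z * se),
            math.exp(log_odds_ratio + z * se))

def bonferroni_p(p_value, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_comparisons)

# Hypothetical coefficient (log OR) and replicate estimates, for illustration only.
beta = -0.4
reps = [beta + 0.05, beta - 0.05] * 80          # 160 replicate estimates
se = replicate_se(beta, reps)
print(bonferroni_or_ci(beta, se, n_comparisons=6))
print(bonferroni_p(0.012, 6))
```

Dividing alpha by the number of comparisons widens each interval, which is the price paid for holding the family-wise error rate at 5%.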
Tables 1 and 2 present the summary statistics and reliability measures for 2002 and 2003 responses and their differences, respectively, for Measures 1–3. Overall, these results suggest reasonably high reliability of Measures 1–3. As illustrated in Supplementary Figures 1–4, the distributions of the differences for these measures appear to be symmetric about zero. In addition, McNemar’s test (p < .001) indicates a significant association between the binary indicators of reporting being a never-smoker in 2002 and in 2003 (Measure 4): respondents who report being never-smokers in 2003 are likely to report that they never smoked in 2002. The kappa coefficient of 0.80 (p < .001) also suggests a relatively high level of agreement.
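For a binary never/ever indicator, both statistics come from a 2×2 table of 2002 by 2003 reports. The sketch below uses hypothetical counts (not the study's table) and shows what each statistic uses: Cohen's kappa compares observed agreement with chance agreement, while McNemar's statistic uses only the two discordant cells. Whether the study applied a continuity correction to McNemar's test is not stated; this sketch omits it.

```python
import math

def kappa_and_mcnemar(a, b, c, d):
    """2x2 table, rows = 2002 report, columns = 2003 report:
        a = never/never, b = never/ever, c = ever/never, d = ever/ever.
    Returns Cohen's kappa, McNemar's chi-square (no continuity correction), p-value.
    """
    n = a + b + c + d
    p_observed = (a + d) / n
    p_never_2002, p_never_2003 = (a + b) / n, (a + c) / n
    p_expected = (p_never_2002 * p_never_2003
                  + (1 - p_never_2002) * (1 - p_never_2003))
    kappa = (p_observed - p_expected) / (1 - p_expected)
    chi2 = (b - c) ** 2 / (b + c)              # discordant cells only
    p_value = math.erfc(math.sqrt(chi2 / 2))   # chi-square survival function, 1 df
    return kappa, chi2, p_value

kappa, chi2, p = kappa_and_mcnemar(40, 10, 5, 45)   # hypothetical counts
print(round(kappa, 2), round(chi2, 2))              # 0.7 1.67
```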
Despite the consistent findings that, on average, responses in 2002 and 2003 agree, we detected some differences in absolute values with respect to each measure. For time since completely quitting smoking (Measure 1), 58.9% of respondents reported times (adjusted for the 1-year difference) that are within a year, whereas 17.7% of responses were more than 5 years apart (Supplementary Figure 1). The 5th percentile of the difference distribution is −8 years and the 95th percentile is 9 years.
For age when first started smoking fairly regularly (Measure 2), we found that while 36.6% of respondents provided exactly the same age in 2002 and 2003, 64.9% of responses were no more than a year apart, and 87% were within 2 years (Supplementary Figure 2). Meanwhile only 5.1% of the responses were more than 5 years apart. The 5th percentile of the difference distribution is −4 years and the 95th percentile is 4 years.
Some inconsistencies were observed for the number of cigarettes smoked per day when a former smoker last smoked every day (Measure 3.1). Although 46.8% of respondents reported exactly the same number of cigarettes smoked per day in 2002 and 2003, only 56.0% of respondents reported the numbers with a difference of no more than four cigarettes (Supplementary Figure 3). The 5th percentile of the difference distribution is −15 cigarettes and the 95th percentile is 20 cigarettes.
More pronounced discrepancies were observed with respect to the number of years smoked every day among former smokers (Measure 3.2): only 23.3% of respondents reported exactly the same number of years smoked at both reports, 38.7% of responses were no more than a year apart, and 25.1% of responses were more than 5 years apart (Supplementary Figure 4). The 5th percentile of the difference distribution is −13 years and the 95th percentile is 11 years.
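The percentages and percentiles reported for Measures 1–3.2 summarize the distribution of 2003-minus-2002 differences. The sketch below computes the same kinds of summaries on hypothetical data. It is unweighted and uses the nearest-rank percentile definition; the study's estimates may use survey weights and a different percentile definition, so it will not reproduce the published figures.

```python
import math

def nearest_rank_percentile(values, q):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def summarize_differences(v2002, v2003, offset=0.0):
    """Percent of respondents by absolute 2003-minus-2002 difference.

    Use offset=1.0 for Measure 1 to remove the 1-year survey interval.
    """
    diffs = [b - a - offset for a, b in zip(v2002, v2003)]
    n = len(diffs)

    def pct(condition):
        return 100.0 * sum(1 for d in diffs if condition(abs(d))) / n

    return {
        "exact": pct(lambda d: d == 0),
        "within_1": pct(lambda d: d <= 1),
        "within_2": pct(lambda d: d <= 2),
        "over_5": pct(lambda d: d > 5),
        "p5": nearest_rank_percentile(diffs, 5),
        "p95": nearest_rank_percentile(diffs, 95),
    }

# Hypothetical ages at first fairly regular smoking in the two waves.
summary = summarize_differences([15, 16, 17, 18], [15, 17, 17, 25])
print(summary)
```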
Finally, some discrepancies were associated with Measure 4. Interestingly, out of 9,502 subjects who reported in 2003 that they never smoked, 1,023 (10.8%) respondents reported in 2002 being current or former smokers: 730 stated that they were former smokers, 193 identified themselves as everyday smokers, and 100 stated they were smoking some days.
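The counts above can be checked directly. Note that the denominator is the 9,502 respondents who reported never smoking in 2003, not the respondents who reported smoking in 2002.

```python
# 2002 status of respondents who reported "never" in 2003 (counts from the text).
never_in_2003 = 9502
ever_in_2002 = {"former": 730, "every day": 193, "some days": 100}

discordant = sum(ever_in_2002.values())
concordant = never_in_2003 - discordant
print(discordant, f"{100 * discordant / never_in_2003:.1f}%")   # 1023 10.8%
print(concordant, f"{100 * concordant / never_in_2003:.1f}%")   # 8479 89.2%
```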
All final models in Tables 3 and 4 are significant overall (p < .001). Table 3 presents the results for all levels of main effects not included in interactions. Table 4 presents the estimated conditional ORs, which were obtained using “effects” statements (SUDAAN; Research Triangle Institute, 2008, pp. 276–289), and corresponding 95% CIs adjusted via the Bonferroni method; only results for selected levels of interaction terms are presented and discussed below.
Measure 1: Time since completely quitting smoking. The final model included two-way interactions of age by region (p = .004), 2002 interview method by race/ethnicity (p = .004), 2002 interview method by region (p = .006), and 2003 interview method by region (p < .001). For those main effects that were not involved in any interaction terms, only sex was significant (p = .043): Male respondents were overall less likely to report the same time since completely quitting smoking at both reports than were female respondents.
Among the statistically significant conditional interaction effects we examined, we observed that Midwestern residents who had a phone interview in 2003 were less likely to provide consistent responses than were the ones who had an in-person interview in 2003 (adjusted p = .048), while Southern residents who had a phone interview in 2003 were more likely to provide consistent responses than were the ones who had an in-person interview in 2003 (adjusted p = .043). The interaction effects between age and region involve a large number of comparisons. Although some of the comparisons were significant, we do not report the detailed results here unless we noted a consistent pattern. We found: (a) younger age groups to be more reliable than each subsequent older age group in the South for reporting time since completely quitting smoking; (b) 15- to 24-year-olds to be more reliable than all other individual age groups in the Midwest for reporting this measure.
Measure 2: Age at which fairly regular smoking was initiated. The final model contained only the main effects, where only age was significant (p = .016), but none of the individual comparisons revealed any significant differences.
Measure 3.1: The number of cigarettes smoked per day when the former smoker smoked every day. Again, the final model contained only main effects. Only metropolitan status was significant (p = .002). However, the only significant individual comparison was between respondents with “not identified” metropolitan status and those with “metropolitan” status (adjusted p = .001).
Measure 3.2: The total number of years smoked when former smoker smoked every day. The final model included the interaction of sex by age (p = .013). Among the main effects not included in interaction terms, only metropolitan status was significant (p < .001). Among the conditional interaction effects we examined, we observed that males aged 25–44 were less likely to report the same number of years smoked than were females aged 25–44 (adjusted p = .020).
Measure 4: Prior smoking status for “never”-smokers. The final model included two-way interactions of sex by age (p = .017) and sex by race/ethnicity (p = .013). Among the main effects not included in interactions, only the 2003 interview method was significant (p = .036). Among respondents who claimed in 2003 that they never smoked, those interviewed by phone in 2003 were more likely to report they never smoked in 2002 than those interviewed in-person in 2003.
As shown in Table 4, among respondents aged 45–64 or 65 and older who claimed in 2003 that they never smoked, males were less likely than females in the same age group to report this in 2002 (both adjusted p < .001). Similarly, among White and Hispanic respondents who reported never smoking in 2003, males were less likely than females in the same race/ethnicity group to report having never smoked 1 year earlier (both adjusted p < .001).
In this paper, we present results of analyses targeted at assessing the reliability of survey questions with respect to cigarette smoking history. Descriptive analysis generally revealed moderate to excellent agreement for all measures of interest. Thus, we believe that TUS-CPS data can be used to obtain essential information regarding smoking history attributes for the U.S. population.
Despite the overall agreement patterns, we detected some degree of discrepant responses with respect to each measure. In particular, the highest percentage of discrepant results was observed for former smokers’ reports of the number of years smoking every day. In addition, about 11% of the respondents who claimed in 2003 that they had never smoked had reported in 2002 being current or former smokers.
There are several potential factors that could have contributed to the observed patterns of inconsistency. First, as noted in the introduction, respondents could experience difficulties in processing the questions and, therefore, choose to satisfice. For example, when answering the question about the total number of years smoked every day, respondents are instructed to exclude any time they stayed off cigarettes for 6 months or longer. The degree to which a respondent attends to this instruction, or works to exclude such periods of nonsmoking, may vary between the two administrations, leading to inconsistent responses. Second, reports of the age at which fairly regular smoking began could be subject to forward telescoping bias, i.e., respondents report events as having occurred more recently than they actually did (Johnson & Schultz, 2005). Because this effect is likely to increase with time since the event, we would expect differential error in reporting this event in 2002 relative to 2003. Third, several specific characteristics of the survey method or change of method (e.g., interviewer effects, time-of-day effects, and competing distractions) were not accounted for in our study. Finally, the obtained degree of agreement depends on the constructed response variable. For example, we can expect greater variability in Measure 3.1 responses (the number of cigarettes smoked per day) from former heavy smokers, because a 10-cigarette discrepancy is a smaller proportion of 2–3 packs a day than of 1 pack a day.
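The last point is simple arithmetic. Assuming a standard U.S. pack of 20 cigarettes, a fixed 10-cigarette discrepancy shrinks as a share of daily consumption as the number of packs grows; the function name below is illustrative only.

```python
CIGARETTES_PER_PACK = 20  # standard U.S. pack size

def discrepancy_share(discrepancy, packs_per_day):
    """A reporting discrepancy as a share of the respondent's daily amount."""
    return discrepancy / (packs_per_day * CIGARETTES_PER_PACK)

for packs in (1, 2, 3):
    print(packs, round(discrepancy_share(10, packs), 3))
# 1 0.5
# 2 0.25
# 3 0.167
```

A strict exact-match rule treats all three cases identically, even though the same absolute error is relatively minor for the heaviest smokers.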
As noted above, in the evaluated setting, response error could be a direct result of imprecise survey questions. Indeed, as illustrated in the Smoking History Measures section, the survey questions refer to somewhat general events. In particular, key phrases such as “about how long,” “about how many,” “fairly regularly,” and “on average” could be interpreted by respondents as signals to provide low-effort, imprecise (rather than taxing, exact) answers. Thus, the wording of the questions themselves could contribute to inconsistent answers. A similar conclusion has been reached in the literature with respect to offering “do not know” as an answer category: its presence can encourage respondents’ satisficing (Krosnick, 1991). However, cognitive testing of these questions supported the use of these general terms, because without accurate records, recall of smoking history information yields approximations rather than exact answers.
An additional goal of our study was to examine the odds of exact agreement for each measure as a function of a set of characteristics. This analysis revealed that sex, age, race/ethnicity, region, and metropolitan status, as well as interview administration mode, may jointly influence the ORs. Among the significant individual comparisons by sex, a common pattern emerged within a given age group and race/ethnicity group: males were less likely than females to provide consistent responses regarding the total number of years smoked every day and regarding never smoking. Overall, interview method (telephone vs. in-person) did not produce consistent significant effects on agreement, unlike the effect that has been observed for reported smoking prevalence at a single point in time (Soulakova et al., 2009).
Throughout the paper, we assume that a respondent’s answer may be valid only if it is also consistent across repeated survey administrations, such as here, in 2002 and 2003. In other words, reliability is necessary, although not sufficient for demonstrating validity. For example, we may find that, for a particular respondent the age of initiating fairly regular smoking reported in 2002 is exactly the same as the one reported in 2003. Such a correspondence of reports, however, does not necessarily imply that this reported age is, in fact, accurate as both reports may be incorrect.
Our results may have direct implications about choosing among alternative measures in analyses of tobacco use, for example, for estimating the total duration of smoking fairly regularly among former smokers. Given that the reliability of the age of initiation of fairly regular smoking (reported by former and current smokers), and that of the time since completely quitting smoking, were higher than the reliability for the number of years a former smoker smoked every day, our findings suggest that it may be preferable to construct the duration of total smoking by using the former two measures, rather than depending solely on former smokers’ direct reports of the number of years smoked every day—even allowing that there may be gaps of nonsmoking within this computed period for some people.
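The computed alternative can be sketched as follows. Besides the two reported measures, it needs the respondent's age at interview, which we assume is available from the survey record; the function name is hypothetical. As the text notes, any gaps of nonsmoking inside the computed span are not excluded.

```python
def smoking_duration(age_at_interview, years_since_quit, age_at_initiation):
    """Approximate years from starting fairly regular smoking to quitting.

    Returns None when the reports are internally inconsistent (negative span).
    Periods of nonsmoking within the span are NOT excluded.
    """
    age_at_quitting = age_at_interview - years_since_quit
    duration = age_at_quitting - age_at_initiation
    return duration if duration >= 0 else None

print(smoking_duration(60, 10, 18))   # 32
print(smoking_duration(30, 20, 16))   # None: would imply quitting before starting
```

Returning `None` rather than a negative number flags respondents whose reports cannot all be true, which may itself be a useful consistency check.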
Future research on the reliability of survey reports of tobacco use could further examine the odds of strict agreement as a function of demographic or other respondent characteristics. For example, we observed that males were less likely than females to provide consistent answers when reporting the time since completely quitting smoking. It is therefore important to identify the underlying reasons for this sex difference; for example, it could be that males are more likely than females to satisfice when answering smoking-related questions.
It is also important to investigate other criteria for defining agreement, such as relaxing the requirement of strict agreement (e.g., allowing a difference of up to 2 years between test and retest responses regarding the number of years former smokers smoked every day). One problem with this type of relaxation, however, is that such a definition may not be reasonable across all population groups. For example, a 2-year difference in responses might be less meaningful for older populations than for young adults. Thus, suitable definitions should be constructed for each subgroup if one wishes to examine the odds of such agreement.
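One way to express a relaxed, subgroup-sensitive rule is to let the tolerance grow with the size of the reported value. The sketch below is hypothetical: `abs_tol=2` mirrors the 2-year example in the text, while `rel_tol` is an illustrative knob (not a rule used in this study) that widens the tolerance for long reported durations, which are typical of older respondents.

```python
def relaxed_match(value_2002, value_2003, abs_tol=2.0, rel_tol=0.0):
    """Hypothetical relaxed agreement: |difference| <= max(abs_tol, rel_tol * mean).

    abs_tol mirrors the 2-year example; rel_tol scales the tolerance with
    the reported duration and is an illustrative assumption only.
    """
    mean_report = (value_2002 + value_2003) / 2
    tolerance = max(abs_tol, rel_tol * mean_report)
    return abs(value_2003 - value_2002) <= tolerance

print(relaxed_match(10, 12))                # True: within 2 years
print(relaxed_match(10, 13))                # False
print(relaxed_match(40, 44, rel_tol=0.1))   # True: tolerance is 4.2 years
```

A relative tolerance is only one option; age-group-specific absolute thresholds would be an equally plausible design and would need to be justified subgroup by subgroup.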
In addition, it would be of interest to investigate factors that may contribute to the odds of producing highly contradictory responses at two time points. A set of sociodemographic, smoking behavior, or survey administration characteristics could be associated with response discordance. For example, the more cigarettes a respondent smoked per day, the more discrepant the reports of the number of cigarettes smoked per day might be. For reports of never smoking, the length of time since a former smoker completely quit smoking could affect the reliability of reporting former versus never smoking.
This work was supported by the NCI at the National Institutes of Health (sponsor award number HHSN261200900395P) awarded to JNS.
The authors would like to thank the Deputy Editor and both reviewers for their valuable feedback that helped improve the manuscript, senior program analyst James Gibson for providing the data set, and science writer Anne Brown Rodgers for her editorial suggestions.