This project studied the convergent validity of current recall of tobacco-related health behaviors, compared with prospective self-report collected earlier at two sites. Cohorts were from the Oregon Research Institute at Eugene (N = 346, collected 19.5 years earlier) and the University of Pittsburgh, Pennsylvania (N = 294, collected 3.9 years earlier). Current recall was examined through computer-assisted interviews with the Lifetime Tobacco Use Questionnaire from 2005 through 2008. Convergent validity estimates demonstrated variability. Validity estimates of some tobacco use measures were significant for Oregon subjects (age at first cigarette, number of cigarettes/day, quit attempts yes/no and number of attempts, and abstinence symptoms at quitting; all P < 0.03). Validity estimates of Pittsburgh subjects’ self-reports of tobacco use and abstinence symptoms were significant (P < 0.001) for all tobacco use and abstinence symptoms and for responses to initial use of tobacco. These findings support the utility of collecting recalled self-report information for reconstructing salient lifetime health behaviors and underscore the need for careful interpretation.
Public health strategies to estimate tobacco use, its precursors, its correlates, and its sequelae are of considerable concern not only because of adverse effects of tobacco use, but also because tobacco use is intertwined with other health-related behaviors (1). Accurate tobacco use history is necessary for developing biomarkers of harm and exposure critical in clinical and regulatory research. Tobacco research often uses systematic measures (2) to facilitate understanding of the natural history of usage trends, to identify research and treatment needs, and to evaluate prevention and control programs.
Another example of heavy reliance on recall of smoking lies in recent genome-wide association studies of measures of nicotine dependence (3–6). While engaging in high-level quality control of genomic data, all of the studies relied on retrospective self-report of nicotine dependence or smoking quantity without reference to the reliability, validity, or recall interval of these phenotypes. Questions remain regarding whether the reported association between variation in the nicotinic acetylcholine receptor gene cluster on chromosome 15q25.1 and measures of nicotine dependence is independent of the locus’ association with diseases such as lung cancer. Variation in unreported measurement properties of methods of assessing these phenotypes may partly explain the difficulty in teasing apart the underlying causal model.
Tobacco use is one of many health-related behaviors for which epidemiologic research often relies on self-report and recall, with or without biologic verification. Surveillance surveys often include questions about ever use of common forms of tobacco (cigarettes, smokeless, cigars, pipe), age at first tobacco use or first cigarette (initiation), extent of current use, number of quit attempts, and related health behaviors (7).
Most research in tobacco use history relies on recall of experiences. Although the quality of self-reports may depend on strategies to reduce reporting error and optimize recall, retrospective research can overcome some of the limitations of prospective research, in part because perceptions of privacy and confidentiality can affect prospective reporting (8). Underreporting of tobacco use can be related to interaction with an interviewer (e.g., social desirability bias). Even the standard of biochemical testing, which remains the most common method of validating tobacco self-report, is useful only for validating or quantifying exposure for a specific duration. Thus, a repeated-measures prospective study with biologic testing might not capture episodic use of tobacco (8).
The purpose of this project was to examine the validity of questions in the authors’ Lifetime Tobacco Use Questionnaire (LTUQ). To accomplish this, the investigators examined convergent validity of separate measures of the same events, studied first in adolescence and later in young adulthood or adulthood. Specifically, the present analyses examined the convergent validity of a) tobacco use measures administered initially in two separate prospective, repeated-measures cohort studies, compared with b) similar measures administered to the same cohorts with the LTUQ between 2005 and 2008. Because of limitations inherent in self-report, neither the prospective data nor the recent interview data were considered a “gold standard.” Only a small portion of the prospective data reflected real-time measures; in most cases, the prospective data involved near-term retrospective recall of events occurring within a period ranging from a week to several years.
This study builds on earlier psychometric work (9, 10) that identified moderate to high test-retest reliability of self-administration of earlier and later versions of the study questionnaire in separate Web samples. The present analyses addressed the following research questions: a) What is the convergent validity of tobacco use responses when original prospective responses are compared with later LTUQ responses? and b) What question-related factors appear to moderate the validity of recall? A related goal was to explore characteristics of the instrument.
Subjects were recruited from cohorts previously studied in two prospective projects: 1) the Smoking in Families Study (346 of 483 original subjects, 71.6%) at the Oregon Research Institute in Eugene; and 2) a prenatal tobacco exposure group in Pittsburgh, Pennsylvania (301 of 426 original subjects, 70.7%). The use of these two cohorts provided long-term (Oregon) and near-term (Pittsburgh) estimates of convergent validity. Started in 1981, the Oregon study was a repeated-measures (10 time points at approximately one-year intervals) cohort study of substance use risk factors. The Pittsburgh study was part of the Maternal Health Practices and Child Development Study. The cohort was selected at the fourth month of gestation and was followed into young adulthood (11 time points) (11–13).
The institutional review boards of SRI International, Oregon Research Institute, and the University of Pittsburgh approved the study. Written informed consent was obtained from all participants, in addition to the informed consent obtained for their original participation in the Oregon and Pittsburgh research studies. Previous cohort members were contacted by written invitations. Oregon participants were contacted with follow-up telephone contact as needed.
Because the two cohorts differed substantially in demographics, as indicated in Table 1, the data sets were analyzed separately.
The original Oregon data were collected through in-person interviews, with questionnaires completed by child/adolescent participants and their families. The average interval between the first Oregon testing and the LTUQ was 19.5 years (standard deviation, 0.6 years). The original Pittsburgh data were collected through in-person interviews. Average time from original Pittsburgh adolescent testing to LTUQ administration was 3.9 years (standard deviation, 0.8).
Programming of the Web-based LTUQ was started in 2004, with major programming revisions and minor text revisions following alpha and beta testing. Programming and hosting were completed by research software company CfMC, San Francisco, California.
Interviewers were trained during on-site visits by the investigators in September and October 2004. One Oregon interviewer was trained at SRI. Data collection began in 2005 and continued through August 2008. Interviewers were monitored, interim data were analyzed, and results were discussed at annual on-site visits. Minor interviewing modifications were made at the 2006 site visits to increase equivalence of participant prompting. Although data were not analyzed across both studies, the investigators and staff endeavored to achieve equivalent interview practices.
The LTUQ was administered primarily by telephone by trained interviewers. Forty-nine Pittsburgh subjects were interviewed in person because of inadequate telephone access or privacy concerns. Seven Oregon subjects self-administered the LTUQ online because of miscommunication. Since the LTUQ was written for both interviewer and self-administration, and since the data from those 7 were not outliers, the data were retained.
Interviewers entered a unique pass code for each subject at the outset of administering the questionnaire, which was hosted on a secure Web site. Researchers received all data without personal identifiers. Responses were encoded and collected on secure central servers and decoded offline before the data were provided to the investigators.
Data used in the analysis of the Oregon prospective data were a subset of the 10 Oregon time points. Prospective Pittsburgh data involved the most participating subjects at about age 16 years, at time point 10. The time points and ages used in the present analyses are noted in Tables 1 through 4.
All prospective data were collected by in-person interviews. Prospective questionnaires included items listed in Tables 2 and 3. Specific wording for prospective and LTUQ questions is available online in Web Table 1, posted on the Journal’s Web site, http://aje.oupjournals.org.
The LTUQ retrospectively assessed the use of tobacco or nicotine across the lifespan. Developed initially in 1997–1998, the LTUQ was tested in 3 earlier versions on approximately 1,700 respondents through computer-assisted self-interviewing, computer-assisted telephone interviewing, computer-assisted personal interviewing, and usability testing. Previously published reports on two-year test-retest reliability (9) and two-month test-retest reliability (10) indicated that the LTUQ had high reliability for salient tobacco-related questions.
LTUQ programming utilized computerized features including skip logic, branching, and loops to shorten testing time and minimize attrition. Response options were randomized and rotated to reduce sequence effects and carryover/practice effects, with some response options anchored for consistency. The questionnaire included internal validity checks, accuracy checks, and response limitations that either prevented respondents from entering certain types of inaccurate data or flagged those responses for later examination. The LTUQ was available only in computerized administration mode.
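The randomization of response options with some options anchored can be illustrated with a minimal sketch. It is not the LTUQ's actual programming, which the source does not describe in detail; the function name `randomize_options`, the fixed seed, and the "None of these" option are hypothetical, and the remaining option labels are borrowed from the subjective responses named in this report.

```python
import random

def randomize_options(options, anchored, rng):
    """Shuffle options, leaving those in `anchored` at their original positions."""
    movable = [o for o in options if o not in anchored]
    rng.shuffle(movable)
    shuffled = iter(movable)
    # Anchored options keep their slot; every other slot takes the next shuffled option.
    return [o if o in anchored else next(shuffled) for o in options]

# Hypothetical response list; "None of these" is anchored last for consistency.
options = ["Nausea", "Dizzy", "Rush/buzz", "Difficulty inhaling", "None of these"]
shown = randomize_options(options, {"None of these"}, random.Random(0))
print(shown)
```

Anchoring a catch-all option in a fixed position keeps the list readable while the substantive options still vary in order across respondents.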
The LTUQ was structured around a core questionnaire that assessed the extent and nature of tobacco use from earliest exposure to the point of testing. Questions covered 4 major types of tobacco—cigarettes, cigars, smokeless, and pipe—and included an open-ended response option for other tobacco delivery methods. In addition to the core questions, module questions examined risk and protective factors related to tobacco use. The core tobacco-use questions assessed initial use, transition to weekly and daily use, quit attempts, and abstinence. Modules of additional questions addressed subjective reactions to initial use (9, 10).
Because of heterogeneity between the two prospective cohorts, data were not analyzed across studies. Within each study, the original data (from age ranges specified in Tables 2 through 4) were compared with LTUQ responses on similar questions with Spearman's rank correlation coefficient (rs), a nonparametric alternative to the Pearson's r correlation; with intraclass correlation coefficient (ICC), used for group comparisons of consistency; or with polychoric correlation, used in ordinal scales and in self-report scales with limited option ranges. Confidence intervals and probability values also were calculated.
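For readers unfamiliar with the rank-based statistic, the following minimal sketch shows how Spearman's rs can be computed as a Pearson correlation on ranks, with tied values given average ranks. It is an illustration only, not the software used in the study; the function names and the example values are hypothetical.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical cigarettes/day reported prospectively and recalled later.
prospective = [5, 10, 10, 20, 15, 3]
recalled = [6, 10, 12, 20, 10, 5]
print(round(spearman_rs(prospective, recalled), 2))
```

Because only the ordering of responses enters the calculation, the statistic tolerates skewed distributions and coarse response categories of the kind common in self-reported smoking quantities.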
Equality of the Spearman's correlation was tested by applying a variance-stabilizing arctangent transformation and calculating a two-sample t statistic for the equality of means of the transformed variables. The equality of ICCs was tested by calculating a two-sample t statistic using the estimated ICCs and their estimated standard error.
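A comparison of two independent correlations along these lines can be sketched as follows. The sketch makes several assumptions that go beyond the text: it uses Fisher's z transformation (the inverse hyperbolic tangent, the usual variance-stabilizing transformation for correlations), the standard error 1/sqrt(n − 3) that is conventional for Pearson's r, and a normal approximation in place of the t reference distribution. The function name and the example inputs are hypothetical.

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Test equality of two independent correlations after a Fisher z transform.

    Returns the test statistic and a two-sided P value from the normal
    approximation (an assumption of this sketch, not the study's procedure).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    stat = (z1 - z2) / se
    p_two_sided = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_two_sided

# Hypothetical subgroup correlations and sample sizes.
stat, p = compare_correlations(0.60, 103, 0.30, 103)
print(round(stat, 2), round(p, 4))
```

The transformation matters because the sampling variance of an untransformed correlation depends on its true value, which would make a direct difference of correlations hard to standardize.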
The two prospective data sets differed in question content, although both overlapped with the LTUQ. Pittsburgh data included some question sets that were not in the Oregon data, such as the response to initial use of tobacco. Similarly, specific analyses were conducted on the Oregon data, but not the Pittsburgh data, to study Oregon questions that did not have direct corollaries between the prospective testing and the LTUQ administration. Some measures from initial testing were imputed from repeated questions that indicated the year that specific tobacco events occurred (e.g., initiation of daily smoking). Some Oregon questions were not closely related enough to the LTUQ for validity analysis per se. Consequently, for some questions (Table 4), an LTUQ question was compared with available responses from the Oregon data. For example, the LTUQ question Have you ever used cigarettes? had no direct corollary in the sequence of Oregon questionnaires, but a positive or negative response could be implied from other Oregon responses indicating cigarette use. (See Web Table 1.) The consistency of responses was evaluated (Table 4) to calculate the percentage of responses consistent with LTUQ responses and to compare that percentage with the expected percentage that would be consistent if responding were random.
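The comparison of observed consistency with consistency expected under random responding can be illustrated with a small sketch. The source does not specify how the expected percentage was derived; this example assumes the common approach of computing chance agreement from each occasion's marginal response proportions. The function names and the yes/no data are hypothetical.

```python
from collections import Counter

def observed_agreement(first, second):
    """Proportion of paired responses that match."""
    return sum(a == b for a, b in zip(first, second)) / len(first)

def chance_agreement(first, second):
    """Agreement expected if the two sets of responses were unrelated,
    based on each occasion's marginal response proportions (an assumption)."""
    n = len(first)
    c1, c2 = Counter(first), Counter(second)
    return sum((c1[c] / n) * (c2[c] / n) for c in set(c1) | set(c2))

# Hypothetical "ever used cigarettes" responses at the two occasions.
implied_prospective = ["yes", "yes", "no", "no"]
ltuq = ["yes", "no", "no", "no"]
print(observed_agreement(implied_prospective, ltuq))
print(chance_agreement(implied_prospective, ltuq))
```

An observed percentage well above the chance value indicates that the two sets of responses agree more than their base rates alone would produce.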
The results showed variability in the validity estimates of retrospective reports and informed the utility of retrospective assessment for studying specific earlier events and behaviors. Results were interpreted in light of general guidelines (14, p. 133) establishing kappa and correlation values as follows: <0.40, poor; 0.40 to 0.59, fair; 0.60 to 0.74, good; ≥0.75, excellent. (All P values are 2-sided.)
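These interpretive bands can be written as a small lookup, shown here as a sketch. The function name is hypothetical, and the handling of values that fall between the stated bands (for example, 0.595) is an assumption: each band is treated as extending up to the lower bound of the next.

```python
def agreement_category(value):
    """Map a kappa or correlation value to the guideline category."""
    if value < 0.40:
        return "poor"
    if value < 0.60:
        return "fair"
    if value < 0.75:
        return "good"
    return "excellent"

# Applied to three estimates reported in this paper.
for label, estimate in [
    ("Oregon, age at first cigarette (ICC)", 0.33),
    ("Oregon, abstinence symptoms (rs)", 0.46),
    ("Pittsburgh, age at first cigarette (ICC)", 0.58),
]:
    print(label, agreement_category(estimate))
```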
No systematic differences related to education, race, and sex were found; the only comparison differing at the P < 0.01 level was a race difference in difficulty inhaling at first use of tobacco, with nonwhites showing less consistent responding across testing times. These results are included as online-only Web Tables 2–4. In the Pittsburgh education comparison, age reporting was from the LTUQ, since the young adults were pre–high school graduation age at their original assessment.
The convergent validity of initial and LTUQ measures of milestones in tobacco use history differed in minor ways by sex (Web Table 2), race (Web Table 3), or educational level (Web Table 4). Male participants in the Pittsburgh study (t = 2.11, P = 0.04) and white participants in both Pittsburgh and Oregon studies (t = 2.17, P = 0.03) demonstrated higher validity reporting the age at first weekly smoking. White participants (t = 2.30, P = 0.02) and those with post–high school education (t = 2.00, P = 0.05), all from Pittsburgh, reported the “dizzy” response with higher validity. White participants from Pittsburgh also reported “difficulty inhaling” with higher validity (t = 2.60, P = 0.01). Pittsburgh participants with a high school education or less demonstrated higher validity reporting age at first cigarette use (t = 2.07, P = 0.04). No comparisons by sex, race, or educational level indicated a P value less than 0.01.
Tobacco use showing consistency between the original data collection and the LTUQ administration included age at first cigarette (ICC = 0.33; 95% confidence interval (CI): 0.19, 0.47) and number of cigarettes/day during daily smoking (Spearman's rs = 0.31; 95% CI: 0.11, 0.49); history of quit attempts (rs = 0.38; 95% CI: 0.14, 0.62); and abstinence symptoms (rs = 0.46, 95% CI: 0.15, 0.77; Table 2).
Responses for less directly related measures were consistent regarding ever use of cigarettes and amount smoked (both, 94.1% consistency, P < 0.01). Responses to basic tobacco use questions, including ever use of cigarettes and smokeless tobacco, indicated that the percentage of consistent responses differed significantly from the percentage that would be consistent if responding were random. (See Table 4.)
Tobacco-use variables measured prospectively and with the LTUQ were age at first cigarette (ICC = 0.58; 95% CI: 0.48, 0.68), age at first weekly smoking (ICC = 0.49; 95% CI: 0.35, 0.63), cigarettes per week during weekly smoking (rs = 0.40; 95% CI: 0.25, 0.54), age at first daily smoking (ICC = 0.52; 95% CI: 0.38, 0.66), cigarettes per day (rs = 0.32; 95% CI: 0.14, 0.48), and time to first cigarette of the day during daily smoking (rs = 0.37; 95% CI: 0.20, 0.50). Similarly, convergent validity for most subjective responses to first use of tobacco was significantly associated, ranging from nausea (rs = 0.51; 95% CI: 0.34, 0.67) to rush/buzz (rs = 0.31; 95% CI: 0.13, 0.49). (See Table 3.)
This project examined the convergent validity of tobacco use measures administered initially in two separate repeated-measures cohort studies, then compared with similar measures administered to the same cohorts years later. Although the need for establishing the validity of basic health-related measures is obvious, the present analyses are among the few tobacco-related validity studies. The present studies examined convergent validity of prospective and retrospective self-report interview responses, indicating that recalled responses demonstrated reasonable convergent validity with prospective responses, particularly in the near-term estimates of convergent validity.
Strengths of the research included the importance of providing guidelines for interpreting research using these commonly used measures; the use of a prenatal tobacco exposure cohort at Pittsburgh and an adolescent-emerging adult cohort at Oregon; and the lengthy time intervals between the initial research and the administration of the LTUQ. Limitations, which may have influenced the strength of the associations, related to differences in question wording from the initial testing to the subsequent LTUQ testing (Web Table 1; http://aje.oupjournals.org); differences in the mode of completion (initial in-person interview and later telephone interview); lack of a definitive gold standard for establishing validity of the LTUQ; and possible bias related to nonresponse of initial participants, which could result in an overestimate of the level of agreement.
On some key elements of tobacco use, responses from participants in both the Oregon and Pittsburgh studies demonstrated adequate convergent validity across time and across instruments. Tobacco use responses from the Pittsburgh study, with a considerably shorter duration between original testing and LTUQ administration, all had better validity estimates. Notably, the Oregon participants’ prospective and retrospective reports of quit attempts were consistent, despite the metric defining quitting differing between the original data and the LTUQ.
It would be inappropriate to assume that proximity in time to an event invariably increases the accuracy of measurement. As one review (8) concluded, numerous factors can affect the validity of adolescents’ self-reports of health-risk behaviors. This may be because smoking “tends to be habitual, repetitious, and almost unconscious” (15, p. 8) and because the episodic nature of adolescent smoking defies description of smoking patterns (16). Also, direct interaction with an interviewer may lead to underreporting, even under optimal reporting and interview conditions (8). Additionally, studies using measures collected in real time, also referred to as ecologic momentary assessment techniques, may be limited in sample size and duration because of event-based and time-based designs (17).
It is difficult to estimate whether prospective data were affected by changes in levels of social and legal stigma across time. A possible underreporting of tobacco use in childhood and adolescence could have contributed to bias and error. Additionally, reports of present use of tobacco in the later testing could be affected by social desirability because the LTUQ was administered by interviewers (18).
Prior studies (9, 10) indicated that salience of questioned events appeared to affect reliability of test-retest recall. Although respondents in those two studies of separate Web-based self-administration cohorts were able to answer many questions consistently, events that were less well defined or salient resulted in lower reliability. For example, respondents reliably remembered details about their first use of tobacco but could not calculate reliably how many cigarettes they smoked between experimentation and monthly use.
Previous work has shown valid reporting of tobacco dependence across a period of 5 to 12 years (19). Self-report of tobacco use has shown consistency in review and meta-analysis (20), in smoking and smokeless tobacco use (21, 22), in lung cancer screening with self-report and urinary cotinine (23), and in saliva cotinine verification of audio computer-assisted self-report (24). Despite these positive indications, the limitations of retrospective collection of information are well known, and retrospective techniques often are affected by overall informant inaccuracy. In a frequently cited 1984 review, Bernard et al. summarized: “Informants are inaccurate; memory does decay exponentially with time… . And on top of this, there appears to be systematic distortion in how informants recall just about everything” (25, p. 509).
Scott and Alwin (26) approached the limitations of informant accuracy by examining types of retrospective information: a) recollections of past experiences that involve reconsidering the past and reporting present reactions, and b) reviewing or contemplating the past rather than simply recalling events. “In this sense,” Scott and Alwin noted, “retrospections are not longitudinal at all; they are ‘current’ or rooted in the present” (26, p. 104). Two limitations of retrospective data are 1) measures of past experiences may be unavailable because of lapse in memory, or because the information can no longer be retrieved or accessed; and 2) recollections may be biased by more current experiences and events. Previous analyses of LTUQ retrospective data (9, 10, 27) indicated that while reliability was high for many questions about tobacco use, the overall salience of the recalled events was critical. This concurred with the summation of Scott and Alwin that “even retrospective attitudinal data can be quite reliable if the attitudes concerned are highly salient” (26, p. 121). Therefore the success of studying tobacco-related life-history events appears to depend on whether events are salient when they occur. Additionally, the salience of historical personal events can override current interpretation bias.
Implications for the study of other health-related behaviors are evident because behaviors and conditions that influence health are related. An example is the intertwining of tobacco use with other conditions in pregnancy, childbirth, and early childhood. Shenassa et al. (28) examined the validity of adult 40-year recall of maternal smoking during pregnancy. They reported that higher socioeconomic status and recall of specific situations resulted in more accurate recall. A review of more than 60 studies of birth certificate information (29) found the certificate information to be neither reliable nor valid regarding tobacco and alcohol use, prenatal care, maternal risk, delivery complications, labor, and delivery. A review (30) of articles on maternal recall of breastfeeding practices showed validity for recall of initiation and duration of breastfeeding, but less satisfactory recall of practices with less distinct boundaries or impact. Similarly, a review (31) of adult retrospective reports of adverse childhood experiences indicated that reporting bias does not invalidate case-control retrospective studies of readily defined major adversity. Even so, details might not be reported accurately, particularly if judgment and interpretation are required. Stanton et al. (32), studying recanting of earlier reports of smoking status, found that minority status and reports of earlier smoking frequency could result in misclassification. These mixed findings underscore the need for psychometrically valid measurement of recall of health behaviors in these critical periods.
Responses are rarely better than the questions asked. As detailed in two published LTUQ reliability studies and the present analyses, the accuracy and psychometric quality of the responses depended on the clarity of the questions and salience of the information. At issue is whether an event was sufficiently notable that it could be remembered or calculated decades later.
Conversely, the validity (and reliability, in prior studies) of responses about the age at first cigarette use likely reflected the milestone nature of the event. Pittsburgh subjects’ consistent recall years later of their subjective response to tobacco initiation reflected the salience of an event that occurred several years earlier.
The emergence of withdrawal symptoms in abstinence appeared to be sufficiently salient for high validity in the Oregon study. It may be reasonable to expect accuracy when dependent users recall the experience of trying to quit. It may be less reasonable to expect that smokers will be able to recall accurately how many cigarettes they used per week at various time points across their lives. Milestones that are important to researchers are not necessarily notable to tobacco users at the time the events occur.
An important future direction for this line of health-related psychometric work would be the systematic study of the effect of duration between an original event and its later recall, to elucidate question features that can be anticipated to yield valid, reliable responses across the lifespan.
Author affiliations: Center for Health Sciences, SRI International, Menlo Park, California (Janet Brigham, Harold S. Javitz, Ruth E. Krasnow, Mary McElroy, Gary E. Swan); Washington University School of Medicine, St. Louis, Missouri (Christina N. Lessov-Schlaggar); Oregon Research Institute, Eugene, Oregon (Elizabeth Tildesley, Judy Andrews, Hyman Hops); and University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania (Marie D. Cornelius, Nancy L. Day).
This work was supported by the National Institutes of Health (NIH) (DA018019 to G. E. S., CA75581 (subcontract to G. E. S.), DA11795 to J. B., DA003706 to H. H., DA009275 to M. D. C., HD036890 to N. L. D.).
Psychometric work on the LTUQ was funded by NIH grant DA018019 to G. E. S. Initial development work on the LTUQ was under subcontract to University of Michigan, NIH grant CA75581 to Ovide F. Pomerleau. Development of the computerized precursor of the LTUQ was funded by NIH grant DA11795 to J. B. Prospective SMOFAM data were collected under NIH grant DA003706 to H. H. Prospective Pittsburgh data were collected under NIH grant DA009275 to M. D. C. and HD036890 to N. L. D.
The authors express thanks to programmer Nancy Chong; Dale McBride and Sandra Sterry, Oregon Research Institute; and Young Jhon and Margaret Watson, University of Pittsburgh.
The findings were presented in part at the 15th annual meeting of the Society for Research on Nicotine and Tobacco, Baltimore, Maryland, February 28, 2010.
Conflict of interest: none declared.