Retrospective recall of smoking during pregnancy is assumed to be substantially biased, but this has rarely been tested empirically.
We examined the validity of an interview-based retrospective recall more than a decade after pregnancy, in a cohort with repeated, multimethod characterization of pregnancy smoking (N=245). Retrospective smoking patterns were examined in relation to prospective reported and biological estimates of overall and trimester-specific smoking status and intensity. We also compared characteristics of women whose smoking status was misclassified by either prospective or retrospective measures with women whose status was congruent for nonsmoking across timepoints.
In general, sensitivity and specificity of recalled smoking were excellent relative to both prospective self-reported and cotinine-validated smoking status and trimester-specific intensity. However, measures were less congruent for amount smoked for women who recalled being heavy smokers. Further, retrospective measures captured some smokers not identified prospectively due to smoking that occurred prior to assessments. Women who would have been misclassified as nonsmokers based on either prospective or retrospective assessment differed significantly from congruently classified nonsmokers in a number of maternal, family, and neighborhood, but not child behavior, characteristics.
When epidemiological studies of the impact of smoking in pregnancy use retrospective methods, misclassification may not be a significant problem if prenatal smoking is assessed in terms of the pattern across pregnancy. This type of interview-based recall of pregnancy smoking may be relatively accurate, although optimal measurement should combine retrospective and prospective self-report and biological assays, because each provides unique information and has its own sources of error.
Epidemiological studies of the impact of maternal smoking in pregnancy on outcomes in offspring frequently rely on maternal retrospective self-report of smoking, sometimes soon after the pregnancy, sometimes many years later. It is often assumed that such retrospective reports are rife with recall error and bias. Yet, only a few studies have empirically examined this question in relation to prenatal smoking (Githens, Glass, Sloan, & Entman, 1993; Heath et al., 2003; Tomeo et al., 1999; Yawn, Suman, & Jacobsen, 1998). All report high levels of agreement between retrospective and prospective measures. However, limitations of measurement in these studies are problematic because none examined recall of smoking in a way that captures patterns of smoking across pregnancy. Among the few studies that have examined smoking, all used dichotomous measures of smoked versus not. Further, we are not aware of studies that have examined this question in relation to timing or intensity of smoking, the correspondence of recall to biological assays of maternal smoking, or factors that predict poor recall.
Measurement error that contributes to misclassification of smoking status may hamper detection of subtle or long-term effects of maternal smoking during pregnancy. For example, if a woman stopped smoking as soon as she learned she was pregnant, she may retrospectively report being a nonsmoker during pregnancy, truthfully from her own long-term perspective. She would be classified in most studies as a nonsmoker or quitter; yet her fetus would be at risk for any adverse outcomes associated with first-trimester exposure.
Smoking in pregnancy is a complex behavior that fluctuates over the course of pregnancy (Pickett, Rathouz, Kasza, Wakschlag, & Wright, 2005; Pickett, Wakschlag, Dai, & Leventhal, 2003). Although much emphasis has been placed on nondisclosure as a source of error, the extent to which mothers truthfully report whether or not they smoke is only one influence on the accuracy of measurement—others include the timing of measurement, comprehensiveness of assessment, and individual differences in smoking topography and metabolism. Retrospective interviews that query smoking across the course of pregnancy using methods designed to capture the manner in which mothers structure their behavior can substantially improve the reliability of reporting (Brigham et al., 2008). Similarly, biological assays are useful for obtaining a prospective direct measure of smoking but reflect approximately only the previous 24 hr.
In addition to the complex nature of smoking behavior and its impact on the validity of recall, it is likely that errors in maternal recall do not occur randomly, but little is known of the maternal and family characteristics that influence them. Studies have shown that recall of past events other than smoking may be influenced by maternal characteristics such as mood, personality, and mental health problems (Jaffee & Price, 2007). For example, maternal antisocial tendencies may be associated with nondisclosure or underreporting because lying is a characteristic of antisocial behavior, or alternatively with more accurate reporting because of a lack of concern with conforming to social mores. Maternal depression has also been associated with systematic reporting errors. Maternal socioeconomic status may also influence validity of reporting due to varying social norms around smoking in different socioeconomic and social class contexts (Pickett, Wakschlag, Rathouz, Leventhal, & Abrams, 2002). Having a child with a mental health problem may also introduce bias in recall (Jaffee & Price, 2007).
The direction and magnitude of bias caused by the misclassification of smoking during pregnancy will depend on the magnitude of the true effect of exposure, as well as the degree of misclassification. Under most plausible scenarios, however, misclassification is likely to lead to an underestimate of the true risk. Previously reported simulations of the effects of misclassifying smoking during pregnancy in epidemiological studies found that the underestimation of the relative risk for smoking on hypothesized adverse outcomes ranged from <10% to 55% under varying misclassification scenarios (Pickett et al., 2003).
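The attenuation described above can be illustrated with a small worked example. This is a minimal sketch, assuming nondifferential misclassification of a binary exposure; it is not the simulation reported by Pickett et al. (2003). The function name and all input values (true relative risk, baseline risk, smoking prevalence, sensitivity, specificity) are hypothetical.

```python
def observed_relative_risk(true_rr, baseline_risk, prevalence,
                           sensitivity, specificity):
    """Relative risk observed when smoking status is misclassified.

    Assumes misclassification is unrelated to the outcome (nondifferential).
    All arguments are illustrative inputs, not study data.
    """
    risk_exposed = baseline_risk * true_rr

    # Share of the cohort classified as smokers:
    # true smokers who are detected, plus nonsmokers wrongly flagged.
    detected_smokers = prevalence * sensitivity
    flagged_nonsmokers = (1 - prevalence) * (1 - specificity)

    # Share of the cohort classified as nonsmokers:
    # smokers who are missed, plus true nonsmokers.
    missed_smokers = prevalence * (1 - sensitivity)
    true_nonsmokers = (1 - prevalence) * specificity

    risk_classified_smokers = (
        detected_smokers * risk_exposed + flagged_nonsmokers * baseline_risk
    ) / (detected_smokers + flagged_nonsmokers)
    risk_classified_nonsmokers = (
        missed_smokers * risk_exposed + true_nonsmokers * baseline_risk
    ) / (missed_smokers + true_nonsmokers)

    return risk_classified_smokers / risk_classified_nonsmokers


# Hypothetical scenario: true RR of 2.0, 20% of smokers missed, no false positives.
rr = observed_relative_risk(true_rr=2.0, baseline_risk=0.10, prevalence=0.40,
                            sensitivity=0.80, specificity=1.0)
underestimate_pct = 100 * (2.0 - rr) / 2.0
print(f"observed RR = {rr:.2f}, underestimated by {underestimate_pct:.1f}%")
```

In this hypothetical scenario the missed smokers raise the risk in the group classified as nonsmokers, which pulls the observed relative risk toward 1.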
In this study, we address gaps in research on the validity and reliability of maternal recall of smoking within a unique cohort with repeated prospective measurements of smoking that include both self-report and biological assays, along with retrospective recall of pregnancy smoking patterns and rich data on maternal and family characteristics and social context. We link these to address the following research questions:
For these analyses, we used data collected during a follow-up study of a prospective cohort of pregnant women and infants who were enrolled in 1986–1992. The study was originally established as the Maternal–Infant Smoking Study of East Boston to compare the relative effects of in utero and early-life exposure to cigarette smoke on infant lung function (Hanrahan et al., 1992; Tager, Ngo, & Hanrahan, 1995). The follow-up was conducted as the East Boston Family Study (EBFS) as the offspring entered adolescence, with a primary aim of investigating the long-term impact of smoking in pregnancy on offspring behavior (Wakschlag et al., 2009). Follow-up data presented here are drawn from Wave 1 of EBFS. In this paper, pregnancies are the unit of analysis, and eligibility is restricted to the 245 pregnancies where the biological mother was interviewed and smoking information was available in both the original study and the follow-up study. (Detailed descriptions of the sampling and eligibility criteria are available in Supplementary Materials.)
Women were recruited at the first prenatal visit; the timing of this varied from 10 to 27 weeks gestation, with 21% recruited in the first trimester. They were asked at baseline about their past and current smoking status. At each subsequent prenatal visit (modal number of visits=7, range 1–12), women reported current smoking habits, including the number of cigarettes being smoked per day. A urine sample was collected and analyzed for measurement of cotinine by radioimmunoassay at each of these visits. Urine cotinine values were corrected for urine concentration and expressed as nanogram per milliliter of urinary creatinine. We have reported previously on the complexity of concordance between prospective self-report and cotinine measures in this sample (Pickett et al., 2005). In brief, the sensitivity and specificity of prospective self-report compared with cotinine-indicated smoking were 88.4% and 99.0%, respectively.
In the follow-up study, women were asked to recall their smoking behavior during pregnancy. The mean duration of recall was 14.5 years (SD = 1.7 years); study offspring were aged from 11 to 18 years. Women were asked how many cigarettes per day they smoked before they knew they were pregnant, in the first trimester after they knew they were pregnant, and in the second and third trimesters. They were also asked about current smoking habits.
Measures of maternal sociodemographic characteristics at follow-up included age, employment status, receipt of public benefit, and home ownership. Mothers’ health was measured as self-rated general health, self-rated emotional health, history of mental health service use, and history of psychiatric hospitalizations. Maternal problem behaviors included: substance misuse, measured as a score >6 on the Drug Abuse Screening Test (Skinner, 1982); problematic alcohol use, assessed as a score >4 on a modified 51-item version of the Michigan Alcoholism Screening Test (Selzer, 1971); and antisocial behavior, assessed with a score >24 on the Antisocial Behavior Checklist (Zucker, Noll, Ham, Fitzgerald, & Sullivan, 1994). Personality traits related to problem behavior were measured using the Zuckerman–Kuhlman Personality Questionnaire, which gives mean scores for impulsivity/sensation seeking and aggression/hostility (Zuckerman, 2002). Measures of maternal stress and support included the mean score on the 16-item Parental Well-Being Index (Dunst & Trivette, 1986) and the 19-item Medical Outcomes Study Social Support Survey—higher scores on each are positive (Sherbourne & Stewart, 1991). We also measured stress in the past month using mean score on the four-item Perceived Stress Scale (Cohen, Kamarck, & Mermelstein, 1983) and lifetime stress with the mean score on a modified five-item Evaluation of Lifetime Stressors Questionnaire—higher scores on these indicate more stress (Krinsley, 1996).
Marital status was coded dichotomously. The adequacy of material resources within the family was measured by mean score on the 30-item Family Resources Scale—high scores are positive (Dunst & Leet, 1985). Family instability was measured as the mean score of a 9-item scale, based on the work of Forman and Davies (2003), which measured changes in residence, primary caregiver, transitions in romantic relationships of the primary caregiver, job or income loss, and death or serious illness of a family member. High scores indicate more instability.
Perceptions of neighborhood violence were measured with three items on a 5-point scale, developed specifically for the EBFS; higher scores indicate that the mother reported more neighborhood violence.
We also examined youth conduct problems because current research focuses on causal modeling of the link between smoking and conduct problems. Antisocial behavior was measured via the Antisocial Behavior Checklist, a 60-item inventory completed by the adolescent offspring about themselves (Zucker et al., 1994). Conduct disorder symptoms were measured by asking the parent about the adolescent offspring’s behavior, using the Diagnostic Interview Schedule for Children (Shaffer, Fisher, Lucas, Dulcan, & Schwab-Stone, 2000).
We first assessed the accuracy and reliability of women's long-term retrospective self-report of smoking status in pregnancy (i.e., smoked or not) by examining the sensitivity, specificity, and kappa statistics for retrospective report of any smoking in pregnancy in comparison with two prospective measures—ever having reported smoking during pregnancy and ever having a cotinine value indicating active smoking during pregnancy. Next, we reclassified 36 women who retrospectively reported smoking only before they knew they were pregnant and who (if they had truly not smoked once they learned of the pregnancy) could not have been classified as smokers prospectively at prenatal visits and reexamined sensitivity, specificity, and kappa statistics in comparison to prospective self-report of any smoking and positive cotinine measures.
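The three agreement statistics named above can be computed from a single 2×2 table. The following is a minimal sketch, treating the prospective measure as the reference standard; the function name and the counts are hypothetical and are not taken from the study tables.

```python
def agreement_stats(tp, fn, fp, tn):
    """Sensitivity, specificity, and Cohen's kappa from a 2x2 table.

    tp: smokers by the reference measure who also recalled smoking
    fn: smokers by the reference measure who recalled not smoking
    fp: nonsmokers by the reference measure who recalled smoking
    tn: nonsmokers by the reference measure who recalled not smoking
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)

    # Kappa compares observed agreement with agreement expected by chance,
    # where chance agreement is derived from the row and column totals.
    observed = (tp + tn) / n
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Hypothetical counts for illustration only.
sens, spec, kappa = agreement_stats(tp=40, fn=10, fp=5, tn=45)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, kappa={kappa:.2f}")
```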
To examine whether or not recall is affected by the timing of smoking in pregnancy, we next assessed the reliability of maternal recall of smoking status during each trimester based on retrospective self-report versus prospective self-report and prospective cotinine measures.
We examined recall of intensity of smoking using the Bland and Altman (1986) technique to assess the degree of agreement in number of cigarettes smoked between prospective and retrospective report among the 87 women for whom there was both retrospective and prospective evidence of smoking. We also examined whether there was a positive linear relationship between the two reports by examining the correlation between number of cigarettes smoked per day among the same 87 women; we looked at the correlations and cross-tabulations between the amount smoked averaged across pregnancy and within each trimester.
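The Bland and Altman approach summarizes agreement by the mean of the paired differences (the bias) and an interval around it. The sketch below uses the conventional limits of agreement, the mean difference plus or minus 1.96 standard deviations of the differences; whether the interval reported in this paper was computed in exactly this way is an assumption. The function name and the cigarettes-per-day values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(prospective, retrospective):
    """Mean difference and 95% limits of agreement for paired measurements."""
    if len(prospective) != len(retrospective):
        raise ValueError("measurements must be paired")
    # Difference is prospective minus retrospective for each woman.
    differences = [p - r for p, r in zip(prospective, retrospective)]
    bias = mean(differences)
    spread = 1.96 * stdev(differences)  # sample standard deviation
    return bias, bias - spread, bias + spread


# Hypothetical cigarettes per day for four women (illustration only).
prospective_cpd = [10, 20, 15, 5]
retrospective_cpd = [12, 18, 15, 7]
bias, lower, upper = bland_altman(prospective_cpd, retrospective_cpd)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

A negative bias here means that, on average, women recalled smoking more than they reported prospectively.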
Finally, we sought to understand how maternal, family, neighborhood, and child behavioral characteristics were related to the probability that a mother's smoking would be misclassified in an epidemiological study that used only prospective or retrospective measures. Women who recalled smoking in pregnancy but whose smoking was never captured prospectively by either self-report or cotinine were classified as retrospective-only smokers. In retrospective studies, these women would be classified as smokers, but in prospective studies as nonsmokers. We classified women who were captured as smoking by prospective measures but who did not recall smoking retrospectively as prospective-only smokers. In prospective studies, these women would be classified as smokers, but in retrospective studies as nonsmokers. Because incongruent smokers would be misclassified as nonsmokers under one study design or the other, we examined their characteristics with reference to congruent nonsmokers. Due to the small numbers of women with incongruent measures, we interpreted p values < .10 as statistically significant.
Overall, women's long-term retrospective recall of smoking in pregnancy was accurate and reliable, in comparison with both prospective self-report and prospective biological assessment of smoking status (Table 1). In analyses examining recall of smoking status, among women who were positive for ever smoking in pregnancy by cotinine, 95.6% recalled having smoked, while among those who prospectively reported smoking, recall sensitivity was slightly higher, at 98.1%. Among women who were never identified as smokers by a positive cotinine prospectively, 19.7% recalled having smoked; specificity was slightly lower for self-report: among those who prospectively reported no smoking, 22.7% reported retrospectively that they had smoked during that pregnancy. Specificity improved (although sensitivity declined) when recall of smoking after women knew they were pregnant was compared with prospective measures. Kappa statistics showed excellent agreement for all assessments of the reliability of any smoking.
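The grouping rule described above can be written as a small decision function. This is a minimal sketch of the definitions as stated in this paragraph, not the study's actual coding procedure; the function name, argument names, and group labels are hypothetical.

```python
def congruence_group(recalled_smoking, prospective_self_report, prospective_cotinine):
    """Assign a pregnancy to a congruence group.

    recalled_smoking: woman recalled any smoking in pregnancy at follow-up
    prospective_self_report: woman ever reported smoking at a prenatal visit
    prospective_cotinine: any prenatal cotinine value indicated active smoking
    """
    # A woman counts as a prospective smoker if either prospective measure caught her.
    prospective_smoker = prospective_self_report or prospective_cotinine

    if recalled_smoking and prospective_smoker:
        return "congruent smoker"
    if recalled_smoking:
        return "retrospective-only smoker"
    if prospective_smoker:
        return "prospective-only smoker"
    return "congruent nonsmoker"


# A woman who recalls smoking but was never identified at prenatal visits.
print(congruence_group(True, False, False))
```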
Table 2 shows the prevalence of recalled smoking for different time periods of pregnancy versus prospective self-report and cotinine measures for different time periods. Three aspects of these distributions are notable. First, 12% of women recalled smoking at some time during pregnancy but were never identified as smokers prospectively, either by self-report or by cotinine (retrospective-only smokers). A large majority of these women recalled smoking only before knowing they were pregnant or only in the first trimester. As prospective measures were made at prenatal care visits and relatively few visits were first-trimester visits, there were no prospective possibilities for picking up women who quit as soon as they learned they were pregnant and few opportunities for detecting women who smoked only in early pregnancy. For this group of women, retrospective report of smoking may be capturing true smoking which no practical prospective measurement could capture. Second, 9% of women recalled either never having smoked or only before they knew they were pregnant but had been prospectively identified as smokers either by self-report or by cotinine at one or more prenatal visits (prospective-only smokers). For this group of women, retrospective report is clearly inferior to prospective measurement, particularly prospective measurement by cotinine. Also notable is that the congruence between retrospective and prospective measurement for women who do recall smoking is much lower for the first trimester of pregnancy than for the second and third.
Table 3 compares average cigarette consumption per day for the subset of 87 congruent smokers for whom there was a retrospective report of smoking during pregnancy and prospective evidence of smoking by self-report. Retrospectively, 25 women (28.7%) reported very heavy smoking (more than a pack per day), whereas prospectively, only 15 (17.2%) were classified as very heavy smokers. Recall of light and moderate smoking agrees better with prospective categorization of smoking intensity than recall of heavy smoking. More than 60% of women who recalled light or moderate smoking would have been similarly classified prospectively, and only 4 of these 62 women seem to have been underreporting heavy smoking. In contrast, fewer than half of the women who recalled heavy smoking would have been prospectively classified as heavy smokers; more than half of them (56%) would have been classified as smoking less than, instead of more than, a pack per day. The mean difference between prospective and retrospective reported cigarettes per day was −0.523, and the 95% CI for agreement was −2.14 to 1.10 cigarettes/day (Supplementary Figure 2).
The correlation between recalled average cigarettes per day and prospective average cigarettes per day across pregnancy was moderate (r = .58, p < .001). Correlations of recalled versus prospective smoking intensity were somewhat stronger for the second (r = .53, p < .001) and third (r = .57, p < .001) trimesters than for the first (r = .41, p = .11).
Of 23 maternal, family, neighborhood, or child behavior characteristics that we examined, 5 were significantly different for both prospective-only and retrospective-only smokers in relation to nonsmokers, and a further 4 were significantly different between retrospective-only smokers and nonsmokers (Table 4). Compared with nonsmokers, both prospective-only and retrospective-only smokers were significantly more likely to be receiving public benefits (odds ratio [OR]=4.6, 95% CI = 1.7–12.1, p < .01 and OR=3.4, 95% CI = 1.4–8.4, p < .01, respectively) and less likely to own their home (OR=0.4, 95% CI = 0.1–0.9, p = .04 and OR=0.1, 95% CI = 0.05–0.4, p < .01, respectively). They were more likely to have low self-rated health (OR=4.5, 95% CI = 1.4–14.8, p = .01 and OR=3.9, 95% CI = 1.3–11.8, p = .02, respectively), less likely to be married (OR=0.4, 95% CI = 0.2–1.0, p = .06 and OR=0.5, 95% CI = 0.2–1.0, p = .06, respectively), and scored their neighborhood higher on the violence scale (7.6 vs. 6.2, p = .03 and 7.3 vs. 6.2, p = .07, respectively).
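For readers less familiar with the odds ratios reported above, the sketch below shows how an odds ratio and an approximate 95% confidence interval can be obtained from a 2×2 table using the Wald method on the log scale. The paper does not state how its intervals were estimated, so the method here is an assumption; the function name and counts are hypothetical and are not the study data.

```python
import math


def odds_ratio_wald(a, b, c, d):
    """Odds ratio and Wald 95% CI from a 2x2 table.

    a: misclassified smokers with the characteristic
    b: misclassified smokers without it
    c: congruent nonsmokers with the characteristic
    d: congruent nonsmokers without it
    All four counts must be greater than zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only.
or_value, ci_lower, ci_upper = odds_ratio_wald(a=10, b=20, c=5, d=40)
print(f"OR={or_value:.1f}, 95% CI = {ci_lower:.1f}-{ci_upper:.1f}")
```

With small groups, as in the incongruent categories here, such intervals are wide, which is consistent with the broad intervals reported in the text.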
Additionally, retrospective-only smokers were more likely than nonsmokers to have low emotional health (OR=2.7, 95% CI = 1.0–7.3, p = .05), scored lower on the measure of parental well-being (42.9 vs. 46.7, p = .08), and scored higher on perceived stress (10.7 vs. 8.8, p < .01) and lifetime stressors (16.5 vs. 14.7, p = .04).
Neither group of misclassified smokers was significantly more likely to have children with Conduct Disorder symptoms or to have higher antisocial behavior scores than nonsmokers.
Findings here indicate that women's ability to recall their smoking behavior in pregnancy more than a decade after the event is generally both accurate and reliable, particularly for the second and third trimester of pregnancy. The relative accuracy of retrospective reporting here is particularly striking in light of the detailed exposure measurement on which the prospective assessments were based.
Our results indicate that retrospective and prospective measures are less congruent in two instances: characterization of smoking during the first trimester and classification of women as very heavy smokers (pack per day or more). In both instances, the retrospective assessment was more informative than the prospective measures. That more first-trimester smoking was identified retrospectively shows that the timing of prospective assessments (which typically do not begin until well into the first trimester) may miss a key period of smoking for the substantial number of women who quit spontaneously upon learning they are pregnant. It also suggests that, even in prospective studies, queries about smoking patterns prior to pregnancy knowledge (i.e., “retrospective reports” of the few months prior) will be useful. That more heavy smokers were identified retrospectively suggests that underreporting may be substantial among pregnancy smokers. Based on calibration in relation to cotinine levels, a substantial number of smokers in this sample required adjustment in smoking intensity due to underreporting (Wakschlag et al., 2009). This pressure to underreport may be relaxed retrospectively.
Previously, researchers have shown that maternal recall of smoking in pregnancy agrees well with smoking being recorded in the medical record, over recall periods of 4–6 (Githens et al., 1993), 10–15 (Yawn et al., 1998), and 30+ years (Tomeo et al., 1999). Medical records may consist only of self-report of smoking, rather than any biomarker confirmation, and women who are more or less willing to disclose smoking prospectively in clinical settings may have similar tendencies toward disclosing smoking when asked to recall for a research study. However, using repeated prospective self-report and biological measures of smoking in pregnancy, we also find that most women are accurate and reliable reporters at an average of 14.5 years after the pregnancy.
Our prospective sample was recruited between 1986 and 1992. In the United States in 1990, the overall prevalence of smoking in pregnancy was 21%, but this differed greatly by region, ethnicity, and socioeconomic status. The population base for our study was chosen on the basis of high rates of smoking among White, working-class women (in our sample, 46% of women are classed as smokers by cotinine and 42% by prospective self-report). It is possible that, among this group, willingness to disclose smoking, either prospectively or retrospectively, may differ from that among women of higher social class or from groups with different social norms around smoking in pregnancy. However, this seems unlikely to affect the patterns of concordance which were of primary interest in this study.
Our results confirm what Heath et al. (2003) reported using an innovative design: Self-report may be the best way to capture smoking in early pregnancy. In their study, Heath et al. asked women with twin sisters to recall smoking behavior during a particular pregnancy and then asked the twin sister to independently recall the women's smoking behavior. In general, the sisters’ reports were in agreement, except for women who smoked early in pregnancy and then quit—the twin sisters underreported this behavior.
It is relevant, however, that women who smoke during pregnancy are different, across multiple dimensions, from women who quit and women who never smoke, before or during pregnancy. Compared with women who smoke throughout pregnancy, women who do not smoke or who manage to quit smoking are more likely to be older, employed, and better educated (Agrawal et al., 2008; Graham & Der, 1999; Hanna, Faden, & Dufour, 1994; Martin et al., 2008). They are less likely to be depressed (Hanna et al., 1994; Pritchard, 1994) or highly stressed (Paarlberg et al., 1999), to be a single or cohabiting parent (Kiernan & Pickett, 2006; Thue, Schei, & Jacobsen, 1995), to have a smoking partner (Appleton & Pharoah, 1998; Wakefield, Gillies, Graham, Madeley, & Symonds, 1993) or an abusive partner (McFarlane, Parker, & Soeken, 1996), to have low social support (Dejin-Karlsson et al., 1996), or to live in a working-class neighborhood (Pickett et al., 2002; Sellstrom, Arnoldsson, Bremberg, & Hjern, 2008). Three studies within the Family Health and Development Project have shown that women who quit and women who continue to smoke are systematically different across multiple domains of psychosocial problems, including interpersonal problems, adaptive functioning, and other health-risk behaviors (Wakschlag et al., 2003), as well as conduct problems (Kodl & Wakschlag, 2004), and problematic psychosocial context also had incremental utility for predicting smoking intensity (Weaver, Campbell, Mermelstein, & Wakschlag, 2008). These differences have been confirmed within the population-based U.K. Millennium Cohort Study (Pickett, Wilkinson, & Wakschlag, 2009). We are unaware of previous studies that have examined such characteristics in relation to recall of smoking in pregnancy.
Reassuringly, we found that on the majority of measures of maternal, family, or neighborhood context, women whose smoking status would be misclassified in prospective or retrospective studies did not differ significantly from the nonsmokers they would be grouped with. However, misclassified smokers did differ in some aspects, including sociodemographic characteristics, maternal health, and family and neighborhood context. In epidemiological studies of the impact of smoking in pregnancy on behavioral problems in offspring, misclassification may not lead to a biased estimate of the effect of smoking, as misclassified smokers had children with behavioral characteristics more similar to nonsmokers than to correctly classified smokers. However, we note that any residual bias would lead to slight underestimates of any effect; studies with significant numbers of misclassified smokers are likely to produce conservative estimates. Although our study had limited power to detect such differences, comparison of actual differences, as well as p values, supports this interpretation.
One study has reported agreement between maternal recall of smoking in pregnancy and the medical record in relation to child behavior problems (Rice et al., 2007). There was greater incongruence between reports among more educated women, but no evidence that child behavior problems affected congruence. Our findings confirm this with the added reassurance of prospective biological measures of smoking. In fact, our findings imply that retrospective measures may misclassify fewer smokers than prospective studies, which miss early quitters. One caveat, however, is that retrospective reports in the present study were interview based and queried trimester-by-trimester behavior (rather than being a single questionnaire item on “typical day” smoking). That is, not all retrospective reports are alike, and future examinations of this question should systematically test how retrospective reports are best structured to elicit accurate recall. In addition, it should be noted that women in the present study had been participating in a “smoking study” for nearly two decades, including having provided detailed reports and biological samples repeatedly. Thus, they may have been unusually sensitized to remembering their previous smoking behavior.
In a study of the effects of misclassification of smoking during pregnancy, England et al. found that 21.6% of self-reported quitters had cotinine measures that were positive for smoking, and this was most common among women who reported quitting before pregnancy. Misclassification had modest effects on estimates of the impact of smoking. Nevertheless, this study shows that prospective biological assays of smoking are informative beyond prospective self-report, and we have shown in this paper that retrospective report is informative above and beyond both prospective self-report and biological assays. This suggests that optimal measurement of smoking in pregnancy will combine all three. Repeated prospective self-report and cotinine measures, in combination with detailed timeline followback measures of smoking before pregnancy, after conception but before pregnancy is known, in the early weeks of pregnancy, and for periods between prospective measures, would provide the most complete coverage of women's smoking behavior over the course of pregnancy. Dukic, Niessner, Benowitz, Hans, and Wakschlag (2007) have proposed methods for combining prospective measures of self-report and cotinine—combining these with timeline followback measures may be the best way to characterize patterns of smoking in pregnancy. Although retrospective reporting has unique sources of error (such as recall bias), the novelty of the present findings is the implication that retrospective recall should not be treated as a flawed, last resort. Rather, these findings suggest that retrospective recall adds a unique source of information, perhaps due to reductions in the social sanctions that contribute to prospective nondisclosure and the opportunity it presents to report fully on the course of smoking across the pregnancy. 
Thus, we conclude that the optimal method for reducing risk of misclassification of pregnancy smoking status would combine prospective self-reported and biological measures as well as retrospective assessments.
The EBFS was supported by a grant from the National Institute on Drug Abuse to Dr. Wakschlag (R01DA15223). Dr. Wakschlag was also supported by the Walden and Jean Young Shaw and Children's Brain Research Foundations. Dr. Pickett is supported by a U.K. National Institute of Health Research Career Scientist Award.