Nicotine Tob Res. 2010 January; 12(1): 73–76.
Published online 2009 December 2. doi:  10.1093/ntr/ntp168
PMCID: PMC2802570

Data to assess the generalizability of samples from studies of adult smokers



One major determinant of external validity is the representativeness of the sample. This article provides data to help authors and readers assess the generalizability of samples from smoking studies.


We analyzed the 2007 U.S. National Health Interview Survey.


We provide means, SEMs, and 95% CIs for demographic and smoking behavior characteristics of never-smokers, ever-smokers, all current smokers, current daily smokers, current nondaily smokers, long-term ex-smokers, and smokers who made a quit attempt in the last year.


Our results can help studies assess generalizability, set targets for recruitment, or reweight data to reflect U.S. averages.


External validity is crucial to interpreting the outcomes of clinical studies. One major aspect of external validity is the representativeness of the sample. Few smoking studies use samples that are truly population-based. One method to assess external validity (aka generalizability) of predefined or convenience samples is to compare the characteristics of enrollees versus nonenrollees (Graham et al., 2008). However, this requires collecting data from those who do not wish to participate, which can be problematic. Another method is to compare the sample characteristics with those of a population-based sample. Clinical trials that have done this have often found their samples were surprisingly similar to the population-based sample (Hughes, Solomon, Livingston, Callas, & Peters, 2009). In a prior article (Hughes, 2004), we provided demographic and smoking characteristics of U.S. smokers based on the 2000 U.S. National Health Interview Survey (NHIS) to allow studies to compare their sample with the U.S. average smoker. Because these data are 9 years old, we now provide more recent data.


Among the population-based samples of smokers in the United States, the NHIS, collected in 2007, and the Current Population Survey-Tobacco Use Supplement (CPS-TUS), collected in May 2006, August 2006, and January 2007, provide the most recent information on smoking behavior. The CPS-TUS uses a larger sample size and asks more questions; however, the NHIS has a large sample size (23,393), and data from the extra questions on the CPS-TUS are rarely reported in clinical trials and laboratory studies. The CPS-TUS is conducted once every 4 years, whereas the NHIS is published yearly. Although the sampling strategies, time period, question wordings, etc., differ between the two, we see no clear reason to state that one survey is more valid than the other. We completed analyses on both the CPS-TUS and the NHIS but report only the NHIS data in this publication because our prior publication reported on the NHIS and, thus, we can comment on whether the characteristics of smokers appear to have changed over time and thereby provide a test of the hardening hypothesis (Warner & Burns, 2003). Results for the CPS-TUS were almost identical and are posted at

The 2007 NHIS Sample Adult questionnaire was administered to a nationally representative sample of the U.S. noninstitutionalized civilian population aged at least 18 years. The overall response rate for the survey was 78% among adults identified as eligible and 68% accounting for household and family non-response (National Center for Health Statistics, 2008). The current analysis examines only demographic data and cigarette smoking and not other tobacco use. A multistage sampling design was used to obtain a representative sample. Means and 95% CIs were computed using appropriate weighting within the PROC SURVEYMEANS procedure of SAS 9.1 (SAS Institute Inc., 2003, Cary, NC).
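To illustrate why the weighting step matters, the following minimal Python sketch computes a weighted mean. The function name `weighted_mean` and the respondent values are hypothetical and are not taken from the NHIS. The sketch covers only the point estimate; it does not reproduce PROC SURVEYMEANS, whose CIs also account for the strata and clusters of the multistage design.

```python
def weighted_mean(values, weights):
    """Return the mean of `values`, with each value counted in proportion to its weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical respondents (not NHIS data): cigarettes per day and sampling weights.
# A weight is the number of people in the population that a respondent represents.
cigs_per_day = [10, 20, 30]
sampling_weights = [3000, 1000, 1000]

unweighted = sum(cigs_per_day) / len(cigs_per_day)
weighted = weighted_mean(cigs_per_day, sampling_weights)
print(f"unweighted mean = {unweighted:.1f}, weighted mean = {weighted:.1f}")
```

In this toy example the lightest smoker represents the most people, so the weighted mean is lower than the unweighted mean.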

Participants were asked, “Have you smoked at least 100 cigarettes in your entire life? Do you now smoke cigarettes every day, some days, or not at all? During the past 12 months, have you stopped smoking for more than one day because you were trying to quit smoking? How long has it been since you quit smoking cigarettes?”

Based on these questions, we describe the characteristics of never-smokers (smoked <100 cigarettes in lifetime), ever-smokers (smoked ≥100 cigarettes/lifetime), all current smokers, current daily smokers (smoke “everyday”), current nondaily smokers (smoke “some days”), long-term ex-smokers (quit >1 year ago), and current smokers who made a quit attempt in last year (includes current daily and nondaily smokers). We examined age, sex, ethnicity, and education as well as cigarettes per day and age at onset of regular smoking. Cigarettes per day for daily smokers was from the question “On the average, how many cigarettes do you now smoke a day?” For nondaily smokers, it was obtained from the product of “On how many of the past 30 days did you smoke a cigarette” and “On the average, when you smoked during the past 30 days, about how many cigarettes did you smoke a day?” The product was divided by 30 to give the daily average over the prior 30 days. Age at onset was from the question “How old were you when you first started to smoke fairly regularly?” We did not undertake formal statistical comparisons across the multiple groups because this was not the purpose of the article.
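The group definitions and the cigarettes-per-day calculation described above can be sketched in Python as follows. This is a minimal illustration, not the SAS code used for the analysis; the function names, argument names, and answer strings are assumptions chosen for readability. The sketch assigns one mutually exclusive status per respondent, whereas the overlapping groups in the text (ever-smokers, all current smokers, smokers with a quit attempt in the last year) would be formed by combining these statuses with the quit-attempt question.

```python
def smoking_group(smoked_100, now_smokes=None, years_since_quit=None):
    """Assign one mutually exclusive smoking status from the survey answers.

    smoked_100: True if the respondent smoked at least 100 cigarettes in their lifetime.
    now_smokes: "every day", "some days", or "not at all" (asked of ever-smokers only).
    years_since_quit: time since quitting, in years (asked of ex-smokers only).
    """
    if not smoked_100:
        return "never-smoker"
    if now_smokes == "every day":
        return "current daily smoker"
    if now_smokes == "some days":
        return "current nondaily smoker"
    if now_smokes == "not at all":
        if years_since_quit is not None and years_since_quit > 1:
            return "long-term ex-smoker"
        return "ex-smoker (quit 1 year ago or less, or time unknown)"
    raise ValueError(f"unexpected answer for current smoking: {now_smokes!r}")


def cigarettes_per_day(group, cigs_per_day_now=None,
                       days_smoked_past_30=None, cigs_per_smoking_day=None):
    """Return average cigarettes per day, or None for respondents who do not smoke."""
    if group == "current daily smoker":
        return float(cigs_per_day_now)
    if group == "current nondaily smoker":
        # Total cigarettes over the past 30 days, averaged over all 30 days.
        return days_smoked_past_30 * cigs_per_smoking_day / 30
    return None


# Hypothetical respondent: smokes on 12 of the past 30 days, 5 cigarettes on those days.
group = smoking_group(smoked_100=True, now_smokes="some days")
print(group, cigarettes_per_day(group, days_smoked_past_30=12, cigs_per_smoking_day=5))
```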


The prevalence of the smoking subgroups and their demographic and smoking characteristics are consistent with prior publications from the 2007 NHIS and from other recent U.S. population-based samples of smokers such as the National Health and Nutrition Survey, the Behavioral Risk Factor Survey, and the CPS-TUS. By comparing demographic and smoking characteristics of the study of interest with those in Table 1, the author or reader can estimate the generalizability of the study sample. The large sample size of the NHIS results in small 95% CIs (usually ±1%–4%). If a study of smoking has a large sample size (e.g., greater than 200), it also would have a small 95% CI. As a result, small differences between the smoking study and the NHIS could be statistically significant but not clinically meaningful. On the other hand, many studies of cigarette smoking have small sample sizes (<50) and thus large 95% CIs. As a result, even a large difference in a characteristic between a study and the NHIS data may not be statistically significant. Thus, we believe that authors and readers should focus on the magnitude of any difference in a given characteristic rather than on its statistical significance. We would encourage those who wish to assess more accurately the external validity of their sample to directly access the NHIS or CPS-TUS datasets. For example, if a study selected a sample of women smokers who smoked >10 cigarettes/day, the author could use the NHIS to find the average age, education, etc., of such smokers to compare with the study sample. Finally, contrary to the hardening hypothesis, cigarettes per day among current smokers decreased from the 2000 NHIS (mean = 17.7; 95% CI = 16.7–17.5) to the 2007 NHIS (13.4; 13.0–13.8).
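The point about sample size and CI width can be illustrated with a short Python sketch. It assumes a simple random sample and the normal approximation for a proportion, which is simpler than the design-based CIs reported for the NHIS. The function names and all percentages are hypothetical and are not values from Table 1.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% CI


def ci_half_width(p, n):
    """Half-width of the 95% CI for a proportion p observed in a simple random sample of size n."""
    return Z_95 * math.sqrt(p * (1 - p) / n)


def compare_to_reference(study_p, study_n, reference_p):
    """Report the size of the difference and whether the reference lies inside the study's 95% CI."""
    half = ci_half_width(study_p, study_n)
    return {
        "difference": study_p - reference_p,
        "ci": (study_p - half, study_p + half),
        "reference_in_ci": abs(study_p - reference_p) <= half,
    }


# Hypothetical small study: 60% women among 30 smokers, against a reference value of 45%.
small = compare_to_reference(study_p=0.60, study_n=30, reference_p=0.45)

# Hypothetical large study: 50% women among 1,000 smokers, against the same reference.
large = compare_to_reference(study_p=0.50, study_n=1000, reference_p=0.45)

for label, result in (("small study", small), ("large study", large)):
    low, high = result["ci"]
    print(f"{label}: difference = {result['difference']:+.2f}, "
          f"95% CI = {low:.2f} to {high:.2f}, "
          f"reference inside CI = {result['reference_in_ci']}")
```

In this example the 15-point difference in the small study leaves the reference value inside the CI, whereas the 5-point difference in the large study does not, which is why the magnitude of a difference is the more useful guide.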

Table 1.
Mean, SEM, and 95% CI for demographics and smoking behavior of U.S. adult smokers from the 2007 National Health Interview Survey


The analysis has some liabilities. First, the NHIS collects only a few smoking behaviors. The CPS-TUS has more questions on tobacco use than the NHIS or other national surveys; however, the NHIS provides information on the sample characteristics reported by most tobacco studies. In addition, the NHIS is completed yearly which, given secular trends, allows comparisons with more recent surveys. Finally, as stated above, our analysis of CPS-TUS found similar results to those using the NHIS.

Second, with increasing stigmatization of smoking, increasing refusal rates, etc., population-based surveys of smoking may be becoming less reliable (Hartge, 2006; West, Zatonski, Przewozniak, & Jarvis, 2007). Such disclosure bias would be greater among household interviews, such as the NHIS, than in phone or Internet interviews (Aquilino, 1992). Third, our results are confined to U.S. adult smokers. Values for other nationalities and adolescents will differ. Fourth, demographic and smoking behavior may not be the most important biases influencing external validity; for example, medical and psychiatric comorbidity, nicotine dependence, and generic volunteer bias variables, such as severity of illness, may be more important (Amori & Lenox, 1989; Graham et al., 2008). Fifth, we did not include other possible categories such as light smokers, recent quitters, smokers who use other forms of tobacco or nicotine, or adolescent smokers, but these could be derived from this dataset. Sixth, NHIS only counts quit attempts that last more than 1 day. Although this may produce bias by excluding the less motivated or more dependent smokers, our previous study suggests that this may not be the case (Carpenter & Hughes, 2004).

Major assets of the NHIS are its large sample size and recency. Also, its use of household interviews avoids the possible bias of phone-based interviews due to the increasing use of cell phones, no-call lists, message screening, etc. (Hartge, 2006). We encourage researchers to use our data to determine generalizability during and after a study or to reweight existing data to mimic those of a U.S. population of interest.
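One simple way to reweight a study sample toward population values is post-stratification on a single characteristic, sketched below in Python. This is an illustration under stated assumptions rather than a method prescribed here: the function name `poststratification_weights`, the choice of sex as the stratifying variable, and the 50/50 target are hypothetical and are not values from Table 1.

```python
from collections import Counter


def poststratification_weights(sample_categories, target_proportions):
    """Return one weight per participant so that weighted category shares match the targets.

    Each weight is (target share of the category) / (sample share of the category).
    """
    counts = Counter(sample_categories)
    n = len(sample_categories)
    missing = set(counts) - set(target_proportions)
    if missing:
        raise ValueError(f"no target proportion for categories: {sorted(missing)}")
    return [target_proportions[c] / (counts[c] / n) for c in sample_categories]


# Hypothetical study sample in which women are overrepresented (3 of 4 participants).
sex = ["female", "female", "female", "male"]
target = {"female": 0.5, "male": 0.5}  # hypothetical population shares

weights = poststratification_weights(sex, target)

# After weighting, the share of women matches the target share.
weighted_female_share = (
    sum(w for w, s in zip(weights, sex) if s == "female") / sum(weights)
)
print(weights, weighted_female_share)
```

Weights of this kind adjust only for the characteristics used to build them, so a sample can still differ from the population on unmeasured characteristics.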


This work was funded by Senior Scientist Award (DA-000490) and research grants (DA-011557, DA-017825, and DA025089) from the U.S. National Institute on Drug Abuse.

Declaration of Interests

JRH is currently employed by The University of Vermont and Fletcher Allen Health Care. Since 2006, he has received research grants from the National Institutes of Health, Pfizer, and Sanofi-Aventis. During this time, he has accepted honoraria or consulting fees from Abbott Pharmaceuticals; Academy for Educational Development; Aradigm; American Academy of Addiction Psychiatry; American Psychiatric Association; Atrium; Celtic Pharmaceuticals; Cline, Davis & Mann; Constella Group; Cowen Inc.; Cygnus; Dean Foundation; DLA Piper; Edelman PR; EPI-Q; European Respiratory Society; Evotec; Exchange Limited; Fagerstrom Consulting; Free and Clear; Health Learning Systems; Healthwise; Insyght; Infomed; Invivodata; Johns Hopkins University; J L Reckner; LEK Associates; Maine Medical Center; McNeil Pharmaceuticals; Nabi Pharmaceuticals; Novartis Pharmaceuticals; Ogilvy Health PR; Ottawa Heart Institute; Pfizer Pharmaceuticals; Pinney Associates; Reuters; Scientia; Shire Health London; Temple University of Health Sciences; University of Arkansas; University of Cantabria; University of Kentucky; University of Madrid; University of Wisconsin; U.S. National Institutes of Health; Xenova; and ZS Associates. PWC has no disclosures.



We thank Angela Trosclair for help with NHIS and Anne Hartman and Todd Gibson for help with CPS-TUS.


  • Amori G, Lenox RH. Do volunteer subjects bias clinical trials? Journal of Clinical Psychopharmacology. 1989;9:321–327.
  • Aquilino WS. Telephone versus face-to-face interviewing for household drug use surveys. International Journal of the Addictions. 1992;27:71–91.
  • Carpenter MJ, Hughes JR. Defining quit attempts: What difference does a day make? Addiction. 2004;100:257–259.
  • Graham A, Papandonatos G, DePue J, Pinto B, Borrelli B, Neighbors C, et al. Lifetime characteristics of participants and non-participants in a smoking cessation trial: Implications for external validity and public health impact. Annals of Behavioral Medicine. 2008;35:295–307.
  • Hartge P. Participation in population studies. Epidemiology. 2006;17:252–254.
  • Hughes JR. Data to estimate similarity of tobacco research samples to intended populations. Nicotine & Tobacco Research. 2004;6:177–179.
  • Hughes J, Solomon L, Livingston A, Callas P, Peters E. A randomized, controlled trial of gradual cessation (aided by NRT) vs. abrupt cessation of smoking. Addiction. 2009 in press.
  • National Center for Health Statistics. Data file documentation, National Health Interview Survey, 2007. Hyattsville, Maryland: National Center for Health Statistics, Centers for Disease Control and Prevention; 2008.
  • Warner KM, Burns DM. Hardening and the hard-core smoker: Concepts, evidence, and implications. Nicotine & Tobacco Research. 2003;5:37–48.
  • West R, Zatonski W, Przewozniak K, Jarvis M. Can we trust national smoking prevalence figures? Discrepancies between biochemically assessed and self-reported smoking rates in three countries. Cancer Epidemiology Biomarkers & Prevention. 2007;16:820–822.

Articles from Nicotine & Tobacco Research are provided here courtesy of Oxford University Press