Despite years of epidemiological research on tobacco cigarette smoking, smoking remains the single most preventable cause of death worldwide, resulting in more than 5 million deaths per annum (World Health Organization, 2008). According to revised estimates from the 1990 Global Burden of Disease (GBD) project, by 2015, more individuals will die from tobacco-related diseases than from HIV/AIDS (Mathers & Loncar, 2006). In the United States, approximately 440,000 deaths per year, including approximately 82% of lung cancer deaths, are attributable to smoking (Centers for Disease Control and Prevention, 2008). The GBD projections do not count indirect contributions of tobacco to patterns of unhealthy behavior that, in turn, are linked to significant mortality and morbidity (Ezzati et al., 2002). For example, prior use of tobacco has been linked to later, more serious drug involvement. Smokers are more likely than nonusers to have the opportunity to try cannabis (Wagner & Anthony, 2002). Tobacco is one of several drug compounds linked to excess risk of cannabis dependence after the first cannabis experience (Chen, O’Brien, & Anthony, 2005).
It is the early count process that is under study in this project, with a focus on the first months and year after onset of smoking. For the most part, tobacco-attributable diseases and unhealthy behavior can be traced back to the formation of tobacco or nicotine dependence syndromes, which sustain or propel a count process of accumulating “pack-years” of cigarette smoking. Nonetheless, in the earliest stages of smoking, there is no dependence, and the reverse is true. That is, from the first cigarette onward, it is a count process that sustains or drives the formation of the dependence syndrome, as with other psychoactive drugs (e.g., see Anthony, 2010; Koob & Le Moal, 1997). In this report, we estimate components of this count process using a zero-inflated Poisson (ZIP) model, in a novel application of the ZIP model to the early stages of smoking involvement. It is our intent to initiate this line of research with nationally representative survey sample data on newly incident tobacco smokers who have started to smoke within 24 months of the date of cross-sectional assessment and to focus the inquiry on an observation interval defined by the number of smoking days in the 30 days prior to assessment.
Readers not familiar with the ZIP model may require clarification of what is meant by the term “count process.” Here, the first component involves whether smoking will be persistent once it starts. For example, some who smoke a cigarette never try another cigarette, ever. For other newly incident smokers, the first smoking experience serves a reinforcing function such that the probability of smoking the next cigarette is very high and the elapsed time from first to second cigarette is quite short. During any specified weekly or monthly interval of time after onset of smoking, nonpersistent smoking can be measured by the number of days of smoking during that interval, but the newly incident smokers with zero days of smoking during that interval can be sorted as members of two latent classes: (a) a latent class of newly incident smokers who never will smoke again and (b) a latent class of newly incident smokers who will persist in smoking, despite having not smoked in the interval under observation. In epidemiological research on subgroups of the population, we can use the ZIP model to estimate the mean probability or odds of being a member of these two classes, and subgroup variation in the odds is characterized by the “persistence parameter (PP)” of the ZIP model. The existence of the second latent class is posited on the basis of the excess number of zeroes in the count distribution for days of smoking in the interval under observation.
The second component of the count process involves the rate of smoking during the interval of observation, conditional on membership in the latent class of newly incident smokers who will persist in smoking, despite not having smoked in the interval under observation. Here, we can use the ZIP model to estimate subgroup variation in the rate of smoking during the interval of observation and form rate ratios (RR).
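As a minimal sketch of the two-component structure just described, the following Python code writes out the textbook ZIP probability mass function. The parameterization (a mixing probability `pi_zero` for the latent class that will never smoke again and a Poisson mean `lam` for days smoked among persisters), the function names, and all numeric values are illustrative assumptions, not specifications or estimates from this study. A Poisson count is also unbounded, whereas days smoked in a 30-day window cannot exceed 30, so the sketch is only an approximation.

```python
import math


def zip_pmf(k, pi_zero, lam):
    """P(Y = k) for a zero-inflated Poisson count of smoking days.

    pi_zero: assumed probability of the "never smoke again" class
    lam:     assumed mean number of smoking days among persisters
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros arise from both classes: those who never smoke again,
        # and persisters who happened not to smoke in the interval.
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson


def prob_never_again_given_zero(pi_zero, lam):
    """Share of observed zeros attributable to the never-again class."""
    return pi_zero / zip_pmf(0, pi_zero, lam)


def rate_ratio(lam_group, lam_reference):
    """Rate ratio (RR) comparing persisters' smoking rates in two subgroups."""
    return lam_group / lam_reference


# Hypothetical values chosen only for illustration.
p_zero = zip_pmf(0, pi_zero=0.4, lam=3.0)
share = prob_never_again_given_zero(pi_zero=0.4, lam=3.0)
rr = rate_ratio(lam_group=4.5, lam_reference=3.0)
```

Under these hypothetical values, roughly 93% of observed zeros would belong to the never-again class, and the rate ratio of 1.5 would indicate a 50% higher smoking rate among persisters in the comparison subgroup.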
Taken together, these two components characterize the count process of very-early-stage smoking involvement, which can be studied in complement with the probability of becoming tobacco or nicotine dependent, as explained recently by Anthony (2010). Indeed, in the earliest stages of smoking involvement, the emergence of a dependence syndrome will vary as a function of the count process components; its probability will be driven upward by these components of the count process. If a newly incident smoker tries one cigarette and never smokes again, dependence will not emerge. It is in this sense that the probability of dependence is influenced by the count process. Nonetheless, as dependence syndromes begin to form, there is a feedback loop and the development of the dependence syndrome begins to drive the count process (Rose, Dierker, & Donny, 2010). The feedback loop violates the assumptions of standard multiple regression models and of the dose–response analyses used in toxicological and pharmacological research. For this reason, it is a mistake to plot the probability, prevalence, or odds of being tobacco dependent as a function of the elements of the count process (i.e., a mistake in the form of a model misspecification). It would also be a mistake to plot the count process estimates as a function of whether a smoker had developed tobacco or nicotine dependence. Instead, if we are to examine these outcomes of newly incident smoking, it will be necessary to have longitudinal data, modeled in a multivariate fashion that includes the feedback loops.
With respect to tobacco smoking in general and nicotine dependence in particular, previously conducted large-sample epidemiological studies have routinely examined the sociocultural milieu of past drug encounters, the so-called “quality and quantity” of tobacco-related experiences (Brook, Saar, Zhang, & Brook, 2009). Here, we hypothesize that once smoking starts, persistence in tobacco involvement will vary across ethnic groups, as might the rates of smoking. As such, this study extends our group’s previous work on the earliest stages of tobacco dependence (Storr, Reboussin, & Anthony, 2004).
In these initial models, we posit a small set of subgroup variations, with subgroups defined in terms of time-fixed or relatively time-invariant variables (e.g., sex, race/ethnicity) that will need to be taken into account as potential confounders when the epidemiological research shifts toward time-varying characteristics (e.g., years of schooling, socioeconomic position). When subgroup variation is observed for either count process parameter (in relation to the potentially confounding variables), we also explore whether this variation might be attributed to imbalances in the calendar year of first cigarette smoking and the age/age-of-onset nexus. Adolescent-onset smoking has been associated with higher lifetime prevalence of sustained smoking and nicotine dependence, but, as Breslau and Peterson (1996) observed, this relationship appears to be a function of elapsed time from smoking onset to the date of assessment. Here, we constrain elapsed time to less than 24 months, such that the expectation is for no excess risk for adolescent-onset smokers.
A few words about this age/age-of-onset nexus may be in order. Because we have restricted the sample to recent-onset smokers, the age of the smoker on the date of survey assessment is tightly correlated with the age-of-onset for smoking (i.e., correlation >.9). As such, multicollinearity thwarts concurrent regression modeling of both variables. This is what we mean by the age/age-of-onset nexus in this context.
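To illustrate why a correlation above .9 thwarts concurrent modeling, one common diagnostic is the variance inflation factor (VIF), which for a model with exactly two predictors reduces to 1 / (1 - r^2). The sketch below is offered only as an illustration under that two-predictor assumption; the VIF is not a diagnostic reported in this study, and the function name is hypothetical.

```python
import math


def vif_two_predictors(r):
    """Variance inflation factor when a model has exactly two predictors
    whose Pearson correlation is r (an illustrative simplification)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r ** 2)


# At r = .9, the lower bound of the correlation described in the text.
vif = vif_two_predictors(0.9)
# Standard errors scale with the square root of the VIF.
se_inflation = math.sqrt(vif)
```

At r = .9 the VIF is about 5.3, so the standard errors of the two coefficients would be roughly 2.3 times larger than with uncorrelated predictors, and the inflation grows rapidly as r approaches 1.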