Int J Cancer. Author manuscript; available in PMC 2009 September 23.
Published in final edited form as:
PMCID: PMC2749693

Estimation of the effects of smoking and DNA repair capacity on coefficients of a carcinogenesis model for lung cancer


Numerous prospective and retrospective studies have clearly demonstrated a dose-related increased lung cancer risk associated with cigarette smoking, with evidence also for a genetic component to risk. In this study, using the two-stage clonal expansion stochastic model framework, for the first time we investigated the roles of both genetic susceptibility and smoking history in the initiation, clonal expansion, and malignant transformation processes in lung carcinogenesis, integrating information collected by a case–control study and a large-scale prospective cohort study. Our results show that individuals with suboptimal DNA repair capacity have enhanced transition rates of key events in carcinogenesis.

Keywords: TSCE model, lung cancer, genetic susceptibility, cigarette smoking

The process of carcinogenesis includes multiple steps. Exposure to polycyclic aromatic hydrocarbons (PAHs) initiates a genetically controlled activation process. Activated PAHs, one of the most potent of which is benzo[a]pyrene diol epoxide (BPDE), can damage DNA by forming bulky adducts. These adducts, if not repaired, may cause mutations that predispose to lung cancer.1 Although early lung cancer researchers2–4 focused mainly on the risk from cigarette smoking, more recent studies have shown the role of genetic susceptibility in lung cancer risk in case–control analyses.1,5–7

In this article, we propose fitting a stochastic carcinogenesis model, the two-stage clonal expansion (TSCE) model, to 2 datasets (case–control and prospective), and test hypotheses regarding the impact of cigarette smoking, as well as DNA repair capacity (DRC), on lung cancer development. In the TSCE model, normal proliferating stem cells can become initiated by a carcinogen or by a spontaneous random mutation, which allows them to proliferate at a higher rate (promotion). Initiated cells, in turn, can undergo the second-stage event called malignant transformation. In this analysis, we compared the initiation, promotion, and malignant transformation rates for women and men and for never smokers and smokers.

A major challenge in attempting to model the interaction of smoking and genetic factors in lung carcinogenesis is the difference in study designs. Genetic susceptibility has usually been evaluated in a case–control setting, whereas published nationwide lung cancer incidence data and absolute risk estimates stratified by smoking status and smoking intensity have been obtained through prospective cohort studies. The current study is an attempt to combine the evidence provided by these 2 sources of data. The unifying framework is the modified TSCE model, in which the major parameters depend on smoking history and genetic susceptibility. We define an objective function, which includes terms depending on case–control and prospective data, and minimize this function by manipulating its parameters. A similar problem using case–control data was considered previously by a German group.8 In their study, unconditional and conditional likelihood functions were constructed based on a case–control study involving nearly ten thousand subjects. However, to balance the contribution of both case–control and prospective datasets, we chose to use least squares estimation (LSE) here and minimized the deviation of observed cases from model-predicted cases for both datasets. In this article, we also, for the first time, considered 2 modes of lung carcinogenesis, characterized by different baseline rates for never and current smokers.

Our data analyses have confirmed the earlier findings that the major risk from cigarette smoking arises from its effect on promotion—by enhancing the cell division rate. In addition, the data suggest that people with poor DRC have increased initiation rates and thus are more susceptible to lung cancer.

Data and methods

Data description

CPS-II study

The Cancer Prevention Study (CPS) II provides one of the few large-scale cohort datasets with age-specific lung cancer death rates for never and current smokers. It includes more than one million United States residents aged 30 years or older with 6 years of follow-up (1982–1988). Our analysis used only the summary tables containing age-specific (5-year bin) mortality rates from lung cancer, excluding prevalent cancers, for never smokers and current smokers stratified by smoking intensity and smoking duration. A model for former smokers’ carcinogenesis was not considered here because their age-specific mortality rates were not available in CPS-II due to large heterogeneity with respect to initiation and cessation ages. For detailed information on CPS-II, please refer to the report released by the American Cancer Society.9

M. D. Anderson lung cancer case–control study

Subject recruitment

Patients with lung cancer were a subset of patients accrued in an ongoing and previously described molecular epidemiological study of susceptibility markers for lung cancer from The University of Texas M. D. Anderson Cancer Center. There were no age, gender, ethnic, or stage restrictions. In this study, we used data on 600 Caucasian patients recruited from September 1995 through December 2003 with histologically confirmed lung cancer, self-reported as either current smokers or lifetime never smokers (the latter defined as those who had smoked fewer than 100 cigarettes in their lifetime). The response rate for cases was about 80%. The reasons for refusal to participate included: patients too ill, referred only for second opinion to M. D. Anderson Cancer Center, or unwilling to donate blood for the study and complete the interview. Healthy Caucasian controls (n = 600) (see also Table I) who were one-to-one matched to cases on age (±5 years) and smoking status (but not on smoking intensity and duration) were individuals without a previous diagnosis of cancer (except for nonmelanoma skin cancer) recruited from the Kelsey-Seybold Clinics, Houston’s largest private multispecialty physician group that includes a network of 23 clinics and more than 300 physicians in the Houston metropolitan area. Patients arriving at the clinics were given a short survey form to determine their eligibility for the study and willingness to participate. The completion of the form was strictly voluntary. Based on the survey forms, individuals most suitable for frequency matching to the recruited cases were identified and contacted to schedule the interview and specimen collection. The response rate for the controls was about 75% when approached for an interview. The main reasons for declining participation included lack of time or difficulties related to transportation. All cases and controls were U.S. residents. This research was approved by the M. D. Anderson Cancer Center and Kelsey-Seybold Institutional Review Boards.


Collection of epidemiological data

After the study participants were briefed on the study and signed informed consent, a 45-min structured personal interview was conducted by M. D. Anderson research interviewers, during which they obtained information on socio-demographic characteristics, smoking and secondhand smoke exposure, history of other exposures and certain respiratory conditions, and history of cancer in first degree relatives.

Assessment of DRC

Suboptimal DRC has been demonstrated to be associated with increased lung cancer risk, in both smokers and nonsmokers.10,11 DRC was measured in cultured peripheral lymphocytes using the host-cell reactivation assay with a reporter gene damaged by an activated tobacco carcinogen, BPDE. Details of the assay have been reported previously.12 Briefly, the assay uses a BPDE-damaged nonreplicating recombinant plasmid (pCMVcat) harboring a chloramphenicol acetyltransferase (CAT) reporter gene transfected into T-lymphocytes. Because even a single unrepaired BPDE DNA adduct blocks CAT transcription, any measurable CAT activity will reflect the ability of the transfected cells to remove BPDE-induced DNA adducts from the plasmids. In the assay, the cells are stimulated by phytohemagglutinin so that they can take up the plasmids. Duplicate transfections with either untreated plasmids or BPDE-treated plasmids are always performed. CAT activity is assayed by adding chloramphenicol and [3H]acetyl-CoA, and measuring the production of [3H]monoacetylated and [3H]diacetylated chloramphenicols with a scintillation counter. DRC is reported as the ratio of the radioactivity of cells transfected with treated plasmids to the radioactivity of cells transfected with untreated plasmids. Assuming that the transfection efficiencies of BPDE-treated and untreated plasmids are equal, this ratio reflects the percentage of damaged CAT reporter genes repaired in test lymphocytes transfected with BPDE-treated plasmids. Previously it has been shown that sample storage time, CAT baseline expression, and blastogenic rate did not affect DRC.12

The percentage of missing DRC values ranged from 7% to 30% within each subgroup stratified by gender, smoking status, and case–control status. DRC was categorized into 2 groups: individuals with DRC greater than the control median were assigned (arbitrarily) a value of 1 for the categorical DRC variable, and individuals with DRC below the control median were assigned a value of 1/2.
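The dichotomization described above can be sketched in a few lines. This is only an illustration: the function name, the sample values, and the handling of values exactly equal to the median are assumptions, since the text does not specify how ties or missing values were treated.

```python
from statistics import median


def categorize_drc(drc_values, control_drc_values):
    """Map each measured DRC to 1.0 (above the control median) or 0.5 (otherwise)."""
    cutoff = median(control_drc_values)
    # Assumption: a value exactly at the control median counts as suboptimal.
    return [1.0 if value > cutoff else 0.5 for value in drc_values]


# Hypothetical DRC measurements (percent repaired); not data from the study.
controls = [8.1, 9.4, 10.2, 11.0, 12.3]
print(categorize_drc([7.5, 10.2, 13.0], controls))
```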

Assigned values of smoking intensity

In the CPS-II study, smokers were grouped by smoking intensity (not more than 10 cigarettes per day (cpd), more than 10 cpd but not more than 15 cpd, and so forth, until more than 35 cpd). Because the lowest cpd group was very small, and for simplicity, we reduced the number of groups to two. Current smokers and recent quitters (those who had quit less than a year earlier) smoking 15–25 cigarettes per day were grouped together and treated as current smokers with a smoking intensity of 20 cigarettes per day. Likewise, all current smokers and recent quitters smoking 30–40 cigarettes per day were assigned an intensity of 40 cigarettes per day.

The data concerning smoking histories at our disposal were somewhat limited: in both the CPS-II study and the MDACC case–control study, average number of cigarettes per day and smoking duration were recorded. Therefore, if further assumptions concerning time course of smoking intensity are to be avoided, the most parsimonious solution is to assume constant intensity over the smoking interval and zero intensity before the smoking initiation.
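A minimal sketch of the two simplifications just described, namely the reduction to two intensity groups and the assumption of constant intensity after smoking initiation. The function names are hypothetical, and treating the group boundaries as inclusive is an assumption.

```python
def assigned_intensity(cigarettes_per_day):
    """Return the assigned intensity (20 or 40 cpd), or None outside both groups."""
    if 15 <= cigarettes_per_day <= 25:
        return 20
    if 30 <= cigarettes_per_day <= 40:
        return 40
    return None  # not covered by the two groups described in the text


def intensity_at_age(age, start_age, intensity):
    """Zero before smoking initiation, constant afterwards (current smokers only)."""
    return intensity if age >= start_age else 0


# Hypothetical smoker: started at 18, reports 22 cigarettes per day.
level = assigned_intensity(22)
print([intensity_at_age(age, 18, level) for age in (10, 18, 50)])
```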

TSCE model with smoking and DRC effects

A two-stage clonal expansion model was first proposed by Knudson13 to model retinoblastoma in children. Subsequently, it was generalized by Moolgavkar,14 allowing cell death and differential growth of initiated cells. The TSCE carcinogenesis model has been frequently used to evaluate the effects of different factors in human cancers.8,15–18 A schematic of the standard TSCE model is depicted in Figure 1. In this model, there are 3 types of cells: normal cells (NC), initiated cells (IC), and malignant cells (MC). The waiting time to NC initiation and the time for IC division, death or transformation follow exponential distributions with parameters ν(t), α(t), β(t), and μ(t), associated with each process, respectively. Each cell acts independently from the other cells. Once the first MC arises, it is assumed that it develops into a malignant tumor with probability one.19

Figure 1
Two-stage clonal expansion model paradigm. NC: normal cells; IC: intermediate initiated cells; MC: malignant cells.

The time from the first malignant cell to lung cancer death is usually relatively short and frequently is considered negligible. Therefore, in this analysis, the waiting time from birth to death (from lung cancer) is considered to be the same as the waiting time T from birth to the first MC.

In this analysis, we assumed that for the smokers, biological parameters were constant in subintervals of an individual’s age corresponding to the reported smoking history (see the example in Figure 2), as suggested previously.16–18 We used the expressions for the survival functions of T as derived by Heidenreich20 for a homogeneous population. The TSCE model is known to be nonidentifiable when fitted to incidence data.20 At most, 4k − 1 parameters are estimable for the k-piecewise model. A practical solution is to assume that one of the four parameters is known. Determining the value of any biological parameter is not trivial in the absence of a gold standard, so the value of the death rate was fixed under the principle that the magnitudes of our parameters should be comparable to those in other studies,8,17 while assuming that the female and male background birth and death rates are similar.
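For orientation, the sketch below evaluates the survival function of T in the simplest case, where all four parameters are constant over the whole lifetime (as for never smokers). It uses the closed form commonly quoted for the constant-parameter TSCE model, not the piecewise-constant expressions used in the analysis. Here nu is assumed to be the total initiation rate, and the numerical values are illustrative only, not the fitted estimates.

```python
import math


def tsce_survival(t, nu, alpha, beta, mu):
    """P(T > t) for a TSCE model with constant parameters (rates per year)."""
    net = alpha - beta - mu
    root = math.sqrt(net * net + 4.0 * alpha * mu)
    p = 0.5 * (-net - root)  # negative root
    q = 0.5 * (-net + root)  # positive root
    denom = q * math.exp(-p * t) - p * math.exp(-q * t)
    return ((q - p) / denom) ** (nu / alpha)


def interval_risk(t, width, **params):
    """P(t < T <= t + width | T > t), the conditional probability entering E_i."""
    return 1.0 - tsce_survival(t + width, **params) / tsce_survival(t, **params)


# Illustrative values only; these are NOT the estimates reported in the article.
params = dict(nu=0.1, alpha=1.5, beta=1.4, mu=1e-7)
print(tsce_survival(75.0, **params))
print(interval_risk(60.0, 6.0, **params))
```

A piecewise-constant version would chain such expressions across the age subintervals defined by the smoking history, which is beyond this sketch.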

Figure 2
Age-dependent second transition rate μ(t), as a function of the current smoking status.

Parameter estimation method

In contrast to most TSCE application papers, we chose LSE for parameter estimation. We selected the parameter combination yielding the minimum value of the objective function defined as the distance between the predicted probability of getting lung cancer and the observed frequency. The objective function included 2 components. The first one, f1, was a chi-square statistic comparing model-predicted and CPS-II-observed death counts. The second one, f2, was a chi-square statistic-type expression comparing model-predicted death counts in the optimal DRC group with the model-predicted death counts in the suboptimal DRC group, multiplied by the estimate of the relative risk of lung cancer in the optimal DRC group, based on the case–control data with certain assumptions specified below:


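$$E_i = (\text{number of subjects in the } i\text{th age group at enrollment to CPS II}) \times P(t_i < T \le t_i + 6 \mid T > t_i),$$

where 6 years is the duration of the study and $t_i$ is the median age of the $i$th age group at enrollment to CPS II.

The probability $P(t_i < T \le t_i + 6 \mid T > t_i)$ concerns the CPS II population. In the computation to approximate this probability, a mix of optimal and suboptimal DRC groups with equal weight was used. Furthermore,

$$f_2 = \frac{\left(\left(P(35 \le T \le 80 \mid \text{optimal DRC}) - P(35 \le T \le 80 \mid \text{suboptimal DRC}) \times \hat{R}_{21}\right) \times n\right)^2}{P(35 \le T \le 80 \mid \text{optimal DRC}) \times n},$$

$$\hat{R}_{21} = \frac{\hat{P}_2}{\hat{P}_1} = \frac{\hat{P}(35 \le T \le 80 \mid \text{optimal DRC})}{\hat{P}(35 \le T \le 80 \mid \text{suboptimal DRC})} = \frac{\hat{P}(35 \le T \le 80;\ \text{optimal DRC}) / P(\text{optimal DRC})}{\hat{P}(35 \le T \le 80;\ \text{suboptimal DRC}) / P(\text{suboptimal DRC})}.$$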

R21 is an estimate of the relative risk of developing lung cancer given optimal DRC when compared with suboptimal DRC, assuming equal frequencies of individuals with optimal and suboptimal DRC in the population. Among both never smokers and smokers, R21 is estimated as the ratio of the number of patients with optimal DRC to the number of patients with suboptimal DRC in the case–control study within the corresponding smoking status subgroup. This estimate is equivalent to the original definition of relative risk R21 under the assumption that the population median was used to split optimal from suboptimal DRC individuals. Indeed, then P[optimal DRC] = P[suboptimal DRC] = 1/2. In this study, the population DRC median has been approximated by the DRC median in controls. The age range corresponds to that in the case–control study.
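The f2 term and the case-count estimate of R21 can be written out as a short sketch. The function names and the numbers in the example are hypothetical. The term vanishes when the model-predicted risk in the optimal DRC group equals the predicted risk in the suboptimal group multiplied by the estimated relative risk.

```python
def relative_risk_estimate(cases_optimal, cases_suboptimal):
    """R21 estimate: ratio of case counts, assuming DRC is split at the population median."""
    return cases_optimal / cases_suboptimal


def f2_term(p_optimal, p_suboptimal, r21, n):
    """Chi-square-type discrepancy between the two model-predicted DRC group risks."""
    predicted_optimal = p_optimal * n
    implied_optimal = p_suboptimal * r21 * n  # suboptimal risk scaled by R21
    return (predicted_optimal - implied_optimal) ** 2 / (p_optimal * n)


# Hypothetical counts and risks, for illustration only.
r21 = relative_risk_estimate(cases_optimal=40, cases_suboptimal=80)
print(f2_term(p_optimal=0.01, p_suboptimal=0.02, r21=r21, n=1000))
```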

In the definition of function f1, k is the number of age groups for a given smoking intensity (si) (20 or 40 cigarettes/day), that is, the number of cells in the CPS-II data table. As mentioned above, we dichotomized si mainly for simplicity. This means that we do not consider a continuous dose-response relationship, and therefore extending the logarithmic formula beyond the 2 discrete values might not generally produce meaningful results.

The parameter n was set through the following steps. First, it was determined by the magnitude of person-years in the CPS-II for never smokers. The person-years varied by age, but on average for never smokers they were around 10^5. Second, the mortality rate of lung cancer among never smokers was almost one tenth of that for current smokers. Third, the over-representation of never smoker lung cancer patients in the case–control study (when compared with the general population) forced us to further downweight the population size of the current smokers. Finally, a sample size of 10^3 yielded a good balance for the contribution of f1 and f2 to the objective function. The f2-term for current smokers’ data includes case counts for smokers with 20 and 40 cigarettes per day, respectively.

Equal weight was assigned to each age group except for people who smoked for longer than 50 years. Their death count was multiplied by 0.8 in order to downweight their contribution to f1 due to larger uncertainty in these cell counts.
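One way to read the weighting is sketched below. It assumes that f1 is a sum of (observed − expected)²/expected terms over age groups and that the factor 0.8 scales the whole contribution of the longest-duration group; neither detail is spelled out in the text, so both are labeled assumptions, and the counts are made up.

```python
def f1_term(observed, expected, weights=None):
    """Weighted chi-square-type sum over age groups: sum of w_i * (O_i - E_i)^2 / E_i."""
    if weights is None:
        weights = [1.0] * len(observed)
    return sum(w * (o - e) ** 2 / e for o, e, w in zip(observed, expected, weights))


# Hypothetical death counts; the last group stands for smokers of more than 50 years.
observed = [10, 20]
expected = [10, 16]
print(f1_term(observed, expected))              # equal weights
print(f1_term(observed, expected, [1.0, 0.8]))  # last group downweighted
```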

Design of response function

In the objective function, both terms, f1 and f2, depend on the distribution function of the waiting time T, which is determined by 4 biological parameters of the TSCE model and age. For details of how the survival function depends on the 4 parameters, please see the Appendix. The influence of DRC and cigarette smoking on lung cancer development was modeled through the response functions that defined their impact on the biological parameters. In never smokers, only DRC was assumed to affect the biological parameters.

Never smokers

We used the following response functions for never smokers, relating the biological parameters to DRC:


where DRC = 1 for optimal or 1/2 for suboptimal repair.


For smokers, we assumed 2 variants of linear response functions relating the biological parameters to DRC and smoking. In the first variant, we assumed that before initiation age, the parameters remained the same as those for never smokers. Additional terms accounting for smoking effects were added after the smoking initiation to all biological parameters including cell death rate. Such a setup implies that all pathways or key events in lung carcinogenesis are essentially the same for never smoker and current smoker patients. The response functions in the presence of smoking are expressed as follows:


where si denotes the smoking intensity, discretized to 2 levels (20 and 40 cigarettes per day, as explained earlier on). Here, the coefficients a1 and a4 correspond to the effects of DRC during nonsmoking and smoking phase, respectively. Coefficients a2 and a3 in the terms (1 + a2) and (1 + a3) are the effects of smoking on the first and second transition rates. The impact of smoking intensity on cell proliferation rate is a5. The coefficient a6 is a fit adjustment parameter.

In contrast, the second variant of response functions postulates a different lung carcinogenesis mechanism, in which lung cancer in current smokers goes through the transitions at different rates than lung cancer in never smokers (see the illustration in Figure 3). Path 2 refers to the lung cancer development among the majority of smokers, whereas path 1 represents that of never smokers. The corresponding response functions allow the background initiation and malignant transformation rates for lung cancer types in smokers and in never smokers to be different. For current smokers, the initiation rate is ν0(1 + a2) and the malignant transformation rate is μ0(1 + a3) before as well as after smoking initiation. We recognize that the initiation and malignant transformation rates likely change after smoking starts, and ideally a separate parameter should be used to model this change. However, because of potential over-parameterization when applying the TSCE model, no extra parameter was added to these response functions. Mathematically, in the absence of cigarette smoking,

Figure 3
Two major paths of lung cancer carcinogenesis for never and current smokers, respectively.


where a1 is estimated from never smokers data, whereas in the presence of cigarette smoking,


To summarize, for the first variant of response functions, Eqs. (1) and (2) were used. Eqs. (3) and (4) were exclusively used for smokers in the second variant of response functions. In both variants, parameters ν0, μ0, α0, and a1 were estimated via never smokers’ data.

Two-step estimation

Our parameter estimation is a two-phase process: the parameters α0, ν0, and μ0, and the coefficient a1 were first estimated from never smokers’ data. These estimates were then substituted into the response function, and the estimates of the remaining coefficients were obtained by fitting to current smokers’ data. In each resampling iteration, the number of lung cancer cases within each age group was resampled according to a Poisson distribution and treated as the observed death counts. The optimization procedure was run for each resampled set of death counts. Two-sided confidence intervals were then constructed based on the estimates from the simulations. Analyses for each gender were conducted separately.
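The resampling scheme can be illustrated with a small parametric bootstrap. This is a sketch under stated assumptions: the number of iterations (500), the percentile method for the interval, and the stand-in estimator are all hypothetical, and the Poisson sampler (Knuth's method) is only suitable for modest counts.

```python
import math
import random


def poisson_draw(mean, rng):
    """Knuth's method; adequate while exp(-mean) does not underflow."""
    threshold = math.exp(-mean)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def bootstrap_interval(counts, fit, n_iter=500, level=0.95, seed=0):
    """Percentile interval for a scalar estimate fit(counts) under Poisson resampling."""
    rng = random.Random(seed)
    estimates = sorted(
        fit([poisson_draw(c, rng) for c in counts]) for _ in range(n_iter)
    )
    tail = (1.0 - level) / 2.0
    lower = estimates[int(tail * (n_iter - 1))]
    upper = estimates[int((1.0 - tail) * (n_iter - 1))]
    return lower, upper


# Stand-in estimator: the total count. A real fit would minimise the objective function.
low, high = bootstrap_interval([12, 30, 45], fit=sum)
print(low, high)
```

In an actual analysis, `fit` would rerun the optimization on each resampled table and return one coefficient, so the interval reflects the sampling variability of the death counts only.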


In our study, we did not attempt to fit the never smoker and smoker data simultaneously, but instead we fitted them to separate models. In the Discussion, we explain why we considered doing so appropriate.

Fitting the model to never smokers data

The baseline death rate β0 was set at 1.4 and 1 for females and males, respectively, to yield a fit to the data satisfying the constraint of μ0ν0 ~ 10^−8 as in Heidenreich et al.8 while maintaining the magnitude of μ0 around 10^−7. The population size n, as defined in Data and Methods, was assumed to be equal to 10^5. After substituting β0 and n, the estimates for the baseline biological parameters ν0, μ0, α0, and a1 were obtained (Table II). The estimates of a1 for both genders were close and positive, indicating that suboptimal DRC increased the lung cancer risk through enhancement of initiation and malignant transformation rates in a similar way in never-smoking men and women.


Fitting the model to current smokers data

Data fitting for never smokers provides baseline parameters needed for current smoker data analysis. For current smokers, coefficient a4 represents the impact of DRC during the smoking period. This effect is important in the light of the hypothesis that exposure to higher doses of carcinogen might change the way DRC affects ν and μ. The estimates of relative risk R12 of developing lung cancer for individuals with suboptimal DRC, by smoking intensity and gender, are provided in Table I. It would have been desirable to consider the age effect in the relative risk estimation. However, due to missing values and the resultant small sample size, the estimates of relative risk would have had greater uncertainty if the sample were further split into several age groups. A future study with larger sample size will investigate this issue.

We conducted further analyses based on (i) gender-specific ratios and (ii) gender-combined ratios. The conclusions were almost the same; therefore we only report the fitting using the gender-combined ratios.

The CPS-II data include the age-specific mortality rates for sub-populations defined by smoking intensity (20 or 40 cigarettes per day) and smoking duration (30 to 34 years, 35 to 39 years, 40 to 44 years, 45 to 49 years, and more than 50 years). These data were used to approximate incidence rates among whites with given smoking intensity and durations. The exact initiation ages for current smokers were not provided in CPS-II. We reconstructed the initiation age by subtracting the median of the smoking durations from the median of participants’ age range (data not shown).
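The reconstruction of the initiation age amounts to a subtraction of bin medians. In the sketch below the median of a bin is taken to be its midpoint, which is an assumption, and the example bins are chosen for illustration.

```python
def reconstructed_start_age(age_range, duration_range):
    """Median of the age bin minus median of the smoking-duration bin."""
    age_mid = (age_range[0] + age_range[1]) / 2
    duration_mid = (duration_range[0] + duration_range[1]) / 2
    return age_mid - duration_mid


# Example: participants aged 60 to 64 who have smoked for 40 to 44 years.
print(reconstructed_start_age((60, 64), (40, 44)))
```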

To determine how realistic our model was, we computed the predicted overall absolute risks of developing lung cancer by age 75 for current smokers, denoted by P75. To accomplish this, estimates of the frequencies of ages at smoking initiation for current smokers (men and women) and distributions of smoking intensities for current smokers (men and women) were required. These were produced (for this purpose only) based on the output of the CISNET Smoking History Generator (CSHG). Using the information listed above, we obtained the parameter estimates and the predicted overall risk of lung cancer by age 75 as presented in Table III.
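A plausible reading of this step is that P75 is a weighted average of model-predicted risks over combinations of initiation age and smoking intensity. The sketch below shows that calculation; the subgroup risks and frequencies are made-up numbers, not output of the model or of the CSHG.

```python
def overall_risk(risk_by_group, weight_by_group):
    """Population risk as a weighted average of subgroup risks."""
    total_weight = sum(weight_by_group.values())
    return sum(
        risk_by_group[key] * weight / total_weight
        for key, weight in weight_by_group.items()
    )


# Keys are (initiation age, cigarettes per day); all values are hypothetical.
risks = {(15, 20): 0.10, (15, 40): 0.20, (20, 20): 0.08, (20, 40): 0.16}
weights = {(15, 20): 0.3, (15, 40): 0.2, (20, 20): 0.3, (20, 40): 0.2}
p75 = overall_risk(risks, weights)
print(p75)
```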


Because the second variant of response functions yielded a fit with smaller objective function values than the first variant for either gender, we limited our presentation to the second variant of response functions. According to the report by Peto et al.,21 the overall risk of developing lung cancer by age 75 is around 16% for men and 9.5% for women, and the figures predicted by the second variant of response functions are closer to these values. In our analysis, the cumulative risks for women and men never smokers predicted by our model were 0.39% and 0.42%, respectively. These figures are in excellent agreement with those calculated by Peto et al.21 when competing risks were not considered, although they were somewhat lower than the corresponding figures, based on the CPS-II Study, as reported by Thun et al.22


Modeling carcinogenesis for smokers and never smokers

As detailed in the Results, we separately fitted carcinogenesis models for current and never smokers. The underlying philosophy was that individuals who develop lung cancer without active smoking histories, that is, at much lower levels of exposure, may have a very different genetic background from those who required long-term exposure to develop the disease. Indeed, as suggested by Table I, the relative lung cancer risk associated with suboptimal DRC decreased with increased smoking exposure. This can be explained by the fact that, conditional on a person developing lung cancer with little or no smoking exposure, this person’s relative risk associated with impaired DRC is logically increased. Conversely, with heavy smoking exposure, the relative risk associated with other factors is decreased, because smoking overwhelms even a relatively resistant phenotype.

One might expect that this would be reflected in baseline mutation rates being higher for never smokers with lung cancer than for smokers before smoking initiation. These latter mutation rates are not directly represented in our response functions. However, if the coefficients a2 and/or a3 were negative, this would suggest such an effect. As seen in Table III, the estimate of a3 was indeed negative, but not significantly so, making the interpretation difficult. The fact that a2 and a3 were not both negative represents the tradeoff between trends before and after smoking initiation, which were jointly represented in the model. Meza et al.23 found that a joint model for smokers and never smokers resulted in estimates not very different from those based on 2 separate models.

Biological interpretation of the parameters

DRC may have an effect on carcinogenesis at the initiation and/or malignant transformation phases, because at these transitions, mutations are required. Positive values of a1 and a4 indicate an association between suboptimal DRC and earlier onset of lung cancer. In the presence of cigarette smoking, the transition rates can be altered. In our model, parameters a2 and a3 referred to such changes (see also the definitions of the first and second variants of response functions). The impact that cigarette smoking had on initiation and malignant transformation can be either that the 2 transitions were accelerated or a new competing carcinogenesis pathway emerged as a result of exposure to high doses of carcinogens. Note that in the second variant of response functions, the terms a2 and a3 cannot be used to distinguish between these two possibilities. The parameters a2 and a3 were jointly estimated from the period before and after initiation, and therefore, there was a tradeoff between the background rate and the rate of the response to smoking. One can hypothesize that in the absence of smoking, the carcinogenesis process via path 2 (Figure 3) has a much lower probability of occurring than that via path 1, due to an extremely low malignant transformation rate and slow proliferation of IC2. However, still there could be generation of IC2 from NC, and it is possible that this process is faster than transition from NC to IC1 even before cigarette smoking starts. After initiation of smoking, the malignant transformation from IC1 to MC could be suppressed and most MCs come from path 2.

Cigarette smoking also affected the clonal expansion of initiated cells in our analysis. The TSCE models the proliferation of initiated cells as a birth–death process and its derived survival probability function largely depends on the net growth parameter α – β. Parameter a5*(log(40) − log(20)) represents the change in the net growth rate when increasing the smoking intensity from 20 cig/day to 40 cig/day.
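As a worked example, the change in net growth rate implied by doubling the intensity from 20 to 40 cigarettes per day is a5 multiplied by log 2. The sketch assumes natural logarithms, which the text does not state, and uses the a5 values reported for men (about 0.16) and women (0.38).

```python
import math


def net_growth_change(a5, low=20, high=40):
    """a5 * (log(high) - log(low)), with natural logarithms (an assumption)."""
    return a5 * (math.log(high) - math.log(low))


print(round(net_growth_change(0.16), 3))  # men, a5 about 0.16
print(round(net_growth_change(0.38), 3))  # women, a5 = 0.38
```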

Male never and current smokers

For male never smokers, the effect of suboptimal DRC on initiation and malignant transformation rates was significant (a1 = 3.02, 95% CI: [2.7, 3.9]) (Table II). The cumulative lung cancer risk for never smoker men with suboptimal DRC was almost 6 times that for those with optimal DRC. For current smokers, the estimated coefficient a5 showing the effect of smoking intensity was about 0.16, significantly greater than zero.

For current smokers, the initiation rate was significantly higher than that in never smokers (a2 = 3.67, CI: [0.23, 13.73]). In contrast to the initiation rate, the malignant transformation in current smokers occurred at a significantly reduced rate when compared with never smokers (a3 = −0.69, CI: [−0.91, −0.19]). Again, this is a result of the lower transformation rate during either non-smoking or smoking period, or both. However, it is unlikely that cigarette smoking slowed down the malignant transformation process. Therefore, it was the background malignant transformation rate in current smokers that was likely to be much lower than that for never smokers. Despite the slower malignant transformation, smokers still carried a considerably higher risk of developing lung cancer when compared with never smokers, as the promotion effect of cigarette smoking dominates the initiation and malignant transformation effect.

Female never and current smokers

For female never smokers, DRC also significantly contributed to initiation and malignant transformation rates (a1 = 3.36, 95% CI: [2.99, 4.00]) (Table II). Its contribution was even higher in current smokers (a4 = 11.46, 95% CI: [7.97, 15.14]) (Table III). Cigarette smoking not only significantly increased the proliferation rate of initiated cells (a5 = 0.38, 95% CI: [0.21, 0.65]) (Table III) but also elevated the initiation rate in current smokers (a2 = 3.53, 95% CI: [0.15, 7.42]) (Table III). The promotion effect coefficient a5 was more than twice that of males, suggesting that increased smoking intensity confers a higher lung cancer risk for women when compared with men. Unlike in male smokers, in whom the a3 coefficient was significantly negative, in women a3 was not different from zero (a3 = −0.34, 95% CI: [−0.74, 1.68]) (Table III), suggesting (i) a higher malignant transformation rate in smoking women than men and (ii) no difference between the malignant transformation rate in women smokers and never smokers. This finding is in agreement with a number of studies that suggested a greater smoking-associated lung cancer risk in women than men.24–27 However, other studies did not show this effect.28–30

Previous analyses of lung cancer data using the TSCE have also reported some gender differences in the smoking-related parameters of the TSCE model.23,31 In particular, in the NHS/HPFS analysis, Meza et al.23 also found a higher smoking-related promotion among women. However, contrary to the findings reported here, they found a lower smoking effect on malignant conversion among women.

Limitations of the method

For multistage expansion modeling including TSCE models, alternative models with different assumptions for the biological process may yield similar fit and dose-response relationships. In addition, other empirical models may fit these data equally well. For instance, the Doll and Peto model, a multiplicative power model, and its extended version produced a slightly better fit than TSCE when used to assess lung cancer rates in the CPS-I cohort.32 However, as a biologically based model, the TSCE model has the potential to offer insights into the underlying mechanism, while the formulation of other models was largely driven by data fitting. In our analysis, the model was formulated using parameters for smoking-related dose-response on birth and death (parameters a5 and a6). There are 2 smoking value bins (20 and 40 cig/day), and essentially any functional form can be used to relate 2 parameters to 2 dose values while achieving the optimal assignment. This does not constitute a difficulty as long as this response function is not extended beyond the 2 smoking value bins.

As already mentioned, in the second variant of response functions, the terms a2 and a3 cannot be used to distinguish between the 2 ways in which cigarette smoking may influence initiation and malignant transformation, that is, whether the 2 transitions are accelerated or a new competing carcinogenesis pathway emerges. This limitation is a consequence of the nonidentifiability of the TSCE model.

There are some limitations concerning the data used. First, a causal relationship between lung cancer development and suboptimal DRC has not yet been established. However, the fact that DRC does not depend on stage10–12 suggests that DRC is a risk factor rather than a disease marker. Future research in a prospective study setting is needed to draw causal inferences. Second, our conclusions are subject to sampling error because the sample size of the case–control data is not large after stratification by gender and smoking status. Furthermore, a violation of the assumption that the data from the CPS-II study and the M.D. Anderson study are compatible may also lead to different conclusions.

Relation to other models

This study builds the response function by expressing the original biological parameters as functions of risk factor measurements, which makes the interpretation of the estimated parameters more straightforward. Unlike our model, some studies include an extra variable, the waiting time from the first malignant cell to a detectable tumor. Because of the nonidentifiability problem, we decided not to adopt this approach: adding an extra parameter introduces additional variation into the final parameter estimates. Despite some differences in the model assumptions and parameter set-up, our results show that cigarette smoking increases the proliferation rate, which corroborates previous findings.16–18 However, earlier studies are inconclusive as to whether smoking affects malignant transformation. Our model suggests that among men the malignant transformation rate is lower for current smokers than for never smokers. For women, there is no significant difference.


To conclude, the new findings from our study are as follows. First, besides cigarette smoking, genetic susceptibility, measured by DRC in this case, is an additional risk factor for lung cancer development that affects both initiation and malignant transformation rates. Second, our model favors the assumption that the carcinogenesis rate at each step is different in never smokers and smokers.

Supplementary Material


We would like to acknowledge Dr. Michael Thun for releasing the CPS-II study report to us. We would also like to thank Dr. William Hazelton for his help with programming and Dr. Hazelton and Dr. Rafael Meza for valuable suggestions.

Grant sponsor: A Lung CISNET; Grant number: U01 CA097431; Grant sponsor: NEI/NIH; Grant number: R24 EY014817; Grant sponsor: FRMI; Grant number: YCSA32083; Grant sponsor: NCI; Grant number: R01 CA55769; Grant sponsor: Prevent Cancer Foundation grant, William Keck pre-doctoral fellowship.


The survival function of the TSCE model for never smokers is given below to demonstrate the dependence of the probability function in f1 and f2 on the biological parameters. The survival function for current smokers is more complicated but it is linked to the biological parameters in a similar fashion.

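$$P(t_1 \le T < t_2) = P(T > t_1) - P(T > t_2)$$

$$P(t_1 \le T < t_2 \mid T > t_1) = \frac{P(t_1 \le T < t_2)}{P(T > t_1)}$$

$$P(T > t) = \left[\frac{2c\,e^{-(c+\delta)t/2}}{(c+\delta)\,e^{-ct} + c - \delta}\right]^{\rho}$$

$$\rho = \frac{\nu}{\alpha}, \qquad \delta = \alpha - \beta - \mu, \qquad c = \sqrt{\delta^{2} + 4\alpha\mu}$$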
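As a rough illustration, the never-smoker survival function and the 2 interval probabilities can be evaluated numerically. This is a minimal sketch, not the code used in the study. It assumes the standard constant-rate TSCE closed form with ρ = ν/α, δ = α − β − μ and c = √(δ² + 4αμ), and the rate values are hypothetical placeholders rather than fitted estimates. The helper `rescale_rates` is likewise hypothetical. It shows that the closed form depends on the rates only through ρ, δ and the product αμ, so distinct parameter sets can give the same survival curve, which is the nonidentifiability noted in the limitations.

```python
import math

def tsce_survival(t, nu, alpha, beta, mu):
    """P(T > t) for the constant-rate TSCE model (never-smoker form)."""
    rho = nu / alpha
    delta = alpha - beta - mu
    c = math.sqrt(delta ** 2 + 4.0 * alpha * mu)
    numerator = 2.0 * c * math.exp(-(c + delta) * t / 2.0)
    denominator = (c + delta) * math.exp(-c * t) + c - delta
    return (numerator / denominator) ** rho

def interval_probability(t1, t2, **rates):
    """P(t1 <= T < t2) = P(T > t1) - P(T > t2)."""
    return tsce_survival(t1, **rates) - tsce_survival(t2, **rates)

def conditional_interval_probability(t1, t2, **rates):
    """P(t1 <= T < t2 | T > t1)."""
    return interval_probability(t1, t2, **rates) / tsce_survival(t1, **rates)

def rescale_rates(k, nu, alpha, beta, mu):
    """Hypothetical helper: change the rates while keeping rho, delta, alpha*mu."""
    delta = alpha - beta - mu
    new_alpha, new_mu = k * alpha, mu / k
    return dict(nu=k * nu, alpha=new_alpha,
                beta=new_alpha - new_mu - delta, mu=new_mu)

# Hypothetical per-year rates, chosen only to make the example run.
rates = dict(nu=0.1, alpha=3.0, beta=2.9, mu=1e-6)
rescaled = rescale_rates(2.0, **rates)

print(tsce_survival(70.0, **rates))
print(conditional_interval_probability(60.0, 70.0, **rates))
print(tsce_survival(70.0, **rescaled))  # agrees with the first value up to rounding
```

Writing the exponentials with negative exponents keeps every term bounded as t grows, which avoids overflow at large ages.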


Additional Supporting Information may be found in the online version of this article.


1. Spitz M, Wei Q, Dong Q, Amos CI, Wu X. Genetic susceptibility to lung cancer: the role of DNA damage and repair. Cancer Epidemiol Biomarkers Prev. 2003;12:689–98.
2. Doll R, Hill AB. Smoking and carcinoma of the lung. BMJ. 1950;2:739–48.
3. Doll R, Peto R. Cigarette smoking and bronchial carcinoma: dose and time relationship among regular smokers and life long non-smokers. J Epidemiol Community Health. 1978;32:303–13.
4. Moolgavkar SH, Dewanji A, Luebeck G. Cigarette smoking and lung cancer: reanalysis of the British Doctor’s Data. J Natl Cancer Inst. 1989;81:415–20.
5. Bennett WP, Alavanja MC, Blomeke B, Vähäkangas KH, Castrén K, Welsh J, Bowman ED, Khan MA, Flieder DB, Harris CC. Environmental tobacco smoke, genetic susceptibility, and risk of lung cancer in never-smoking women. J Natl Cancer Inst. 1999;91:2009–14.
6. Bromen K, Pohlabeln H, Jahn I, Ahrens W, Jockel KH. Aggregation of lung cancer in families: results from a population-based case-control study in Germany. Am J Epidemiol. 2000;152:497–505.
7. Dresler CM, Fratelli C, Babb J, Everley L, Evans AA, Clapper ML. Gender differences in genetic susceptibility for lung cancer. Lung Cancer. 2000;30:153–60.
8. Heidenreich WF, Wellmann J, Jacob P, Wichmann HE. Mechanistic modeling in large case-control studies of lung cancer risk from smoking. Stat Med. 2002;21:3055–70.
9. Thun MJ, Myers DG, Day-Lally C, Namboodiri MM, Calle EE, Flanders WD, Adams SL, Heath CW. Age and the exposure-response relationships between cigarette smoking and premature death in cancer prevention study II. In: Burns DM, Garfinkel L, Samet JM, editors. Changes in cigarette-related disease risk and their implication for prevention and control. Bethesda, MD: National Cancer Institute, National Institutes of Health; 1997. pp. 347–54. NIH publication 97–4213.
10. Shen H, Spitz MR, Qiao Y, Guo Z, Wang LE, Bosken CH, Amos CI, Wei Q. Smoking, DNA repair capacity and risk of nonsmall cell lung cancer. Int J Cancer. 2003;107:84–8.
11. Gorlova O, Weng S-F, Zhang Y, Amos C, Spitz M, Wei Q. DNA repair capacity and lung cancer risk in never smokers. Cancer Epidemiol Biomarkers Prev. 2008;17:1322–8.
12. Wei Q, Cheng L, Amos CI, Wang LE, Guo Z, Hong WK, Spitz MR. Repair of tobacco carcinogen-induced DNA adducts and lung cancer risk: a molecular epidemiologic study. J Natl Cancer Inst. 2000;92:1764–72.
13. Knudson AG. Mutation and cancer: statistical analysis of retinoblastoma. PNAS. 1971;68:820–23.
14. Moolgavkar SH, Venzon DJ. Two-event models for carcinogenesis: incidence curves for childhood and adult tumors. Math Biosci. 1979;47:55–77.
15. Moolgavkar SH, Day NE, Stevens RG. Two-stage model for carcinogenesis: epidemiology of breast cancer in females. J Natl Cancer Inst. 1980;65:559–69.
16. Hazelton WD, Luebeck EG, Heidenreich WF, Moolgavkar SH. Analysis of a historical cohort of Chinese tin miners with arsenic, radon, cigarette smoke, and pipe smoker exposure using the biologically based two-stage clonal expansion model. Radiat Res. 1999;56:78–94.
17. Hazelton WD, Clements MS, Moolgavkar SH. Multistage carcinogenesis and lung cancer mortality in three cohorts. Cancer Epidemiol Biomarkers Prev. 2005;14:1171–81.
18. Luebeck EG, Heidenreich WF, Hazelton WD, Paretzke HG, Moolgavkar SH. Biologically based analysis of the data for the Colorado uranium miners cohort: age, dose and dose-rate effects. Radiat Res. 1999;152:339–51.
19. Tan WY. Stochastic models of carcinogenesis. New York: Marcel Dekker, Inc; 1991. p. 84.
20. Heidenreich WF, Luebeck EG, Moolgavkar SH. Some properties of the hazard function of the two-mutation clonal expansion model. Risk Anal. 1997;17:391–98.
21. Peto R, Darby S, Deo H, Silcocks P, Whitley E, Doll R. Smoking, smoking cessation, and lung cancer in the UK since 1950: combination of national statistics with two case-control studies. BMJ. 2000;321:323–29.
22. Thun MJ, Henley SJ, Calle EE. Tobacco use and cancer: an epidemiologic perspective for geneticists. Oncogene. 2002;21:7307–25.
23. Meza R, Hazelton WD, Colditz GA, Moolgavkar SH. Analysis of lung cancer incidence in the nurses’ health and the health professionals’ follow-up studies using a multistage carcinogenesis model. Cancer Causes Control. 2008;19:317–28.
24. Harris RE, Zang EA, Anderson JI, Wynder EL. Race and sex differences in lung cancer risk associated with cigarette smoking. Int J Epidemiol. 1993;22:592–99.
25. Risch HA, Howe GR, Jain M, Burch JD, Holowaty EJ, Miller AB. Are female smokers at higher risk for lung cancer than male smokers? A case-control analysis by histologic type. Am J Epidemiol. 1993;138:281–93.
26. Henschke CI, Yip R, Miettinen OS. Women’s susceptibility to tobacco carcinogens and survival after diagnosis of lung cancer. JAMA. 2006;296:180–84.
27. Wakelee HA, Chang ET, Gomez SL, Keegan TH, Feskanich D, Clarke CA, Holmberg L, Yong LC, Kolonel LN, Gould MK, West DW. Lung cancer incidence in never smokers. J Clin Oncol. 2007;25:472–8.
28. Osann KE, Anton-Culver H, Kurosaki T, Taylor T. Sex differences in lung cancer risk associated with cigarette smoking. Int J Cancer. 1993;54:44–8.
29. Prescott E, Osler M, Hein HO, Borch-Johnsen K, Lange P, Schnohr P, Vestbo J; the Copenhagen Center for Prospective Population Studies. Gender and smoking-related risk of lung cancer. Epidemiology. 1998;9:79–83.
30. Wakelee H, Gomez SL, Chang ET. Sex differences in lung cancer susceptibility: a smoke screen? Lancet Oncol. 2008;9:609–10.
31. Schöllnberger H, Manuguerra M, Bijwaard H, Boshuizen H, Altenburg HP, Rispens SM, Brugmans MJ, Vineis P. Analysis of epidemiological cohort data on smoking effects and lung cancer with a multistage cancer model. Carcinogenesis. 2006;27:1432–44.
32. Knoke JD, Shanks TG, Vaughn JW, Thun MJ, Burns DM. Lung cancer mortality is related to age in addition to duration and intensity of cigarette smoking: an analysis of CPS-I data. Cancer Epidemiol Biomarkers Prev. 2004;13:949–57.