In this paper, we introduce a resampling-based method that merges case-control data on risk factors with tabulated incidence rate data to reconstruct time-to-event data. We used this method to fit a model that predicts lung cancer risk over an individual’s lifetime based on gender and smoking history. Prospective cohort data are generally preferable for fitting time-to-event models; however, such data were unavailable for this study. A further complication is that the case-control study design included partial matching on the risk factor of interest, namely smoking status. By using external incidence/mortality rate data, the method was able to adjust for the matching present in the case-control study.
Applying the resampling method to MD Anderson case-control data on smoking histories and tabulated lung cancer incidence/mortality rate data, we were able to fit a TSCE model based on smoking history. The fitted model was validated against the control arm of CARET, where it accurately predicted the number of lung cancer deaths observed during the trial after adjustment for the healthy volunteer effect. The adjustment consists of removing the first few years of follow-up, as was done in Bach et al. (22).
The simulation study showed that the proposed resampling method yields satisfactory fits. Fitting 200 pseudo-cohorts was found to provide stable estimates in this study; however, a different number may be needed in other applications of the method. Likewise, 20,000 individuals were resampled per pseudo-cohort in order to obtain reasonable numbers of cases in each pseudo-cohort, as lung cancer incidence is low for certain smoking categories such as never smokers or very long-term quitters. The sensitivity of predictive accuracy to these choices requires further study.
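To make the roles of the two tuning choices (number of pseudo-cohorts and individuals per pseudo-cohort) concrete, the following is a minimal Python sketch of one plausible resampling scheme. It is not the procedure used in this study. It assumes a hypothetical record layout in which each case or control carries a `stratum` key, and a hypothetical `strata` table giving each stratum's population share and event probability taken from tabulated rates. All function and field names are illustrative.

```python
import random
from collections import defaultdict


def pool_by_stratum(records):
    """Group records by their 'stratum' field (hypothetical, e.g. age band x smoking status)."""
    pools = defaultdict(list)
    for rec in records:
        pools[rec["stratum"]].append(rec)
    return dict(pools)


def build_pseudo_cohort(case_pools, control_pools, strata, cohort_size, rng):
    """Resample one pseudo-cohort of (record, is_case) pairs.

    strata maps stratum -> {"share": population share, "rate": event probability}.
    Assumption: every stratum has at least one case and one control to draw from.
    """
    keys = list(strata)
    shares = [strata[k]["share"] for k in keys]
    cohort = []
    # Draw strata in proportion to their population share, not their share of the
    # case-control sample, so that matching in the study design does not carry over.
    for key in rng.choices(keys, weights=shares, k=cohort_size):
        # Event status comes from the external rate table.
        is_case = rng.random() < strata[key]["rate"]
        # Covariates come from the corresponding arm of the case-control data.
        pool = case_pools[key] if is_case else control_pools[key]
        cohort.append((rng.choice(pool), is_case))
    return cohort


def build_pseudo_cohorts(cases, controls, strata,
                         n_cohorts=200, cohort_size=20_000, seed=0):
    """Build n_cohorts independent pseudo-cohorts (defaults mirror the values in the text)."""
    rng = random.Random(seed)
    case_pools = pool_by_stratum(cases)
    control_pools = pool_by_stratum(controls)
    return [
        build_pseudo_cohort(case_pools, control_pools, strata, cohort_size, rng)
        for _ in range(n_cohorts)
    ]
```

Under this sketch, a model would be fitted to each pseudo-cohort and the estimates combined across cohorts. With a low event probability in a stratum, a small `cohort_size` yields few or no cases in that stratum, which is the reason given above for resampling 20,000 individuals.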
Previously, Heidenreich et al. (23) and Deng et al. (18) developed approaches to fit the TSCE model using case-control data, both using additional mortality data to allow estimation of the age dependence of the hazard function. In particular, Heidenreich et al. (23) introduced a direct case-control likelihood approach for fitting the TSCE model. The proposed likelihood was designed for case-control studies with larger sample sizes than the one used in this analysis. As a result, when applied to our data, the Heidenreich et al. approach resulted in a flat likelihood function. In essence, our approach is the resampling equivalent of the Heidenreich et al. methodology: resampling effectively creates the weights that Heidenreich et al. used in the likelihood function, and it bypasses the problem of a flat likelihood function. Deng et al. (18) used a least squares estimation approach with a complicated objective function. The resampling approach provides a straightforward alternative. Further, the resampling method can be used to reconstruct time-to-event data for use in applications other than model fitting.
The parameterization used in this paper differs from the earlier model developed by Deng et al. (18) in a few ways. First, the Deng et al. study included the effect of DNA repair capacity and resulted in 9 fitted parameters. In this study, the model was fitted with the intention of using as few parameters as possible while maintaining satisfactory prediction quality; as a result, only 5 parameters were used. This leaves room for additional risk factors to be included in a later version. The models also differ with regard to non-identifiability and parameterization. For this study the two background mutation rates were assumed equal (ν0). Also, the model presented here includes the net proliferation rate γ instead of the death rate of the ICs, β, which was used in Deng et al.
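For readers who want to see how a net proliferation rate enters a two-stage hazard, the following Python sketch evaluates the textbook TSCE hazard with constant parameters. It is an illustration under stated assumptions, not the smoking-dependent model fitted in this study. The argument names are hypothetical: `nu` is the overall initiation rate, `alpha` the division rate of the ICs, `mu` the malignant conversion rate, and `gamma` is taken as α − β − μ, one common convention for the net proliferation rate.

```python
import math


def tsce_hazard(t, nu, alpha, gamma, mu):
    """Hazard at age t for a constant-parameter two-stage clonal expansion model.

    Uses h(t) = (nu/alpha) * p*q * (exp(-q t) - exp(-p t)) / (q exp(-p t) - p exp(-q t)),
    where p < 0 < q are the roots 0.5 * (-gamma -/+ sqrt(gamma^2 + 4 alpha mu)).
    Assumption: gamma = alpha - beta - mu; all parameters are constant in time.
    """
    disc = math.sqrt(gamma * gamma + 4.0 * alpha * mu)
    p = 0.5 * (-gamma - disc)
    q = 0.5 * (-gamma + disc)
    # Divide numerator and denominator by exp(-p t) to avoid overflow at large t:
    # exp(-q t) / exp(-p t) = exp(-(q - p) t), with q - p = disc > 0.
    x = -(q - p) * t
    numerator = math.expm1(x)            # exp(x) - 1, accurate for small |x|
    denominator = q - p * math.exp(x)    # strictly positive since q > 0 > p
    return (nu / alpha) * p * q * numerator / denominator
```

In this form the hazard starts at zero, grows approximately as `nu * mu * t` at young ages, and levels off at `-(nu / alpha) * p` at old ages. Because β appears only through `gamma`, the hazard can be written without the death rate of the ICs, which is why a parameterization in terms of γ is convenient when not all biological parameters are identifiable.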
In conclusion, the proposed resampling method provides an opportunity to fit time-to-event models to case-control data and to evaluate the effects of risk factors, including factors other than smoking, on different stages of carcinogenesis. The method presented here can accurately predict the risk of lung cancer death based on individual-level data on age, gender, and smoking history.