Prediction models 1 and 2 demonstrated high discrimination and calibration. When both models were applied to the external validation sample, the ROC AUCs declined but remained high overall. The ROC AUCs also remained high in external validation for subsamples of women, men, whites, and nonwhites separately. When added to the model 2 variables, sex, race/ethnicity, and an indicator contrasting black men with the other race–sex groups pooled did not appear to be important predictors. Etzel et al. (13) developed a lung cancer risk prediction model in African Americans because existing models had been developed in whites, levels of risk differ for risk factors that African Americans share with whites, and unique group-specific risk factors exist for African Americans (13). Our models appear to work equally well in whites and nonwhites.
Our estimates of ROC AUC corrected for overfitting by bootstrap were higher than those observed in the external validation sample, which is expected because it is difficult to bootstrap all phases of model development (42). For example, not all phases of variable selection and evaluation were bootstrapped, and the external validation sample differs in unmeasured ways from the development sample, which bootstrap validation cannot account for.
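The bootstrap optimism correction referred to above can be illustrated with a minimal sketch. The toy "model" below (per-feature mean differences scoring a mix of one informative and several noise predictors) stands in for the actual modeling pipeline, which is not reproduced here; the structure of the correction (apparent AUC minus average optimism across bootstrap resamples) follows Harrell's standard recipe.

```python
import random

def auc(scores, labels):
    # Probability that a random case outranks a random control (ties count 1/2)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit(X, y):
    # Toy stand-in for model fitting: per-feature case/control mean difference
    n1 = sum(y)
    n0 = len(y) - n1
    return [sum(x[j] for x, yi in zip(X, y) if yi == 1) / n1
            - sum(x[j] for x, yi in zip(X, y) if yi == 0) / n0
            for j in range(len(X[0]))]

def score(w, X):
    return [sum(wj * xj for wj, xj in zip(w, x)) for x in X]

def optimism_corrected_auc(X, y, n_boot=100, seed=7):
    # Harrell-style bootstrap optimism correction of the apparent AUC:
    # optimism = (AUC of bootstrap model on its own resample)
    #          - (AUC of the same model on the original data), averaged
    rng = random.Random(seed)
    apparent = auc(score(fit(X, y), X), y)
    optimism, n, done = 0.0, len(y), 0
    while done < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if 0 < sum(yb) < n:  # skip degenerate one-class resamples
            wb = fit(Xb, yb)
            optimism += auc(score(wb, Xb), yb) - auc(score(wb, X), y)
            done += 1
    return apparent, apparent - optimism / n_boot

# Simulated data: one informative predictor plus five pure-noise predictors
gen = random.Random(0)
y = [i % 2 for i in range(100)]
X = [[yi + gen.gauss(0, 1)] + [gen.gauss(0, 1) for _ in range(5)] for yi in y]
apparent, corrected = optimism_corrected_auc(X, y)
```

Because the noise predictors let the model fit chance patterns, the apparent AUC overstates performance, and the corrected estimate is lower, mirroring the decline the text describes in external validation.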
To facilitate interpretation of model 2, a nomogram was prepared to allow estimation of the 9-year probability of lung cancer given an individual’s specific risk factors. Individuals, patients, clinicians, and researchers can use this graphic tool to estimate lung cancer risk. The nomogram (Supplementary Figure 1
, available online) converts each predictor's value into points, sums the points across predictors, and converts the total into the probability of lung cancer according to logistic model 2.
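The calculation the nomogram automates is a standard logistic transformation of a linear predictor. The sketch below uses hypothetical placeholder coefficients and an abbreviated predictor set for illustration only; they are not the published model 2 estimates.

```python
import math

# Hypothetical, illustrative coefficients -- NOT the published model 2 values
COEFS = {
    "intercept": -7.0,       # baseline log-odds
    "age_per_year": 0.05,    # per year of age
    "pack_years": 0.02,      # per pack-year smoked
    "copd": 0.6,             # indicator for COPD history
    "family_history": 0.5,   # indicator for family history of lung cancer
}

def nine_year_risk(age, pack_years, copd, family_history):
    """Sum coefficient-weighted predictor values (the nomogram's 'points'),
    then map the total through the logistic function to a probability."""
    lp = (COEFS["intercept"]
          + COEFS["age_per_year"] * age
          + COEFS["pack_years"] * pack_years
          + COEFS["copd"] * copd
          + COEFS["family_history"] * family_history)
    return 1.0 / (1.0 + math.exp(-lp))
```

For example, `nine_year_risk(65, 40, copd=1, family_history=0)` returns a probability between 0 and 1, and the estimate rises monotonically with pack-years, as the nomogram's point system encodes.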
To date, several lung cancer risk prediction models have been proposed. Most authors presented the ROC AUC or c
statistic as the primary measure of predictive performance, but few have presented calibration data. Doll and Peto (6
) and Prindiville et al. (7
) described lung cancer risk prediction models but did not present predictive performances. Bach et al. (8
) used prospective cohort data for smokers in the Carotene and Retinol Efficacy Trial to develop their prediction model. Their model predictors included age, sex, asbestos exposure, and smoking history. Cronin et al. (46
) externally validated the Bach model (8
) in the Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study control arm. The overall c
statistic was 0.69. The initial Spitz et al. models (10
) had cross-validated c
statistics of 0.59, 0.63, and 0.65 in never, former, and current smokers, respectively. They attempted to improve the initial models by adding two markers of DNA repair capacity (12
), which improved the ROC AUC from 0.67 to 0.70 in former smokers and from 0.68 to 0.73 in current smokers. The Etzel et al. (13
) risk prediction model for blacks had a discrimination (ROC AUC) of 0.63 in external data. The risk prediction model produced by Cassidy et al. (11
) included smoking duration, history of pneumonia, occupational exposure to asbestos, prior diagnosis of malignant tumor, and family history of lung cancer, and had an internally validated (cross-validation) ROC AUC of 0.70.
The predictive performance statistics of our models are not directly comparable with those of other models because differences in distributions of predictor variables can affect performance statistics (47
). However, because our models include new predictor variables, including socioeconomic status (education), BMI, history of recent chest x-ray, and COPD; an increased number of smoking variables; and inclusion of nonlinear effects, they may have better predictive accuracy than older models. Socioeconomic status may function as a predictor of lung cancer because it is a marker of unmeasured environmental, occupational, or behavioral exposures. Higher BMI has been associated with reduced risk of lung cancer in several studies (16
). Some research suggests that this association might have a biological basis. Both smoking-related DNA adducts measured in peripheral blood lymphocytes and oxidative DNA damage measured by levels of urinary 8-hydroxydeoxyguanosine appear to be inversely associated with BMI, adjusted for smoking (48
). This suggests that lean individuals might be more vulnerable to smoking carcinogen-related DNA damage. Recent chest x-ray may be a marker of pulmonary disease or chronic inflammation and thus serve as a predictor of lung cancer. Smoking is the primary causal agent of lung cancer, and thus it is expected that models describing smoking exposure in greater detail would have improved predictive abilities. Furthermore, many associations in nature are nonlinear, and our models were improved by including nonlinear components. For example, model 2 with nonlinear terms had an ROC AUC of 0.809, whereas the same model with linear terms replacing the spline terms had an ROC AUC of 0.804; this difference (−0.005, 95% CI = −0.009 to −0.001) was statistically significant.
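The restricted cubic spline expansion behind the nonlinear terms can be sketched as follows, using Harrell's standard parameterization; the knot locations here are arbitrary placeholders, not those chosen in the actual analysis. Each continuous predictor x is expanded into x plus k − 2 spline columns that allow curvature between the knots while constraining the fit to be linear in the tails.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis (Harrell's parameterization).

    Returns [x, s_1(x), ..., s_{k-2}(x)] for k knots. The construction
    cancels the cubic and quadratic terms beyond the outer knots, so the
    fitted curve is linear in the tails where data are sparse."""
    t = sorted(knots)
    denom = (t[-1] - t[0]) ** 2  # scale spline columns comparably to x

    def pos3(u):
        return u ** 3 if u > 0 else 0.0

    cols = [x]
    for j in range(len(t) - 2):
        s = (pos3(x - t[j])
             - pos3(x - t[-2]) * (t[-1] - t[j]) / (t[-1] - t[-2])
             + pos3(x - t[-1]) * (t[-2] - t[j]) / (t[-1] - t[-2]))
        cols.append(s / denom)
    return cols
```

Entering these columns into the logistic model, rather than x alone, is what distinguishes the nonlinear specification (ROC AUC 0.809) from the purely linear one (0.804) described above; unlike categorizing a continuous predictor, the spline preserves all of its information.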
This study had several limitations. The PLCO study participants were aged 55–74 years at study entry and were on average of higher socioeconomic status than the general population and may exhibit a healthy volunteer effect (51
), which may limit external generalizability. However, with the exception of educational level, the model predictors appear to have a biological basis that is expected to be independent of age and socioeconomic status. Data on several potentially useful predictors were unavailable for analysis, including exposure to radon, asbestos, secondhand smoke, occupational carcinogens, and history of adult pneumonia. Inclusion of these variables might have added to our risk prediction models. However, because predictors must have strong associations with lung cancer to have an impact on prediction (52
), and these missing predictors have modest associations with lung cancer, their inclusion would have led to only small improvements in prediction. In addition, because our external validation sample came from the same referent population as our model development sample, our models may not perform as well when applied to other population samples.
This work also had several strengths. We used a prospective design, which does not have the methodological weaknesses of case–control studies (10), which cannot estimate incidence or absolute risk directly and are vulnerable to selection and recall biases. In some studies, the case patients and control subjects were matched on age (10
), sex (11
), and smoking status (10
), which prevents effective assessment of these predictors because they have been forced to be similar by study design; in addition, case patients and control subjects were not taken from the same referent population (10
). One study chose “healthy” control subjects as the comparison group for case patients (10
), which might lead to selection bias. For example, such sampling could lead to an exaggerated effect for emphysema, if individuals with emphysema were excluded from the control group, but not the case patient group. In contrast, the PLCO sampling was population based and represents many different regions of the United States, resulting in improved internal and external validity.
We also used updated statistical methods, including modeling of nonlinear effects using restricted cubic splines and bootstrap correction of predictive performance optimism, which could improve the accuracy of predictive models. For example, Bach et al. (8) placed continuous smoking exposure data into four categories. The loss of information resulting from categorization of continuous data could lead to loss of predictive ability. In another analysis (10
), selection of predictor variables for entry into the multivariable models was based on the criterion of P
less than .05 in univariate analysis (10
), which could result in important predictors being left out of models more often than when less stringent P value criteria are used. In the same study, final multivariable models were also restricted to predictors that had P
less than .05, which could result in suboptimal predictive performance (41
). Spitz et al. (10
) found associations with lung cancer for some predictors only in subsets of their sample, and criteria for positivity changed from one group to another, which suggests that data exploration, use of optimal cut points, and multiple comparisons may have contributed to their findings. For example, Spitz et al. (10
) reported that in former smokers, a family history of at least two of any cancers vs
one or fewer was predictive, whereas in current smokers, a family history of at least one smoking-related cancer vs
none was predictive (10
Such analytical practices are likely to lead to overfitting of models and lack of reproducibility (53).
Because lung cancer was a primary endpoint of the PLCO, and follow-up and monitoring for lung cancer were meticulously maintained, data quality was high. The PLCO is a large mature study with enough outcome events to allow estimation with precision and reduced tendency to overfit models. Because the PLCO trial was not restricted to high-risk individuals, modeling applicable to the general population was possible. External validation of the models provided a realistic sense of the predictive potential of the models.
In future research, it will be important to conduct additional external validations of our models in diverse samples. Although the current models demonstrated high predictive performance, the models can be improved. Genomewide association studies have identified inherited susceptibility variants for lung cancer at chromosomal loci 15q25 (54
), 5p15 (57
), and 6p21 (58
). Future studies should investigate whether genetic polymorphism data contribute independent predictive information to models and whether they explain the predictive effect of family history of lung cancer. Numerous serum biomarkers associated with lung cancer (12) and pulmonary function data (65) need to be evaluated in risk prediction models.
In conclusion, our two lung cancer risk prediction models demonstrated high discrimination and calibration and are expected to be able to discriminate between high- and low-risk individuals. Other high-quality data sources in cohort settings should be used to validate and extend our findings.