The National Lung Screening Trial (NLST) used risk factors for lung cancer (e.g., ≥30 pack-years of smoking and <15 years since quitting) as selection criteria for lung-cancer screening. Use of an accurate model that incorporates additional risk factors to select persons for screening may identify more persons who have lung cancer or in whom lung cancer will develop.
We modified the 2011 lung-cancer risk-prediction model from our Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial to ensure applicability to NLST data; risk was the probability of a diagnosis of lung cancer during the 6-year study period. We developed and validated the model (PLCOM2012) with data from the 80,375 persons in the PLCO control and intervention groups who had ever smoked. Discrimination (area under the receiver-operating-characteristic curve [AUC]) and calibration were assessed. In the validation data set, 14,144 of 37,332 persons (37.9%) met NLST criteria. For comparison, 14,144 highest-risk persons were considered positive (eligible for screening) according to PLCOM2012 criteria. We compared the accuracy of PLCOM2012 criteria with NLST criteria to detect lung cancer. Cox models were used to evaluate whether the reduction in mortality among 53,202 persons undergoing low-dose computed tomographic screening in the NLST differed according to risk.
The AUC was 0.803 in the development data set and 0.797 in the validation data set. As compared with NLST criteria, PLCOM2012 criteria had improved sensitivity (83.0% vs. 71.1%, P<0.001) and positive predictive value (4.0% vs. 3.4%, P = 0.01), without loss of specificity (62.9% and 62.7%, respectively; P = 0.54); 41.3% fewer lung cancers were missed. The NLST screening effect did not vary according to PLCOM2012 risk (P = 0.61 for interaction).
The use of the PLCOM2012 model was more sensitive than the NLST criteria for lung-cancer detection.
The National Lung Screening Trial (NLST) showed that lung-cancer screening with the use of low-dose computed tomography (CT) resulted in a 20% reduction in mortality from lung cancer.1 Some organizations now recommend adoption of lung-cancer screening in clinical practice for high-risk persons if high-quality imaging, diagnostic methods, and treatment are available.2-4 Most of these recommendations identify persons to be screened by applying the NLST criteria, which include an age between 55 and 74 years, a history of smoking of at least 30 pack-years, a period of less than 15 years since cessation of smoking, or some variant of these criteria. These selection criteria were intended to increase the yield of lung cancers, but they exclude many known risk factors for lung cancer, and with dichotomization of continuous data, much valuable information is not included.5 Thus, NLST enrollment criteria may not identify substantial numbers of persons who will receive a diagnosis of lung cancer, and they may not sensitively select lung-cancer cases in screening samples. Applying an accurate lung-cancer risk-prediction model to a population can identify persons at highest risk; screening them is expected to increase the number of lung cancers identified per given sample size or reduce the number of persons needed to be screened per fixed number of lung cancers detected.
We previously developed and validated a lung-cancer risk-prediction model involving former and current smokers in the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial control and intervention groups.6 Model predictors included age, level of education, body-mass index (BMI), family history of lung cancer, chronic obstructive pulmonary disease (COPD), chest radiography in the previous 3 years, smoking status (current smoker vs. former smoker), history of cigarette smoking in pack-years, duration of smoking, and quit time (the number of years since the person quit smoking). This model has high predictive discrimination measured with the use of the area under the receiver-operating-characteristic curve (AUC), but it can be cumbersome to apply because it uses complicated modeling procedures (i.e., restricted cubic splines) and may benefit from the inclusion of additional predictors. In the PLCO model, risks are based on a median follow-up of 9.2 years, which exceeds the follow-up in the NLST and makes estimates inaccurate when applied to the NLST.
The aims of the current study were to modify and update our lung-cancer model for current and former smokers to make it directly applicable to NLST data. We also aimed to evaluate the extent to which selection of participants with the use of model-estimated high risk is more efficient than NLST criteria. We used each method to select PLCO intervention-group participants and determined the classification accuracies for selecting persons who receive a diagnosis of lung cancer in 6 years of follow-up.
The PLCO and NLST study designs and results have been described previously,1,7-11 and the designs and methods are summarized in Table 1. In both trials, approvals were obtained from institutional review boards at all study centers, and written informed consent was obtained from all participants. The current study involved 73,618 smokers in the PLCO study and 51,033 NLST participants for whom epidemiologic data were available. All histologically confirmed lung cancers that were diagnosed from study entry through 6 years of follow-up were included. Data on predictor variables were collected with the use of epidemiologic questionnaires administered at study entry.
We developed a modified logistic-regression model for lung-cancer prediction in the PLCO control group of smokers. This model was referred to as PLCOM2012 to distinguish it from its predecessor, PLCOM2011. We validated the model in the PLCO intervention group of smokers, in NLST participants, and in the PLCO intervention group stratified according to whether or not they met NLST criteria. In all data sets, follow-up was truncated at 6 years to make comparisons uniform between groups. Predictor variables considered for entry into the model included risk factors for lung cancer recognized in the literature and PLCOM2011.6,15-19 Model development was guided by predictive performance and was not limited to predictors with a P value of less than 0.05. Selected interactions thought to be credible a priori were evaluated, including sex–race or ethnic group and sex–smoking interactions. All interactions were found to be nonsignificant and are not discussed further. Nonlinear associations between continuous variables and lung cancer were evaluated with the use of multivariable fractional polynomials.20 We evaluated modeling assumptions and assessed model fit by graphically plotting residuals against model parameter values.
The ability of the models to discriminate between lung-cancer cases and noncases was evaluated according to the AUC in the validation data set. Model calibration (how well predicted probabilities corresponded to observed probabilities) was assessed by plotting a locally weighted scatterplot smoothing (LOWESS) curve showing the relationship between observed and predicted probabilities of lung cancer. The mean absolute differences between observed and predicted probabilities for each decile of predicted risk were assessed. As summary statistics, the median and 90th-percentile absolute differences between observed and predicted values are presented.21 Improvement in classification of cases, noncases, and cases and noncases combined from the inclusion of selected variables in models was analyzed with the use of net reclassification improvement22 with the following levels of 6-year risk: low, less than 1.0%; intermediate, 1.0% to less than 2.0%; and high, 2.0% or more.
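As an illustration of the discrimination measure, the AUC can be read as the probability that a randomly chosen lung-cancer case has a higher predicted risk than a randomly chosen noncase, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name and the predicted risks are hypothetical and are not taken from the PLCO or NLST data.

```python
def auc(case_risks, noncase_risks):
    """Fraction of case-noncase pairs in which the case has the higher
    predicted risk; tied pairs contribute one half."""
    concordant = 0.0
    for c in case_risks:
        for n in noncase_risks:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5
    return concordant / (len(case_risks) * len(noncase_risks))


# Hypothetical predicted 6-year risks, for illustration only.
cases = [0.08, 0.03, 0.015]
noncases = [0.02, 0.01, 0.005, 0.015]

example_auc = auc(cases, noncases)
print(f"AUC = {example_auc:.3f}")
```

The pairwise loop is quadratic in sample size and is meant only to make the definition concrete; statistical software typically uses a rank-based calculation that gives the same value.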
Next, we applied the NLST smoking criteria (≥30 pack-years of smoking and <15 years since cessation) to the PLCO intervention-group smokers; this provided the number of persons who met the NLST criteria. We selected a PLCOM2012 risk cutoff point so that the number of persons above this point was exactly the same as the number of persons who met the NLST criteria. This provided comparison samples of equal size, which were positive according to each criterion. The method that selected the largest proportion of diagnosed lung cancers in these samples would be the most efficient one to use in screening programs. We compared the sensitivity, specificity, and predictive values of both sets of criteria for selecting lung cancers. Confidence intervals for proportions were prepared with the use of the binomial exact method.23
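The equal-size comparison described above can be sketched as follows. The code applies the NLST smoking criteria to a small set of records, counts the persons who qualify, and then takes the model risk of the person ranked at that count as the cutoff, so that both criteria select the same number of persons. The eight records, the helper names, and the risks are hypothetical, and the sketch assumes no tied risks at the cutoff.

```python
def meets_nlst_smoking(pack_years, years_since_quit):
    """NLST smoking criteria: >=30 pack-years and <15 years since quitting
    (current smokers are coded here with years_since_quit = 0)."""
    return pack_years >= 30 and years_since_quit < 15


def equal_size_cutoff(risks, n_select):
    """Risk of the n_select-th highest-risk person; selecting everyone at or
    above it yields n_select persons when there are no ties at the cutoff."""
    return sorted(risks, reverse=True)[n_select - 1]


def sensitivity(selected, cancer):
    """Proportion of persons with lung cancer who were selected."""
    detected = sum(1 for s, c in zip(selected, cancer) if s and c)
    return detected / sum(cancer)


# Hypothetical records: (pack-years, years since quitting, model risk, cancer)
persons = [
    (40, 0, 0.050, True),
    (35, 10, 0.020, False),
    (32, 5, 0.012, False),
    (25, 0, 0.030, True),   # fails the pack-year criterion despite high model risk
    (20, 2, 0.008, False),
    (45, 20, 0.018, False),
    (10, 0, 0.004, False),
    (31, 14, 0.019, True),
]

cancer = [p[3] for p in persons]
risks = [p[2] for p in persons]

nlst_selected = [meets_nlst_smoking(p[0], p[1]) for p in persons]
n_nlst = sum(nlst_selected)

cutoff = equal_size_cutoff(risks, n_nlst)
model_selected = [r >= cutoff for r in risks]

sens_nlst = sensitivity(nlst_selected, cancer)
sens_model = sensitivity(model_selected, cancer)
print(n_nlst, cutoff, round(sens_nlst, 3), round(sens_model, 3))
```

Fixing the number selected means that any difference in sensitivity between the two criteria reflects which persons were chosen rather than how many.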
Finally, to see whether the reduction in mortality associated with low-dose CT screening in the NLST varied according to the risk of lung cancer, we prepared a Cox regression model using NLST data with a screening intervention–PLCOM2012 risk interaction. The significance of this multiplicative interaction term was evaluated with the use of the Wald statistic. We present Cox model hazard ratios for the screening-intervention variable stratified according to quartiles of PLCOM2012 risk.
With regard to descriptive statistics, distributions of study variables according to lung-cancer status were compared with the use of Fisher’s exact test for categorical variables, the t-test for continuous variables, and the nonparametric test for ordinal variables. All statistics and figures were prepared with the use of Stata software, version MP12.1 (Stata). All hypothesis testing used an alpha-error cutoff point of 0.05.
Distributions of predictor variables in 80,375 smokers in the PLCO control and intervention groups, in combined groups of the NLST (53,202 persons), and in the PLCO intervention group of persons who met NLST smoking criteria (15,099 persons) are listed in Table S1 in the Supplementary Appendix, available with the full text of this article at NEJM.org. Because the goal of the current study was not to reevaluate NLST intervention effects and because the distribution of participant characteristics according to NLST study groups has already been published,1,11,24 we used pooled statistics to provide an overall description of NLST participants as compared with PLCO participants. Table S1 in the Supplementary Appendix shows incidence rates of lung cancer and mean probabilities of lung cancer in former and current smokers. The higher incidence observed among NLST former smokers resulted from the exclusion of former smokers with histories of light smoking (<30 pack-years).
In PLCOM2012 (Table 2), the risk of lung cancer increased with age, black vs. white race, lower socioeconomic status (determined according to the level of education), lower BMI, self-reported history of COPD, personal history of cancer, family history of lung cancer, current smoking, increased smoking intensity (the average number of cigarettes smoked per day) and duration, and, in former smokers, shorter time since quitting. In multivariable modeling, smoking intensity had a significant nonlinear association with lung cancer (P<0.001 for nonlinearity) (Fig. 1 and Table 2). The increase in risk became smaller as smoking intensity increased. Inclusion of smoking intensity in the model as a nonlinear variable rather than a linear variable led to an overall net reclassification improvement in the PLCO control group of 2.1% (P = 0.02) and an increase in the AUC from 0.789 to 0.803 (P = 0.04). Inclusion of status with respect to a personal history of cancer and race or ethnic group, which were excluded from PLCOM2011, led to an overall net reclassification improvement of 0.9% (P = 0.16) and an increase in the AUC from 0.799 to 0.803 (P = 0.05). These incremental improvements in prediction seem modest, but it is difficult to achieve large gains in prediction when adding new predictors to a strong base model.25 The results of net-reclassification-improvement analyses are included in Table S2 in the Supplementary Appendix.
In PLCOM2012, the AUC for smokers in the PLCO control group (the development sample) was 0.803 (95% confidence interval [CI], 0.782 to 0.813), and the AUC for smokers in the PLCO intervention group (the validation sample) was 0.797 (95% CI, 0.782 to 0.813) (Table 3, and Fig. S1 in the Supplementary Appendix). In contrast, when the NLST criteria were applied, the AUC was 0.689 (95% CI, 0.673 to 0.795) for smokers in the PLCO control group and 0.670 (95% CI, 0.653 to 0.686) for those in the intervention group. In PLCOM2012, the AUC for the NLST participants was 0.701 (95% CI, 0.689 to 0.712), and for PLCO intervention participants who met the NLST criteria, it was 0.710 (95% CI, 0.689 to 0.732). The latter two AUCs were lower than those observed in the PLCO development and validation data sets because of a higher concentration of high-risk persons (persons who had never smoked and light smokers were excluded). High discrimination is easier to attain in data that are heterogeneous with regard to risk.
PLCOM2012 calibration assessment in the PLCO intervention-group smokers (Table 3) showed that the median and 90th percentile absolute differences between observed and predicted risk probabilities were 0.009 and 0.042, respectively. That is, the difference between observed and predicted probabilities of lung-cancer risk was less than 0.010 in half the validation sample and less than 0.043 in 90% of the sample. The mean absolute differences between observed and predicted lung-cancer risk in increasing deciles of PLCOM2012 risk are shown in Figure S2 in the Supplementary Appendix. In each of the first five deciles of risk, the mean absolute differences in risks were 0.015 or less, and in the first nine deciles of risk, the mean absolute differences in risks were 0.043 or less.
For comparative purposes, we prepared a Cox survival model with the same predictors as in the logistic PLCOM2012 model. The effect estimates (hazard ratios and odds ratios), standard errors, and predictive performances were similar in the two models (Table S3 in the Supplementary Appendix compares beta coefficients between the models). Because the logistic model is simpler, we describe it here. A spreadsheet calculator is available online (www.brocku.ca/cancerpredictionresearch); it calculates lung-cancer risk according to the PLCOM2012 model, given a person’s predictor levels.
When the NLST criteria were applied to the PLCO intervention group, 14,144 of 37,332 smokers (37.9%) were eligible for screening. For an equal number of persons with the use of the PLCOM2012 criteria, persons with a lung-cancer risk higher than 1.3455% were eligible. The distributions of true and false positive and negative results according to NLST and PLCOM2012 criteria are shown in Table 4. In the comparison of NLST with PLCOM2012 criteria for selection of persons who received a diagnosis of lung cancer, the sensitivities were 71.1% versus 83.0% (P<0.001), the specificities were 62.7% versus 62.9% (P = 0.54), and the positive predictive values were 3.4% versus 4.0% (P = 0.01). Of the persons who were excluded from screening according to NLST and PLCOM2012 criteria, lung cancer developed in 0.85% and 0.50%, respectively (P<0.001). All accuracy measurements favored the PLCOM2012 risk model. Overall, the model identified 81 more of the 678 lung cancers (11.9%) (95% CI, 9.6 to 14.6) than did the NLST criteria (41.3% fewer lung cancers were not detected; 115 vs. 196).
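The accuracy measures in this paragraph can be checked against the counts it reports: 37,332 smokers, 14,144 selected by each criterion, 678 lung cancers, and 196 versus 115 cancers missed. The sketch below rebuilds each two-by-two table from those four numbers alone; the function and key names are illustrative, and Table 4 itself is not reproduced here.

```python
def accuracy(total, cancers, selected, missed):
    """Rebuild the two-by-two table from the reported counts and
    return the standard accuracy measures as proportions."""
    tp = cancers - missed            # cancers among persons selected
    fp = selected - tp               # selected persons without cancer
    fn = missed                      # cancers among persons not selected
    tn = total - selected - fn       # unselected persons without cancer
    return {
        "sensitivity": tp / cancers,
        "specificity": tn / (tn + fp),
        "ppv": tp / selected,
        "risk_if_excluded": fn / (tn + fn),
    }


TOTAL, CANCERS, SELECTED = 37332, 678, 14144

nlst = accuracy(TOTAL, CANCERS, SELECTED, missed=196)
model = accuracy(TOTAL, CANCERS, SELECTED, missed=115)

extra_detected = 196 - 115                 # additional cancers identified
share_of_all = extra_detected / CANCERS    # as a share of all 678 cancers
fewer_missed = extra_detected / 196        # relative reduction in missed cancers

for name, result in (("NLST", nlst), ("PLCOM2012", model)):
    print(name, {k: round(100 * v, 2) for k, v in result.items()})
print(extra_detected, round(100 * share_of_all, 1), round(100 * fewer_missed, 1))
```

Because both criteria select the same 14,144 persons, every additional cancer captured by one criterion is also one fewer false positive, which is why sensitivity, specificity, and positive predictive value all move in the same direction.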
On the basis of the performance of the model in the PLCO control smokers, 90% of persons who received a diagnosis of cancer within 6 years would be selected for screening with the use of the PLCOM2012 risk probability of 0.00948 or higher (specificity, 52.0%; positive predictive value, 3.2%), and 48.7% of smokers would have to be screened. To include 80% of lung cancers, a PLCOM2012 risk probability of 0.016082 or higher would be used (specificity, 67.3%; positive predictive value, 4.1%) and the proportion of smokers to be screened would be 33.6%.
In Cox models with the use of NLST data, the protective effect of low-dose CT screening did not differ according to PLCOM2012 lung-cancer risk (P = 0.61 for interaction). We divided PLCOM2012 risk into four roughly equal groups of increasing risk and evaluated the Cox model hazard ratios for low-dose CT versus chest radiography. The hazard ratios were 0.86 (95% CI, 0.50 to 1.48), 0.71 (95% CI, 0.49 to 1.04), 0.70 (95% CI, 0.53 to 0.91), and 0.88 (95% CI, 0.73 to 1.06), respectively. At all four levels of risk, the screening effect was protective. Random variation may explain differences in hazard ratios according to risk quartiles.
In our original PLCOM2011 risk-prediction model, the AUC for smokers in the control group (the development sample) was 0.809 and the AUC for the intervention group (the validation sample) was 0.784. These values indicate high and consistent predictive discrimination. With our modified model, PLCOM2012, the AUCs were similar, at 0.803 and 0.797, respectively. The AUCs in the validation data suggest that predictive discrimination with the PLCOM2012 was slightly improved. A predictive model with an AUC in this range may be of value in providing individual-level information and in population-level screening programs.
The PLCOM2012 was modified from our previous model. In the current analysis, follow-up was truncated at 6 years so that PLCOM2012 data could be evaluated in comparison with NLST, in which complete follow-up was limited to this period. The predictor, radiography in the previous 3 years, was excluded from PLCOM2012. Although this variable was significantly associated with lung cancer, its inclusion did not lead to an increase in the AUC. The variables “race or ethnic group” and “status with respect to a personal history of cancer” were added to PLCOM2012. These additions are consistent with findings of other studies17,19,26 and modestly but significantly improved prediction as measured according to the AUC, net reclassification improvement, or both. A nonlinear relationship between smoking intensity and lung cancer was described with the use of multivariable fractional polynomials. This approach allowed straightforward calculation of risks and made implementation of the model easy. In PLCOM2011, smoking predictors included smoking status, duration of smoking, history of smoking in pack-years, and time since the person quit smoking. In PLCOM2012, smoking predictors included smoking status, duration of smoking, smoking intensity, and quit time (pack-years were not included). The smoking variables can be converted from one to the other, and it is usual for different combinations of related predictors to have similar predictive abilities. Our PLCO models have advantages over previously published models, which have been described elsewhere.6
PLCOM2012 excluded persons who had never smoked. Additional unique predictors and models are required for prediction of lung-cancer risk among persons who have never smoked, and such models have not been developed. Generally, lung-cancer risk among persons who have never smoked is so low that low-dose CT screening of such persons is not currently warranted. In both the PLCO and the NLST, an age between 55 and 74 years was an entry criterion. Therefore, the predictive performance of the PLCOM2012 outside this age range is uncertain, although most lung cancers occur in persons in this age range. The socioeconomic status of the PLCO study population was higher than that of the general population.27 Although this might theoretically limit generalizability, because most of the predictors appear to have a biologic relationship with lung cancer that is independent of socioeconomic status, the model may still perform well. The PLCOM2012 should be evaluated in different populations and clinical and public health settings in well-designed prospective studies. In the future, additional predictors, such as pulmonary function28 and genetic or biomarker-based predictors, may lead to further enhancement of lung-cancer prediction.
Detailed calculations of sensitivity, specificity, and predictive values for screening low-dose CT and chest radiography were not presented in the final reports of the NLST1 or PLCO.9 However, the positive predictive value of low-dose CT screening in the NLST (computed from reported data) was 3.6%1 and the positive predictive value of baseline chest radiographic screening in the PLCO was 2.0%.29 The positive predictive value for the PLCOM2012 (4.0%) compares favorably.
The wide gap in the ability to predict lung cancers between the NLST and PLCOM2012 criteria should translate into more efficient selection for screening (a higher number of cancers detected per number of persons screened), greater cost-effectiveness, and additional lives saved from low-dose CT screening. Among 37,332 smokers in the PLCO intervention group, the PLCOM2012 selected 81 more persons for screening who received a diagnosis of lung cancer in follow-up than did the NLST criteria. If one assumes a 15% rate of overdiagnosis, then 69 of these persons can be considered to have “true” life-threatening lung cancer. If the 5-year survival rate is 15%, the expected number of deaths among persons who did not undergo screening would be 59. If the mortality reduction is 20%, as observed in the NLST, then in this cohort, 12 additional deaths from lung cancer would have been avoided if selection for screening had been based on PLCOM2012 criteria.
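The arithmetic behind this estimate can be laid out step by step. The sketch below uses the three rates stated in the paragraph and rounds to whole persons at each step, which is an assumption made here to match the reported figures of 69, 59, and 12; the variable names are illustrative.

```python
extra_cancers = 81            # additional cancers selected by PLCOM2012 criteria
overdiagnosis_rate = 0.15     # assumed share of overdiagnosed cancers
five_year_survival = 0.15     # assumed survival without screening
mortality_reduction = 0.20    # reduction observed in the NLST

# Round to whole persons at each step (an assumption about the source's rounding).
true_cancers = round(extra_cancers * (1 - overdiagnosis_rate))      # 68.85 -> 69
expected_deaths = round(true_cancers * (1 - five_year_survival))    # 58.65 -> 59
deaths_avoided = round(expected_deaths * mortality_reduction)       # 11.8  -> 12

print(true_cancers, expected_deaths, deaths_avoided)
```

Carrying the unrounded values through instead gives 81 × 0.85 × 0.85 × 0.20 ≈ 11.7, which also rounds to 12, so the conclusion does not depend on the rounding convention.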
In conclusion, the PLCOM2012 predicted the 6-year risk of lung cancer with high accuracy and was more efficient at identifying persons for lung-cancer screening, as compared with the NLST criteria. Because the mortality reduction from CT screening did not vary according to lung-cancer risk, use of the PLCOM2012 to select persons for lung-screening programs could potentially improve the cost-effectiveness of screening and prevent additional deaths from lung cancer.
Supplement to: Tammemägi MC, Katki HA, Hocking WG, et al. Selection criteria for lung-cancer screening. N Engl J Med 2013;368:728-36. DOI: 10.1056/NEJMoa1211776
The Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial was supported by the National Cancer Institute (NCI), in part by contracts with the Division of Cancer Prevention and by the Intramural Research Program of the Division of Cancer Epidemiology and Genetics. The American College of Radiology Imaging Network component of the National Lung Screening Trial (NLST) was supported by grants provided under a cooperative agreement with the Cancer Imaging Program, Division of Cancer Treatment and Diagnosis (U01-CA-80098 and U01-CA-79778). The Lung Screening Study sites of the NLST were supported by contracts with the Early Detection Research Group and Biometry Research Group, Division of Cancer Prevention (N01-CN-25514, to the University of Colorado–Denver; N01-CN-25522, to Georgetown University; N01-CN-25515, to the Pacific Health Research and Education Institute; N01-CN-25512, to the Henry Ford Health System; N01-CN-25513, to the University of Minnesota; N01-CN-25516, to Washington University in St. Louis; N01-CN-25511, to the University of Pittsburgh; N01-CN-25524, to the University of Utah; N01-CN-25518, to the Marshfield Clinic Research Foundation; N01-CN-75022, to the University of Alabama at Birmingham; N01-CN-25476, to Westat; and N02-CN-63300, to Information Management Services).
We thank the PLCO and NLST screening-center investigators and the staff from Information Management Services and Westat. Most important, we thank the study participants for their contributions that made these studies possible.
Disclosure forms provided by the authors are available with the full text of this article at NEJM.org.