Major issues in the implementation of screening for lung cancer by means of low-dose computed tomography (CT) are the definition of a positive result and the management of lung nodules detected on the scans. We conducted a population-based prospective study to determine factors predicting the probability that lung nodules detected on the first screening low-dose CT scans are malignant or will be found to be malignant on follow-up.
We analyzed data from two cohorts of participants undergoing low-dose CT screening. The development data set included participants in the Pan-Canadian Early Detection of Lung Cancer Study (PanCan). The validation data set included participants involved in chemoprevention trials at the British Columbia Cancer Agency (BCCA), sponsored by the U.S. National Cancer Institute. The final outcomes of all nodules of any size that were detected on baseline low-dose CT scans were tracked. Parsimonious and fuller multivariable logistic-regression models were prepared to estimate the probability of lung cancer.
In the PanCan data set, 1871 persons had 7008 nodules, of which 102 were malignant, and in the BCCA data set, 1090 persons had 5021 nodules, of which 42 were malignant. Among persons with nodules, the rates of cancer in the two data sets were 5.5% and 3.7%, respectively. Predictors of cancer in the model included older age, female sex, family history of lung cancer, emphysema, larger nodule size, location of the nodule in the upper lobe, part-solid nodule type, lower nodule count, and spiculation. Our final parsimonious and full models showed excellent discrimination and calibration, with areas under the receiver-operating-characteristic curve of more than 0.90, even for nodules that were 10 mm or smaller in the validation set.
Predictive tools based on patient and nodule characteristics can be used to accurately estimate the probability that lung nodules detected on baseline screening low-dose CT scans are malignant. (Funded by the Terry Fox Research Institute and others; ClinicalTrials.gov number, NCT00751660.)
The U.S. National Lung Screening Trial showed that screening with the use of low-dose thoracic computed tomography (CT) reduces mortality from lung cancer by 20%.1 Major clinical issues in the implementation of low-dose CT screening at the population level include the definition of a positive screening result and the appropriate management of lung nodules detected on a scan. More than 20% of participants in low-dose CT screening programs were found on their first scan to have one or more lung nodules that required further investigation.1–4 The proportion of invasive diagnostic procedures ranged from 1 to 4%.1,3 The risk of major complications was 4.5 complications per 10,000 persons screened, and 25% of the surgical procedures in the National Lung Screening Trial were performed on nodules that were determined to be benign.1 An accurate and practical model that can predict the probability that a lung nodule is malignant and that can be used to guide clinical decision making will reduce costs and the risk of morbidity and mortality in screening programs. We report the development and validation of models and calculators for predicting the probability of lung cancer in pulmonary nodules using data from two separate low-dose CT screening cohorts.
The development data set included participants enrolled in the multicenter Pan-Canadian Early Detection of Lung Cancer Study (PanCan). The validation data set included participants enrolled in several chemoprevention trials sponsored by the U.S. National Cancer Institute and conducted by the British Columbia Cancer Agency (BCCA). In both the PanCan and BCCA studies, an epidemiologic questionnaire was administered and spirometry was performed5,6 at baseline. Ethics approval was obtained from each participating study center, and written informed consent was provided by all participants. The first three authors and the last author vouch for the accuracy and completeness of the data.
Details of the PanCan are provided in the Supplementary Appendix, available with the full text of this article at NEJM.org. In brief, the population-based sample included current and former smokers between 50 and 75 years of age without a history of lung cancer. Eligible participants had a 3-year risk of lung cancer of at least 2% as determined by a prototype of risk-prediction models in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial.7,8 Participants were recruited from September 2008 through December 2010.
In the BCCA studies, current and former smokers between 50 and 74 years of age without a history of lung cancer and with a smoking history of at least 30 pack-years were recruited from the community from July 2000 through November 2010. The study methods have been reported previously.9,10
In the PanCan, a multidetector-row CT scanner with maximum section collimation of 1.25 mm and four or more data-acquisition channels was used at each participating site. The CT scans were obtained at 120 kVp, 40 to 50 mA, and a tube-rotation time of less than 1 second. Contiguous images were reconstructed in the transaxial plane at up to 1.25-mm thickness. Lung image sets were reconstructed with the use of a high-spatial-frequency algorithm, and mediastinal image sets with the use of an intermediate-spatial-frequency algorithm. A designated radiologist at each site who had specific training in chest radiology reviewed the CT scans (see Text S1 in the Supplementary Appendix).
In the BCCA studies, low-dose CT scanning was initially performed on a single-slice CT scanner and subsequently on 4-, 8-, or 16-detector CT scanners, as reported previously.9,10 The CT scans were obtained at 120 kVp, 40 to 80 mA, and rotation times of up to 1 second. Initially, images were reconstructed at 7 mm in the 22% of the participants who were enrolled between 2000 and early 2002; subsequently, images were reconstructed at a 1.25-mm and 1-mm slice width with the use of both the intermediate (standard or B35f) and high-spatial-frequency (bone or B60f) reconstruction algorithms. A single radiologist who had specific training in chest radiology reviewed all images.
In both cohorts, a CT scan was considered to be abnormal if it showed any noncalcified pulmonary nodule or area of nonsolid density at least 1 mm in diameter on lung parenchymal windows. A nodule was considered to be benign if it showed a benign calcification pattern (e.g., fully calcified or popcorn calcification) or if the size of a solid nodule was unchanged for at least 2 years. Documented characteristics of the nodules included their maximum transverse size, the visually determined type (nonsolid or with ground-glass opacity, part-solid or subsolid, or solid or perifissural), and the location in the lung. The presence of visually detected emphysema was noted.9,10 The presence or absence of spiculation was recorded for nodules in the PanCan cohort but not for those in the BCCA cohort.
Only participants with at least one noncalcified lung nodule on the baseline low-dose CT scan were included in this analysis. These participants were followed with repeat low-dose CT at 3-to-12-month intervals, with the interval determined by the maximum diameter of the long axis of the largest nodule, until any of the following occurred: all nodules were seen to be stable for at least 2 years, the nodules were no longer visible, benign calcification developed, or the nodules were determined to be benign or malignant on biopsy or surgical resection.
The diagnosis of lung cancer was made by histopathological examination of resection specimens or cytopathological examination of needle-aspiration biopsy samples. A microcoil localization technique was used to mark the nodule under CT guidance before surgical resection.11 Resected tumors were classified with the use of the World Health Organization classification of lung neoplasms.12
Descriptive statistics were prepared with the use of contingency-table analyses for categorical data and Fisher’s exact test. The 95% confidence intervals for proportions were estimated with the use of the binomial exact method. Ordinal data were compared with the use of a nonparametric test of trend,13 and continuous data with the use of Student’s t-test. Multivariable logistic-regression models were prepared to estimate the risk of lung cancer associated with potential predictors, including sociodemographic variables and clinical variables such as smoking exposure and nodule characteristics. Inclusion of variables in the models was based on existing knowledge of risk factors for lung cancer and on nodule characteristics that are readily discernible on low-dose CT images. Two sets of predictive models were prepared. The first set was a parsimonious model that included only predictors that were significant (at P<0.05), and the second set, a fuller model that included additional predictors that were thought a priori to be associated with the risk of lung cancer if the P values for them were less than 0.25. In these analyses, the unit of analysis was the nodule. Because some persons had multiple nodules, the variances of effect estimates were adjusted for clustering of data within persons with the use of the Huber–White robust (sandwich) variance estimator.14
Nonlinear effects of continuous variables were evaluated with the use of locally weighted scatterplot smoothing (LOWESS) plots and multivariable fractional polynomials.15 We evaluated interactions between important predictors in final models by including interaction terms along with main-effect terms. None of the interactions we tested were significant, and they are not discussed further in this article.
We evaluated the predictive performance of the model by assessing its discrimination (ability to classify correctly) and its calibration (whether probabilities predicted by the model match observed probabilities). Discrimination was measured with the use of the area under the receiver-operating-characteristic curve (AUC). All AUCs reported are presented with bootstrap bias-corrected 95% confidence intervals, with bootstrapping techniques based on 1000 bootstrapped samples.16 We evaluated calibration by subtracting the model-estimated probability from the observed probability for each study participant, placing these absolute errors in rank order, and evaluating the magnitude of the median and 90th percentile of the absolute errors.17 In addition, the mean absolute errors for each decile of model-predicted risk were evaluated.
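The discrimination and calibration summaries described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's Stata code: the function names are hypothetical, the bootstrap interval is a plain percentile interval rather than the bias-corrected interval reported in this article, resampling is done per nodule without accounting for clustering within persons, and the decile summary is one possible reading of the per-decile absolute error.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a random case outranks a random non-case.

    Ties between a case and a non-case count as half a win (Mann-Whitney form).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumption: not bias-corrected)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # keep only resamples containing both classes
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lo_i = max(0, int(alpha / 2 * n_boot))
    hi_i = min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)
    return stats[lo_i], stats[hi_i]


def decile_abs_errors(pred, labels, groups=10):
    """Absolute difference between observed event rate and mean predicted
    risk within each group of nodules ordered by predicted risk."""
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    n = len(order)
    errors = []
    for g in range(groups):
        chunk = order[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        observed = sum(labels[i] for i in chunk) / len(chunk)
        predicted = sum(pred[i] for i in chunk) / len(chunk)
        errors.append(abs(observed - predicted))
    return errors
```

With 1000 resamples, as in this study, `bootstrap_auc_ci` returns the 2.5th and 97.5th percentiles of the resampled AUCs; a bias-corrected interval would shift those percentiles according to how the resampled AUCs are distributed around the original estimate.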
Prediction models developed in the PanCan cohort (excluding spiculation as a predictor) were validated externally by means of an assessment of discrimination and calibration in BCCA data. We assessed the performance of the model, excluding and including spiculation, by calculating the AUC in the PanCan data. In addition, we analyzed the improvement in the classification of cases, noncases, and overall data with the inclusion of spiculation in the final model using net reclassification improvement with the following risk strata: low-risk (<5%), intermediate-risk (≥5% to <10%), and high-risk (≥10%).18 All reported P values are two-sided, unless otherwise indicated. The statistical analysis was performed and figures were prepared with the use of Stata/MP, version 12.1.
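As an illustration of the net reclassification improvement across the three risk strata named above, the following sketch counts upward and downward stratum moves separately for cases and noncases. The function names and the toy inputs in the usage note are hypothetical; only the stratum cutoffs (5% and 10%) come from the text.

```python
def risk_stratum(p):
    """Map a predicted probability to 0 (low, <5%), 1 (intermediate,
    >=5% to <10%), or 2 (high, >=10%)."""
    if p < 0.05:
        return 0
    if p < 0.10:
        return 1
    return 2


def net_reclassification_improvement(old_probs, new_probs, labels):
    """Return (NRI among cases, NRI among noncases, overall NRI).

    Cases gain when the new model moves them to a higher stratum;
    noncases gain when the new model moves them to a lower stratum.
    """
    n_cases = sum(1 for y in labels if y == 1)
    n_noncases = len(labels) - n_cases
    if n_cases == 0 or n_noncases == 0:
        raise ValueError("need at least one case and one noncase")
    up_c = down_c = up_n = down_n = 0
    for old, new, y in zip(old_probs, new_probs, labels):
        move = risk_stratum(new) - risk_stratum(old)
        if y == 1:
            up_c += move > 0
            down_c += move < 0
        else:
            up_n += move > 0
            down_n += move < 0
    nri_cases = (up_c - down_c) / n_cases
    nri_noncases = (down_n - up_n) / n_noncases
    return nri_cases, nri_noncases, nri_cases + nri_noncases
```

For example, with hypothetical predictions `old = [0.04, 0.12, 0.03, 0.07]`, `new = [0.11, 0.12, 0.02, 0.04]`, and `labels = [1, 1, 0, 0]`, one of two cases moves up and one of two noncases moves down, giving component values of 0.5 each and an overall value of 1.0.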
A total of 2537 persons were enrolled in the PanCan, and at the time of the current analysis, the median overall follow-up was 3.1 years (range, 2.1 to 4.3). During this period, 187 participants (7.4%) were lost to follow-up. The mean time until loss to follow-up among participants without nodules and those with nodules was 1.03 and 1.12 years, respectively. Overall, loss to follow-up was significantly less likely to occur among participants with nodules than among those without nodules (odds ratio, 0.65; 95% confidence interval [CI], 0.47 to 0.99; P = 0.007). In the PanCan, 1871 of the 2537 participants (73.7%) had a total of 7008 lung nodules. Of the participants with nodules, 102 had nodules that were malignant (5.5%). In the BCCA validation study, 1090 participants had 5021 nodules, and 40 of the 1090 persons with nodules (3.7%) were found to have 42 lung cancers during a median follow-up of 8.6 years (range, 2.6 to 12.6).
The characteristics of the participants are described in Table S1 in the Supplementary Appendix. The PanCan and BCCA study populations were similar with respect to age, sex, body-mass index, percentage of patients with emphysema, and percent of predicted forced expiratory volume in 1 second (FEV1). The BCCA participants were less likely than the participants in the PanCan to have a family history of lung cancer (18.4% vs. 32.4%) and more likely to be former smokers (81.0% vs. 38.8%) and had a history of fewer pack-years of smoking (48.3 vs. 54.8). In a univariate analysis, the following variables were consistently associated with lung cancer: older age, any emphysema as observed on CT images, and lower percent of predicted FEV1.
The characteristics of the nodules, according to lung-cancer status, are shown in Table 1. In a univariate analysis, significant consistent predictors of lung cancer included the size, type (nonsolid, part-solid, or solid), and location of the nodules, and the number of nodules that were detected. Spiculation was a significant predictor in the PanCan data.
The size of the nodule was associated with lung cancer in a significant nonlinear relationship (P<0.001 for nonlinearity). The nonlinear relationship was modeled with the use of multivariable fractional polynomials and is depicted graphically in Figure 1. The transformation of nodule size used in modeling is described in Table 2.
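For readers unfamiliar with fractional polynomials, a first-degree fractional polynomial replaces a continuous predictor x with a single power transformation chosen from a small conventional set, with the power 0 denoting the natural logarithm. The sketch below shows only that generic family; the specific transformation of nodule size selected in this study is the one given in Table 2 and is not reproduced here, and the function name is hypothetical.

```python
import math

# Conventional candidate powers for fractional polynomials; 0 stands for log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp1_transform(x, power):
    """Apply a first-degree fractional-polynomial transformation to x > 0."""
    if x <= 0:
        raise ValueError("fractional polynomials require a positive predictor")
    if power == 0:
        return math.log(x)
    return x ** power


# Candidate transformations of a 4-mm nodule, one per power in the set.
candidates = {p: fp1_transform(4.0, p) for p in FP_POWERS}
```

Model fitting then compares the candidate transformations (by deviance, in the usual procedure) against the linear term, which is the case `power == 1`.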
The majority of nodules were solid in appearance (78.9% in the PanCan data set and 79.8% in the BCCA data set) (Table 1). Nonsolid and part-solid nodules accounted for 15.8% and 4.3% of nodules, respectively, in the PanCan group and 9.3% and 0.9%, respectively, in the BCCA group. The remaining nodules were perifissural. The relationships between these nodule types and cancer are described in the section below describing the predictive model. No perifissural nodule was malignant. When the data from the two studies were pooled, the probability of lung cancer in perifissural nodules was zero (0 of 571 nodules; one-sided 97.5% CI, 0 to 0.006).
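The upper confidence limit for the perifissural nodules can be checked by hand. When zero events are observed in n trials, the exact binomial (Clopper-Pearson) one-sided upper limit at confidence level 1 − α is 1 − α^(1/n). A minimal sketch, with a hypothetical function name:

```python
def zero_event_upper_bound(n, confidence=0.975):
    """Exact one-sided upper confidence limit for a proportion
    when 0 events are observed among n trials."""
    if n <= 0:
        raise ValueError("n must be positive")
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n)


# 0 malignant among 571 pooled perifissural nodules
upper = zero_event_upper_bound(571)  # about 0.0064, reported as 0.006
```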
The location of a nodule was evaluated according to lobar distribution. A larger number of nodules and a larger number of cancers were observed in the left upper and right upper lobes than in the left or right lower lobes or the right middle lobe (Table 1). For this reason, the left upper and right upper lobes were compared with the other lobes in multivariable analysis.
The number of nodules per person was similar in the two data sets: a median of 5 nodules per person (interquartile range, 3 to 9) among the PanCan participants and 7 (interquartile range, 4 to 13) among BCCA participants. In both data sets, the mean and median nodule counts were lower when cancer was present (Table 1).
Because data on the presence or absence of spiculation were not collected in the BCCA studies, we prepared parsimonious and full models that did not include spiculation as a variable (Table 2, models 1a and 2a, respectively) and that did include spiculation as a variable (Table 2, models 1b and 2b, respectively). The variables listed in Table 1 and in Table S1 in the Supplementary Appendix were evaluated for inclusion in the models.
In the parsimonious model with spiculation (Table 2, model 1b), the diagnosis of cancer in a nodule was associated with female sex, increasing size of the nodule, location of the nodule in the upper lung, and spiculation, and in the full models (models 2a and 2b) additional predictors included older age, family history of lung cancer, emphysema, lower nodule count, and part-solid nodules as compared with solid nodules (with nonsolid or ground-glass opacity nodules at a reduced risk as compared with solid nodules). Both parsimonious and full models showed excellent discrimination in the PanCan and BCCA (validation) data with all AUCs more than 0.90 (Fig. S1 and Table S2 in the Supplementary Appendix). In the PanCan and BCCA data sets, model-predicted probabilities of lung cancer showed good separation between participants in whom lung cancer was diagnosed and those in whom it was not diagnosed, with only modest overlap (Fig. 2). The models performed well even when applied to nodules 10 mm or smaller, which are the most clinically challenging and most numerous nodules. For those nodules, the AUCs in model 1a were 0.894 and 0.907 in the PanCan and BCCA data, respectively (Fig. S1 in the Supplementary Appendix).
In the PanCan data, a modified model 1b in which nodule size was treated as a linear term had a significantly lower AUC than did the model in which nodule size was treated as a nonlinear term (0.918 vs. 0.941, P = 0.01 for the difference in AUCs). Although nodule size was the single most important predictor in the multivariable models, the largest lung nodule in a person was not necessarily determined to be malignant. Among the 102 PanCan participants with lung cancer, cancer was detected in the largest nodule in 82 participants, in the second largest in 16, in the third largest in 1, in the fourth largest in 2, and in the fifth largest in 1.
In the BCCA validation data, the full model performed significantly better than the parsimonious model: the AUC was 0.960 (95% CI, 0.927 to 0.980) in model 1a as compared with 0.970 (95% CI, 0.947 to 0.986) in model 2a (P = 0.009 for the difference in AUC), and the difference was particularly pronounced for the clinically relevant group of nodules 10 mm or smaller in size: an AUC of 0.907 (95% CI, 0.822 to 0.963) as compared with an AUC of 0.938 (95% CI, 0.872 to 0.978) (P = 0.002). The difference in AUC of 0.031 is 6.2% of the distance between random and perfect classification.
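The 6.2% figure follows from rescaling the AUC difference for nodules 10 mm or smaller by the usable range of the AUC, which runs from 0.5 (random classification) to 1.0 (perfect classification):

```latex
\[
\frac{0.938 - 0.907}{1.0 - 0.5} \;=\; \frac{0.031}{0.5} \;=\; 0.062 \;=\; 6.2\%
\]
```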
In model 1a, the median and 90th percentile absolute errors (observed minus predicted probabilities) in the analysis of the PanCan data were 0.0003 and 0.0007, respectively, and the corresponding absolute errors in the analysis of the BCCA validation data were 0.0002 and 0.003. In model 1a, the mean absolute error in all deciles of model-predicted risk in the PanCan and BCCA data was less than 0.015 (Fig. S2 in the Supplementary Appendix), indicating excellent calibration.
The final models with spiculation are presented in Table 2, models 1b and 2b. A comparison of the model with and without spiculation revealed no significant difference in AUC (Table S2 in the Supplementary Appendix). However, the net reclassification improvement between model 1a and model 1b was 4.3% (P = 0.09), suggesting that spiculation might improve prediction.
We provide spreadsheet calculators for Table 2, models 1b and 2b, at www.brocku.ca/cancerpredictionresearch. These calculators facilitate the calculation of the probability that a pulmonary nodule is lung cancer.
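A calculator of this kind evaluates a fitted logistic-regression model: it sums the intercept and the coefficient-weighted predictor values to obtain the log odds, then converts the log odds to a probability. The sketch below shows that conversion only. The function name is hypothetical, the coefficients must be supplied by the caller from Table 2, and the values in the usage example are made up for illustration and are not the model's coefficients.

```python
import math


def predicted_probability(intercept, coefficients, features):
    """Convert a logistic-regression linear predictor to a probability.

    coefficients and features are dicts keyed by predictor name; every
    predictor with a coefficient must have a value in features.
    """
    log_odds = intercept
    for name, beta in coefficients.items():
        log_odds += beta * features[name]
    return 1.0 / (1.0 + math.exp(-log_odds))


# Illustrative only: made-up numbers, NOT the coefficients in Table 2.
example = predicted_probability(
    intercept=-2.0,
    coefficients={"hypothetical_predictor": 1.0},
    features={"hypothetical_predictor": 2.0},
)
```

Because nodule size enters the models as a nonlinear term, a real calculator would first apply the size transformation described in Table 2 before multiplying by the size coefficient.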
In the screening setting, one of the most difficult decisions is whether CT or another investigation is needed before the next annual low-dose CT study. Current clinical guidelines are complex and vary according to the size and appearance of the nodule. Figure S3 and Table S2 in the Supplementary Appendix show the way in which the prediction accuracies of our models vary according to risk cutoff points for defining a positive screening result. For example, if a threshold of at least a 5% risk of cancer is used with the parsimonious model including spiculation (Table 2, model 1b), the sensitivity, specificity, positive predictive value, and negative predictive value are 71.4%, 95.5%, 18.5% and 99.6%, respectively. Only 5.5% of the nodules would be classified as positive.
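The accuracy measures quoted for a given risk cutoff can be derived from the nodule-level predictions by cross-tabulating the thresholded prediction against the final diagnosis. A minimal sketch follows; the function name and the toy data in the usage note are hypothetical, and a nodule is counted as positive when its predicted risk is at or above the threshold.

```python
def screening_metrics(probs, labels, threshold=0.05):
    """Sensitivity, specificity, PPV, NPV, and fraction called positive
    when nodules with predicted risk >= threshold are treated as positive."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        positive = p >= threshold
        if positive and y == 1:
            tp += 1
        elif positive and y == 0:
            fp += 1
        elif not positive and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        # Undefined ratios (empty denominator) are returned as None.
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "positive_fraction": ratio(tp + fp, tp + fp + tn + fn),
    }
```

For instance, with hypothetical inputs `probs = [0.01, 0.02, 0.06, 0.20, 0.03, 0.50]` and `labels = [0, 0, 0, 1, 1, 1]` at the 5% threshold, two of three cancers are flagged and two of three benign nodules are cleared, so all four accuracy measures equal 2/3 and half of the nodules are classified as positive. Repeating the calculation over a range of thresholds gives the kind of trade-off shown in Figure S3.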
This evidence-based, prospective study of two high-risk screening cohorts determined the probability that pulmonary nodules detected by screening low-dose CT would be cancerous; each nodule was prospectively followed for at least 2 years. Our models show excellent predictive accuracy, with AUCs of at least 0.94 in an external validation cohort. Even for lung nodules that were 10 mm or smaller, for which clinical management decisions are the most challenging, the AUC remained excellent (>0.90) in the validation cohort. Our results showed that the relationship between nodule size and cancer was nonlinear. We also confirmed that nodule location in the upper lobes increased the probability of cancer.20 We have provided strong evidence that perifissural nodules represent a minimal risk of lung cancer and probably do not require longitudinal follow-up with CT.21,22 Although variables such as smoking history, body-mass index, and percent of predicted FEV1 identify smokers at risk for lung cancer,8 they were not independently associated with lung cancer in the fully adjusted model. The usefulness of the model in low-risk persons for whom screening is not currently recommended is beyond the scope of our study. Our model also does not apply to persons with hilar or mediastinal lymphadenopathy, for whom further investigations are warranted irrespective of the nodule size.
Previous prediction models for lung nodules were hospital-based or clinic-based and showed a high prevalence of lung cancer — 23 to 75%, as compared with 5.5% in our study.23–25 Some studies were retrospective in design, had smaller sample sizes, and did not evaluate nonlinear effects; in addition, chest radiography was used to detect lung nodules.23,24 These models may not be applicable to screening by means of low-dose CT, since more than 50% of lung cancers detected by low-dose CT are 2 cm in size or smaller and almost one quarter of lung nodules are nonsolid or part-solid nodules, which are rarely visible on a chest radiograph. Split-sample development and validation sets were generally used in the previous studies, and the split-sample approach is inferior to the use of a true external validation set from a unique sample. When their models are validated externally, the accuracies of their predictions appear to be inferior to those of our models.25 Our models are coupled with risk calculators, which make possible the rapid and easy calculation of lung-cancer risk given the characteristics of the person and the nodules.
CT practice guidelines for the follow-up of noncalcified nodules have been developed on the basis of expert opinion and clinic or hospital databases that include large proportions of persons with lung cancer.4,26–28 Currently, the follow-up strategy is based on the size of the largest detected lesion and may vary depending on whether the lung nodule is solid, part-solid, or nonsolid.4,27,28 Our study showed that in 20% of the participants, the largest lung nodule was not the one that was malignant or determined to be malignant on follow-up. Although volumetric CT may be useful to characterize volume and mass,29,30 a second CT is required to determine the growth rate or a change in mass, and currently volumetric CT cannot be performed accurately for nonsolid or part-solid lesions. In previously reported studies, including the Dutch–Belgian Randomized Lung Cancer Screening Trial (NELSON), more than 20% of participants who underwent low-dose CT screening required a repeat CT, positron-emission tomographic imaging, or a biopsy procedure within 12 months after their first screening low-dose CT because of suspicious or intermediate lung nodules.1,3,29,30 In approximately 25% of the surgical procedures, the nodule was determined to be benign.1,31 Discrimination of nodules that are lung cancer from those that are not is a primary medical concern. The accurate assessment of risk before additional imaging and volumetric analysis has an important place in lung-cancer screening. The implementation of our nodule risk-prediction models and coupled calculators is expected to improve clinical and public health practice.
Supported by the Terry Fox Research Institute, the Canadian Partnership Against Cancer, and U.S. Public Health Service contracts (N01-CN-85188, U01CA96109, PO1 CA096964-01A1, and N01-CN 35000) from the National Cancer Institute.
Disclosure forms provided by the authors are available with the full text of this article at NEJM.org.