The purpose of this study is to build and test a support vector machine (SVM) model to predict the occurrence of lung radiation-induced Grade 2+ pneumonitis. SVM is a sophisticated statistical technique capable of separating the two categories of patients (with/without pneumonitis) using a boundary defined by a complex hypersurface. Despite the complexity, the SVM boundary is only minimally influenced by outliers that are difficult to separate. By contrast, the simple hyperplane boundary computed by the more commonly used and related linear discriminant analysis method is heavily influenced by outliers. Two SVM models were built using data from 219 patients with lung cancer treated using radiotherapy (34 diagnosed with pneumonitis). One model (SVMall) selected input features from all dose and non-dose factors. For comparison, the other model (SVMdose) selected input features only from lung dose-volume factors. Model predictive ability was evaluated using ten-fold cross-validation and receiver operating characteristics (ROC) analysis. For the model SVMall, the area under the cross-validated ROC curve was 0.76 (sensitivity/specificity = 74%/75%). Compared to the corresponding SVMdose area of 0.71 (sensitivity/specificity = 68%/68%), the predictive ability of SVMall was improved, indicating that non-dose features are important contributors to separating patients with and without pneumonitis. Among the input features selected by model SVMall, the two with highest importance for predicting lung pneumonitis were: (a) generalized equivalent uniform doses close to the mean lung dose, and (b) chemotherapy prior to radiotherapy. The model SVMall is publicly available via internet access.
Lung radiation-induced pneumonitis is one of the major dose-limiting toxicities associated with thoracic radiotherapy (RT). To obtain the optimal balance between dose coverage to the target volume and minimization of the risk of radiation pneumonitis, it is important to understand the relationship between factors such as radiation dose-volume metrics and the incidence of radiation pneumonitis.
Several studies have suggested that the incidence of radiation pneumonitis depends on dose-volume factors (e.g., V20 (lung volume receiving dose above 20 Gy),1–9 mean lung dose,1,4,5,8,10–13 V30,2,7,10 V15,2 V40,7 and V50,7), as well as non-dose factors (e.g., tumor location,5,14 age,3,15 chemotherapy schedule,13,15 gender16). Most of these studies identify single/multiple factors (features) from univariate/multivariate analysis, but do not consider how these features may be combined into a predictive model. For example, in Lind et al.,3 the univariately correlated features are not, by themselves, strong predictors of radiation pneumonitis.3 It is possible that appropriately combining weakly correlated features into a model may yield much greater predictive accuracy.17 The aim of this work is to develop and test such a model.
We herein use the support vector machine (SVM)18,19 technique to assess predictors of radiation pneumonitis. SVM is a discriminative machine learning technique, based on Vapnik’s structural risk minimization theory,18 which shares some similarities with linear discriminant analysis (LDA).20 Like LDA, SVM uses a boundary to separate data points into two categories. Unlike LDA, which only uses a hyperplane as the boundary, SVM can form complex hypersurface boundaries via a kernel function.21 Thus, SVM is more capable of segregating “clusters” of points sharing the same outcome (clusters in the space of the inputs) by using a closed hypersurface. SVM, unlike LDA, also tolerates some points on the wrong side of the boundary. By discarding the effect of these points, SVM prevents a few outliers on the wrong side from exerting undue influence on the shape and location of the boundary. This feature improves model robustness and generalization. SVM has been successfully applied to problems of text categorization22,23 and face detection.24
In this paper, we present and test an SVM model built using a novel feature selection algorithm. Model testing employed receiver operating characteristic (ROC) analysis3,25,26 and ten-fold cross-validation21 techniques. The importance of each SVM selected input feature was evaluated. This model is publicly available via the internet (http://www.radonc.duke.edu/modules/div_medphys/index.php?id=25).
The study included 235 patients with lung cancer who received three-dimensional conformal radiotherapy at Duke University Medical Center on an Institutional Review Board approved protocol. Radiation-induced symptomatic pneumonitis was diagnosed and graded at follow-up (typically at 1, 3, and then every 3–4 months post-radiotherapy). Pneumonitis was graded from 0 to 4, as follows: Grade 0: no increase in symptoms; Grade 1: symptoms not requiring initiation or increase in steroids and/or oxygen; Grade 2: symptoms requiring initiation or increase in steroids; Grade 3: symptoms requiring oxygen; Grade 4: symptoms requiring assisted ventilation or causing death. Among these patients, 34 were diagnosed with Grade 2+ pneumonitis and 16 were classified as “hard-to-score,”27 i.e., uncertain diagnosis of Grade 2+ pneumonitis. The “hard-to-score” patients were excluded from this analysis.
A total of 93 parameters were collected for each patient, consisting of dose and relevant non-dose factors. The dose factors included mean heart dose, the lung dose-volume histogram (DVH) (percentage of lung volume above dose ranging from 6 to 60 Gy in increments of 2 Gy), and 37 lung generalized equivalent uniform doses (EUD).28 EUD was calculated as
EUD = [Σi (Vi/V) Di^a]^(1/a),

where Vi is the lung volume receiving dose Di, V = Σi Vi is the total lung volume, and a is a parameter ranging from 0.4 to 4 in increments of 0.1. Note that for a=1, EUD is equivalent to mean lung dose. Non-dose factors included race, age, gender, tumor stage, tumor location (central or peripheral; upper, middle or lower lobe; right or left lung), chemotherapy schedule (none, pre-RT, concurrent, pre-RT and concurrent, post-RT, or concurrent and post-RT), histology type (squamous cell, adenocarcinoma, non-small-cell, small-cell, large-cell, or other), surgery (yes or no), once or twice daily RT, pre-RT FEV1 (forced expiratory volume in 1 s), FEV1% (as percentage of predicted normal), pre-RT DLCO (carbon monoxide diffusion capacity in lung), and pre-RT DLCO% (as percentage of predicted normal). These factors were evaluated for selection as SVM input features, according to the procedure described later.
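As a concrete illustration, the EUD calculation above can be sketched as follows (the function name and the toy DVH values are illustrative, not part of the study's code):

```python
import numpy as np

def gEUD(dose_bins, volumes, a):
    """Generalized equivalent uniform dose from a differential DVH.

    dose_bins -- dose values D_i (Gy)
    volumes   -- lung volumes V_i receiving dose D_i (any units; they
                 are normalized to fractional volumes internally)
    a         -- volume-effect parameter (0.4 to 4 in this study)
    """
    d = np.asarray(dose_bins, dtype=float)
    v = np.asarray(volumes, dtype=float)
    v = v / v.sum()                       # fractional volume per dose bin
    return float((v * d**a).sum() ** (1.0 / a))

# For a = 1, gEUD reduces to the mean lung dose:
# gEUD([10, 20], [0.5, 0.5], 1.0) equals 15.0 Gy
```

Larger values of a weight the high-dose portion of the DVH more heavily, which is why a family of a values (0.4 to 4) samples different aspects of DVH shape.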
The patient data were randomly split into 10 approximately equal-sized groups. Nine groups (training data) were used to train the SVM and the remaining group (cross-validation data) was used as a test to measure the performance of the SVM. Note that feature selection was based only on the training data. The procedure was repeated ten times, with each group, in turn, serving as the test set.
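The random partition can be sketched as follows (the function name and seed are illustrative; group sizes differ by at most one patient):

```python
import random

def partition_into_folds(n_patients, n_folds=10, seed=0):
    """Randomly assign patient indices to n_folds approximately
    equal-sized, mutually exclusive groups."""
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)
    return [indices[k::n_folds] for k in range(n_folds)]

folds = partition_into_folds(219)   # 219 analyzable patients in this study
# Each fold in turn is held out for testing; the other nine folds are
# used for training (including feature selection and parameter tuning).
```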
The general idea behind SVMs is to compute an optimal hypersurface (boundary) that maximizes the margin between data points in categories +1 (radiation pneumonitis) and −1 (no radiation pneumonitis). Figure 1 illustrates this concept using a simplified hyperplane. Each data point x is a vector containing the list of selected features (features selected from the 93 factors available for each patient). In Fig. 1, the hyperplane is denoted by xTβ + β0 = 0, where β is the normal vector to the hyperplane and β0 is an offset parameter (|β0|/‖β‖ is the closest distance from the origin to the hyperplane). The two parallel margin edges are xTβ + β0 = ±1, corresponding to a margin width of 2/‖β‖. Physically, β may be interpreted as the importance weights assigned to the selected features, and β0 as the separation threshold of the weighted combination of features (separating cases with and without pneumonitis). Ideally, the space within the margin is devoid of data points. In reality, SVM maximizes the margin [distance between dotted lines in Fig. 1(a)], while permitting some transgression of points into the margin area or onto the wrong side of the boundary [Fig. 1(b)]. In Fig. 1(b), a slack variable ξi > 0 indicates that point i has transgressed into the margin, and ξi > 1 indicates that it is on the wrong side of the separation hyperplane. The sum total of transgressions is restricted to be below a certain limit, thereby allowing small transgressions from several points or, alternately, large outlier transgressions from a few points.
Mathematically, the concept of SVM is described as

min(β, β0, ξ) (1/2)‖β‖² + C Σi ξi,   (1)

subject to yi(xiTβ + β0) ≥ 1 − ξi and ξi ≥ 0 for all i,   (2)

where ξi is the slack variable of point i [see Fig. 1(b)], (xi, yi) is the data point for patient i, with yi = ±1 (+1: radiation pneumonitis; −1: no radiation pneumonitis) and vector xi containing the features selected from the 93 available patient factors.
This constrained optimization is solved through its Lagrangian dual,

max(α) Σi αi − (1/2) Σi Σj αi αj yi yj xiTxj,   (3)

subject to 0 ≤ αi ≤ C and Σi αi yi = 0. Parameter C is user defined. It is determined as explained in the next subsection. The coefficients αi are computed through the optimization, and the optimized coefficients are denoted by α̂i.
The solution for β has the form

β̂ = Σi α̂i yi xi,   (4)

where α̂i [solution of Eq. (3)] is nonzero only for data points that transgress the margin. The position vectors of these points are the “support vectors” that define the hyperplane normal. Points on the margin boundary are used to solve for β̂0. Thus, the final SVM classifier is

f(x) = sign(xTβ̂ + β̂0) = sign(Σi α̂i yi xiTx + β̂0).   (5)
The support vector machine shown in Fig. 1 has a linear boundary. However, for cases with nonlinear boundaries, the original low-dimensional input space can be translated to a higher-dimensional feature space via a basis function21 (see Fig. 2). The basis function does not explicitly appear in the calculation if the inner product xiTxj in Eq. (3) is replaced by an appropriate kernel function K(xi, xj).21 This mapping allows the SVM to accommodate nonlinear boundaries that achieve better training-class separation.21 In this work, we used the radial basis kernel function,

K(xi, xj) = exp(−‖xi − xj‖²/(2σ²)),   (6)
where σ is a user-defined parameter, determined as described in the next subsection. Two other choices for K(xi, xj) found in the SVM literature21 are the polynomial kernel,

K(xi, xj) = (1 + xiTxj)^d,   (7)

and the sigmoid kernel,

K(xi, xj) = tanh(κ1 xiTxj + κ2).   (8)
The first function [Eq. (7)] can be problematic, since it is unbounded and can potentially lead to numerical instability. The second function [Eq. (8)] has two free parameters and hence is more likely to overfit the model, compared to the radial basis kernel function with one free parameter. Therefore, the radial basis kernel function was used in this work.
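A minimal sketch of the resulting kernelized discriminant is given below. The support vectors, coefficients, offset, and σ here are hypothetical stand-ins for fitted values, and the kernel assumes the common width convention exp(−‖xi − xj‖²/(2σ²)):

```python
import numpy as np

def rbf_kernel(xi, xj, sigma):
    """Radial basis kernel: exp(-||xi - xj||^2 / (2 sigma^2))."""
    diff = np.asarray(xi, float) - np.asarray(xj, float)
    return float(np.exp(-diff.dot(diff) / (2.0 * sigma**2)))

def svm_discriminant(x, support_x, support_y, alpha_hat, beta0, sigma):
    """Kernelized SVM discriminant: sum_i alpha_i y_i K(x_i, x) + beta0.
    Its sign gives the predicted class (+1 pneumonitis, -1 none)."""
    return sum(a * y * rbf_kernel(xs, x, sigma)
               for a, y, xs in zip(alpha_hat, support_y, support_x)) + beta0

# Toy example: one support vector at the origin with label +1.
d = svm_discriminant([0.0, 0.0], [[0.0, 0.0]], [1], [1.0], -0.5, 1.0)
# d = 1.0 * K(origin, origin) - 0.5 = 0.5, so this point is classed +1
```

Because the radial basis kernel is bounded between 0 and 1, points far from every support vector contribute almost nothing to the sum, which is one reason it behaves more stably than the unbounded polynomial kernel.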
In summary, training an SVM is equivalent to solving a quadratic programming problem [Eq. (3)] with the inner product xiTxj replaced by the kernel function K(xi, xj).
Parameters C [Eq. (3)] and σ [Eq. (6)] were determined prior to cross-validation using grid search29 and nine-fold evaluation within each training set. Of the nine groups constituting each training set (see Fig. 1), eight groups were used to train the SVM using a specific (C,σ) value, and one group (termed training-evaluation) was used to evaluate the SVM for prediction accuracy using ROC analysis. The procedure was repeated nine times for each training set, such that each group was used once for training-evaluation. The parameters (C,σ) were optimized in the grid space of (log10 C, log10 σ2)29 to maximize the area under the training-evaluation ROC curve. Once determined, the optimal (C,σ) values were applied to the entire training set for cross-validated testing.
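The (C, σ) search can be sketched as follows (train_eval_auc is a placeholder for the nine-fold train/evaluate loop described above, and the grids are illustrative, not the study's actual grid):

```python
import itertools

def select_C_sigma(train_eval_auc, log10_C_grid, log10_sig2_grid):
    """Grid search in (log10 C, log10 sigma^2) maximizing the area
    under the training-evaluation ROC curve."""
    best_auc, best = float("-inf"), None
    for lc, ls2 in itertools.product(log10_C_grid, log10_sig2_grid):
        C, sigma = 10.0**lc, (10.0**ls2) ** 0.5
        auc = train_eval_auc(C, sigma)   # nine-fold evaluation (placeholder)
        if auc > best_auc:
            best_auc, best = auc, (C, sigma)
    return best
```

Searching in log space matches the wide dynamic range of both parameters: order-of-magnitude steps cover the underfitting-to-overfitting transition efficiently.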
Each of the 93 patient variables is potentially an input feature for the SVM. Input features were selected using a unique algorithm that progressively built the SVM by sequentially adding/substituting input features.
Input features were selected using the nine-fold training-evaluation scheme described in the previous subsection. For each variable that was a potential input feature, the SVM was trained using eight of the nine training groups and then evaluated on the one remaining training-evaluation group. Each of the nine training groups served as the training-evaluation group, in turn. The variable was added as input feature if the area under the collective ROC curve of the nine training-evaluation groups increased. Similarly, an already selected input feature was replaced by another variable if the training-evaluation ROC area increased. SVM construction was stopped if no new variable was accepted as input feature, after all unselected variables were evaluated through addition/substitution.
In summary, the algorithm is as follows (“AUC” denotes the area under ROC curve for training-evaluation):
When there were fewer than three input features, Operator1 (substitution) was skipped.
For the purpose of ten-fold cross-validation, the procedure above was repeated ten times, corresponding to each cross-validation group.
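The add/substitute scheme can be sketched as follows. This is a simplified sketch, not the study's exact code: auc_of is a placeholder for the nine-fold training-evaluation AUC of a candidate feature subset.

```python
def greedy_feature_selection(candidates, auc_of):
    """Greedy SVM input-feature construction: add a variable when it
    raises the training-evaluation AUC, or substitute it for an already
    selected feature (substitution is skipped while fewer than three
    features are selected). Stops when a full pass accepts nothing."""
    selected, best_auc = [], 0.0
    improved = True
    while improved:
        improved = False
        for f in [c for c in candidates if c not in selected]:
            if auc_of(selected + [f]) > best_auc:            # addition
                best_auc = auc_of(selected + [f])
                selected, improved = selected + [f], True
                continue
            if len(selected) >= 3:                           # substitution
                for g in list(selected):
                    trial = [f if s == g else s for s in selected]
                    if auc_of(trial) > best_auc:
                        best_auc = auc_of(trial)
                        selected, improved = trial, True
                        break
    return selected
```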
To evaluate the effect of non-dose input features on SVM prediction accuracy, two SVM models were built and tested. The first SVM model (SVMall) had input features selected from all dose and non-dose variables, while the second SVM model (SVMdose) had input features selected only from lung dose-volume histogram variables. The cross-validated ROC AUCs from the two models were used to compare generalization capability (larger AUC implies a more accurate model).
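For reference, an ROC AUC can be computed directly from the two groups' discriminant scores via the Mann–Whitney statistic (a standard equivalence; the function name is illustrative):

```python
def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve: the probability that a randomly chosen
    pneumonitis case scores higher than a randomly chosen control
    (ties count one half)."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two patient categories.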
The importance of each input feature was evaluated by excluding it from the model, one feature at a time. For each exclusion, parameters C and σ were estimated and the SVM was trained as explained in the previous subsections. With the exclusion of each input feature, the decrement in cross-validated AUC was used to rank the importance of the excluded feature (larger decrement denotes a more important feature).
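The exclusion procedure can be sketched as follows (cv_auc is a placeholder that re-estimates C and σ, retrains, and cross-validates on the given feature subset; the names are illustrative):

```python
def rank_feature_importance(features, cv_auc):
    """Rank input features by the drop in cross-validated AUC when each
    is excluded in turn; larger decrement = more important feature."""
    full_auc = cv_auc(features)
    drops = {f: full_auc - cv_auc([g for g in features if g != f])
             for f in features}
    return sorted(drops.items(), key=lambda kv: kv[1], reverse=True)
```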
The SVM algorithm was programmed in-house, using MATLAB (Mathworks, Natick, MA). For the purpose of tenfold cross-validation, 10 SVMs were built, with each SVM used to evaluate one of the test groups. Thus, the resulting model SVMall or SVMdose is an ensemble of ten component SVMs. Note that, since the SVMs were built and trained with slightly different training data (2/9 of the training data are different between any two SVMs), they understandably have different input features and values of the parameters C and σ.
The selected input features are listed in Table I for models SVMall and SVMdose. This table lists, in brackets, the number of component SVMs that selected a specific input feature. For model SVMall, each component SVM selected four input features. Note that the generalized equivalent uniform doses EUD a=1.2, 1.3, and 1.4 are highly correlated to each other. However, they were selected by different component SVMs, i.e., no two of these correlated features were selected by the same component SVM. EUD a=1.3 was selected by seven component SVMs, EUD a=1.4 by two component SVMs, and EUD a=1.2 by one component SVM. EUD a=1.2, 1.3, and 1.4 are highly correlated to EUD a=1 (mean lung dose), which frequently appears as a strong predictor of radiation pneumonitis in the literature.1,4,5,8,10–13 Thus, mean lung dose appears to be a stronger predictive parameter than any lung DVH Vx metric (Vx metrics are frequently reported in the literature1–10). A possible explanation for the stronger predictive potential of generalized EUD metrics is that they contain information on the DVH shape (by volume-weighted combination of dose), as opposed to Vx metrics that only select a single point on the DVH curve.
All ten component SVMs chose chemotherapy-prior-to-RT as an input feature predictive for radiation pneumonitis. This factor also appears in the literature as associated with the occurrence of pneumonitis.13,15,30 McDonald et al.30 report that, while some chemotherapeutic drugs can induce lung injury such as pneumonitis, chemotherapeutic drugs can also enhance radiation-induced lung injury. Other input features, ranked from most to least commonly selected, were: tumor location (central or peripheral), selected by nine component SVMs; gender (male or female), by eight component SVMs; histology (adenocarcinoma or not), by two component SVMs; and histology (small cell or not), by one component SVM. Peripheral tumor location, female sex, and small-cell/adenocarcinoma histology were associated with increased risk of pneumonitis. Female sex has been implicated in prior studies,16,31 whereas histology and peripheral tumor location have not (inferior tumor location has previously been identified as correlated with pneumonitis14). However, it is not surprising that the SVM algorithm sometimes selects input features with weak univariate correlation, since multiple such variables could synergistically interact (when combined) to provide high correlation (“…, a variable that is completely useless by itself can provide a significant performance improvement when taken with others”17). Indeed, one of the benefits of this approach is the ability to consider such complex interactions.
For model SVMdose, each component SVM selected two input features. Eight component SVMs selected EUD a=1.3, while the highly correlated EUD a=1.4 and EUD a=1.1 were each selected by one component SVM. Seven SVMs selected V50 and three SVMs selected the closely related V48. The selection of only two input features from the lung DVH is likely because lung DVH variables tend to be highly correlated to each other.
While the concept of a support vector machine is straightforward, “incorrect” values of the parameters C [Eq. (3)] and σ [Eq. (6)] can lead to poor predictive accuracy. These parameters control the tradeoff between overfitting (the model is too complex: it fits the noise as well as the signal) and underfitting (the model is not complex enough to fit the signal). Thus, it is critical to accurately estimate their values for good generalization.
Parameter C controls the complexity of the hypersurface that separates the two categories. For small values of C, the separation hypersurface created by the SVM algorithm can be insufficiently complex, resulting in underfitting. As C increases, the SVM algorithm increases the complexity of the separation hypersurface to correctly classify greater numbers of data points. Thus, large values of C can lead to overfitting. The effect of parameter C is shown in Fig. 3, where the AUC for training-evaluation is plotted as a function of log10 C for fixed σ (log10 σ2 = −1.5). For log10 C < 4, the SVM underfits the data. The overfitting condition is not shown here.
Parameter σ is the width of the radial basis function, which controls the number of support vectors. For smaller σ, the SVM uses more data points as support vectors, leading to overfitting. Conversely, for larger σ, the SVM has fewer support vectors, leading to underfitting. The behavior of σ is shown in Fig. 4 (parameter C was fixed at log10 C = 5). The low values of AUC for small σ represent overfitting. Overfitting is reduced with increasing σ, up to log10 σ2 = −2. Underfitting is manifested as a sharp drop in the area under the training-evaluation ROC curve beyond log10 σ2 = −0.5.
The optimal values of parameters C and σ (optimized using grid search29 to maximize the training-evaluation AUC) are not a single point, but rather an area in the space of (log10 C, log10 σ2). As seen in the examples of Figs. 3 and 4, C is optimal for log10 C > 4 when log10 σ2 = −1.5 (Fig. 3), and σ is optimal for log10 σ2 in [−2, −0.5] when log10 C = 8 (Fig. 4).
The ROC analysis results for ten-fold cross-validated testing are shown in Fig. 5 for the SVM using dose and non-dose variables (SVMall), and in Fig. 6 for SVM using only lung DVH variables (SVMdose). For model SVMall, the area under the ROC curve (AUC) was 0.76 (sensitivity=74%, specificity=75%), while for model SVMdose, the AUC was 0.71 (sensitivity=68%, specificity=68%). The difference between these two areas suggests that the predictive ability of model SVMall is better than that of model SVMdose and that the addition of non-dose features can improve the generalization capability of the SVM model.
The importance of each input feature used in model SVMall was evaluated and ranked, from highest to lowest, as follows: dose metrics closely related to mean lung dose (EUD a=1.2, 1.3, and 1.4), chemotherapy before radiotherapy (yes or no), tumor position (central or peripheral), gender (male or female), histology (adenocarcinoma or not), and histology (small-cell or not). The ROC analysis is summarized in Table II. The exclusion of dose metrics closely related to mean lung dose resulted in a large AUC drop, from 0.76 to 0.57. This suggests, in agreement with other studies,1,4,5,8,10–13 that mean lung dose metrics are an important factor in predicting lung pneumonitis. The second most important feature is chemotherapy prior to radiotherapy (AUC decrement of 0.09). The use of chemotherapy, either prior to or concurrent with RT, has been suggested in other studies to increase the risk of pneumonitis.30 Even though other input features record only small drops in the AUC, they nevertheless help to improve model generalization. It is understandable that histology (adenocarcinoma or not) and histology (small-cell or not) would have minimal impact on AUC reduction, since they were only selected by two component SVMs and one component SVM, respectively (see Table I).
To evaluate the robustness of the cross-validated results from model SVMall to patient assignment (i.e., sensitivity of the results to randomization of patients into the ten different groups), the data were randomly split 100 times into ten groups. Thus, each time, the composition of patients within the ten groups changed. Each time, model SVMall was trained and tested with ten-fold cross-validation. The cross-validated ROC areas from the 100 randomizations had mean=0.74 (range 0.71–0.77) and standard deviation=0.03. The small variance implies that the dataset is of adequate size and that model SVMall is robust.
The model SVMall for prospective use is available for download from http://www.radonc.duke.edu/modules/div_medphys/index.php?id=25. The required input features are shown in the left column of Table I. The input file (example available on website) is required to include the entire lung DVH, chemotherapy prior to RT (yes or no), tumor position (central or peripheral), gender (male or female), histology (adenocarcinoma or not), and histology (small-cell or not). Missing variables are indicated as negative values in the input file. The program internally computes the EUD input features (a=1.2, 1.3, and 1.4) from the lung DVH. The classification result is an average of outputs from the ten component SVMs comprising SVMall. The model outputs are two sets of metrics: a discriminant value that is a measure of the extent of injury (>0 indicates predicted pneumonitis, <0 indicates no predicted pneumonitis), and the number of patients in the Duke training database with a higher discriminant than the prospectively tested patient. The latter value ranks the prospectively evaluated patient in the context of the Duke population.
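The two reported outputs can be sketched as follows (the discriminant values below are hypothetical; the actual program averages its ten component SVMs and compares against the Duke training database):

```python
def ensemble_discriminant(component_outputs):
    """Average the discriminant values of the component SVMs;
    > 0 predicts pneumonitis, < 0 predicts no pneumonitis."""
    return sum(component_outputs) / len(component_outputs)

def population_rank(d_new, training_discriminants):
    """Number of training patients with a higher discriminant than the
    prospectively tested patient."""
    return sum(1 for d in training_discriminants if d > d_new)
```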
In this work, the support vector machine (SVM) algorithm was investigated to predict lung radiation-induced pneumonitis. Results indicate that the SVM model is a powerful, yet robust, predictor. The SVM model constructed with dose and non-dose input features yielded a ten-fold cross validated ROC area of 0.76 with sensitivity and specificity of 74% and 75%, respectively. Among the selected input features, dose metrics closely related to mean lung dose were most influential. The SVM model constructed in this work is available for public use via internet access.
This work was supported by Grant Nos. NIH R01 CA 115748 and NIH R01 CA69579.