We aim to predict radiological observations using computationally-derived imaging features extracted from computed tomography (CT) images. We created a dataset of 79 CT images containing liver lesions identified and annotated by a radiologist using a controlled vocabulary of 76 semantic terms. Computationally-derived features were extracted describing intensity, texture, shape, and edge sharpness. Traditional logistic regression was compared to L1-regularized logistic regression (LASSO) in order to predict the radiological observations using computational features. The approach was evaluated by leave-one-out cross-validation. Informative radiological observations such as lesion enhancement, hypervascular attenuation, and homogeneous retention were predicted well by computational features. By exploiting relationships between computational and semantic features, this approach could lead to more accurate and efficient radiology reporting.
Liver lesions arise from a variety of causes and have diverse appearances on CT images; some are benign while others are malignant, and different diagnoses present with different visual features. The ability to differentiate these lesions efficiently and accurately is important for patient treatment and outcome. Contrast-enhanced CT imaging is the dominant technology used for liver lesion diagnosis 1. This modality takes advantage of the fact that the liver receives blood from two main sources, the portal vein and the hepatic artery: the portal vein supplies about 80% of the liver's blood, and the hepatic artery provides the other 20%. Because of differences in physiology, a lesion may not draw blood from these two sources in the same proportions as the surrounding liver tissue. Multi-phasic contrast-enhanced imaging exploits this by imaging the liver at multiple time points after injection of contrast agent. Because the contrast agent arrives in the hepatic arterial and portal venous circulations at different times, lesions become visible against the surrounding liver, producing several distinctive imaging features. For example, primary liver cancers receive nearly all of their blood from the hepatic artery, so they contain higher concentrations of contrast agent than the surrounding liver parenchyma during the arterial phase 2,3. Arterial-phase contrast-enhanced CT may therefore be helpful in finding masses that exhibit malignant tumor characteristics. Other phases may be useful for differentiating tumor types; for example, metastatic tumors appear less dense than the normal liver during the portal venous phase.
The difference in density between a lesion and its surrounding tissue at various times after injection of iodinated contrast agent into a peripheral vein in multi-phasic imaging is called the temporal enhancement pattern. Analyzing a lesion's temporal enhancement pattern across the phases of image acquisition helps radiologists make diagnoses. Unfortunately, the specificity of this method depends on lesion size, and the method is prone to a high false-positive rate because several types of liver lesions, including benign ones, have similar appearances on CT images 4. Variability among human readers has also been shown to be a challenge for the differential diagnosis of lesions, and automated methods are being investigated to improve diagnosis 5.
Recent research has investigated computer-aided methods that support diagnosis by retrieving similar images from a database of annotated images 6, and has further indicated that radiological observations drawn from a controlled vocabulary can lead to accurate diagnoses of liver tumor types from CT images 7. These observations can be treated as features of the image derived from human semantic annotations; we therefore refer to them as semantic features. Semantic features make radiologists' observations consistent, explicit, and machine-accessible. Radiological tools such as ePAD (formerly iPAD) 8 have been implemented to make recording these annotations easy.
Computational analysis of these images may overcome the problem of human variability by creating quantitative and unbiased descriptors of image features. These computational features are computed directly from the image's pixel values, independently of the semantic features. Digital image processing techniques have been used to extract features describing lesion attenuation, texture 9,10, edge, and shape 11–14. We hypothesize that these computational features can be coupled with statistical machine learning methods to form a framework for predicting semantic features. Such a framework could be useful in building a decision support system to aid radiology interpretation.
In this study, we compare the performance of two widely used machine-learning methods to achieve this: logistic regression and L1-regularized logistic regression (LASSO). From these results, we can determine which semantic features are predicted by computational ones, and which computational features are the most useful for this purpose. Knowledge of which semantic features are not predicted well could lead to the development of new computational features for this purpose. Ultimately, prediction of semantic features from computational ones could lead to reduced variability in image interpretation.
With IRB approval, we obtained 79 de-identified CT images of liver lesions in the portal venous phase, including eight types of lesion diagnoses: metastasis, hemangioma (abnormal buildup of blood vessels in the liver), hepatocellular carcinoma, focal nodular hyperplasia (unknown cause), abscess (inflammation), laceration (injury or tear), fat deposition and cyst. For the initial development of image processing algorithms, we chose to focus on the most commonly used phase of imaging, the portal venous phase. Later studies will incorporate unenhanced, arterial, and delayed-phase images, which are important in the diagnosis of many liver lesions. For each scan, the axial slice with the largest lesion area was selected for analysis. A radiologist drew and recorded a Region of Interest (ROI) around the lesion on these images using the freely available OsiriX workstation 15.
A radiologist annotated the ROI with the Electronic Physician's Annotation Device (ePAD; formerly iPAD) 8. The ePAD system uses RadLex, a controlled radiological vocabulary, to define semantic features 16, and it enforces a complete description of the required aspects of the visualized lesion. For this study we extended the RadLex terminology with a broader array of descriptive terms that describe liver lesions more comprehensively. Each annotation was encoded as a binary semantic feature vector of length 76, in which each element indicates a positive or negative observation of one term.
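As a minimal sketch of this encoding, assuming an annotation arrives as a set of observed term strings, the function below maps it onto a fixed-order vocabulary. The five-term `VOCABULARY` is a hypothetical subset of the 76 terms used in the study, and `encode_annotation` is an illustrative name, not part of ePAD.

```python
# Hypothetical 5-term subset of the 76-term controlled vocabulary.
VOCABULARY = ("water density", "homogeneous retention", "non-enhancing", "round", "ovoid")


def encode_annotation(observed_terms, vocabulary=VOCABULARY):
    """Return a 0/1 list: 1 where the vocabulary term was observed, 0 otherwise."""
    observed = set(observed_terms)
    unknown = observed - set(vocabulary)
    if unknown:
        # A controlled vocabulary rejects terms it does not define.
        raise ValueError(f"terms not in vocabulary: {sorted(unknown)}")
    return [1 if term in observed else 0 for term in vocabulary]


print(encode_annotation(["round", "water density"]))  # [1, 0, 0, 1, 0]
```

Fixing the term order once means every lesion's vector has the same layout, so the vectors can be stacked into a lesion-by-term matrix for the entropy filtering and classification steps.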
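Semantic features were not all equally likely and ranged widely in annotation frequency. To quantify the balance between positive and negative observations, the entropy of each semantic feature was measured with the binary entropy function. For a semantic feature X observed as positive in a fraction p of the lesions, the binary entropy function is defined as:

$$H(X) = -p \log_2 p - (1 - p) \log_2 (1 - p)$$

with the convention $0 \log_2 0 = 0$.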
H(X) ranges from 0 to 1, reaching its maximum when a semantic feature has equal numbers of positive and negative observations 17. To avoid degenerate classification problems, in which a term is almost always present or almost always absent, only features with entropy greater than 0.4 were considered for classification, resulting in a total of 30 semantic features per lesion (Table 1).
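The entropy filter can be sketched in a few lines, assuming the annotations are stored as one 0/1 list per lesion with a shared term order. The function names are illustrative; p is the fraction of lesions in which a term was observed as positive.

```python
import math


def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), with 0 log2 0 taken as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def select_informative_terms(annotations, threshold=0.4):
    """Return indices of terms whose binary entropy exceeds the threshold.

    annotations: list of equal-length 0/1 vectors, one per lesion.
    """
    n_lesions = len(annotations)
    keep = []
    for j in range(len(annotations[0])):
        p = sum(row[j] for row in annotations) / n_lesions  # fraction positive
        if binary_entropy(p) > threshold:
            keep.append(j)
    return keep


annotations = [[1, 0, 1], [0, 0, 1], [1, 0, 1], [0, 0, 0]]
print(select_informative_terms(annotations))  # [0, 2]
```

With 79 lesions, the 0.4 cutoff means a term must be positive (or negative) in at least 7 lesions to be kept: binary_entropy(6/79) is about 0.39, while binary_entropy(7/79) is about 0.43.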
The pixel data within the segmented liver lesions were processed to quantify contrast, texture, boundary, and shape (Table 2). Each lesion’s extracted computational features were concatenated into a 431-dimensional feature vector.
Contrast (2 features): (a) the proportion of pixels with intensity greater than 1100 Hounsfield units (HU) and (b) the difference between the mean intensity of pixels inside the lesion and the mean intensity of pixels within a 5-pixel rim outside the lesion.
Texture (349 features): (a) 13 gray-level histogram features, comprising the 9-bin histogram itself, the low-frequency coefficients of its 3-level Haar wavelet transform, the abscissa of its peak, its entropy, and its variance 9; (b) 12 Gabor features 10, the mean Gabor energy in the frequency domain at 3 scales and 4 orientations; and (c) 324 Daubechies features describing the dominant sub-band of a 2-scale Daubechies wavelet transform 18.
Boundary (61 features): (a) We recorded the image intensity values along normals to the lesion contour at multiple points and fit a sigmoid function to these values. Two parameters of the fitted sigmoid, scale and window, characterize each line segment: the scale measures the difference between intensities outside and inside the lesion, and the window measures the width of the transition from the lesion to the surrounding normal liver at the boundary. Two 30-bin histograms of the scale and window parameters form a 60-element feature vector. (b) We also recorded the number of modes in the histogram of all pixels recorded from each normal 14.
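The text does not specify how the sigmoid was fitted, so the following is only a sketch of one way to compute the boundary features. It assumes each profile is sampled from inside the lesion to outside it, and it swaps in a simple grid search over the sigmoid's center and window; for each grid point the model is linear in offset and scale, so those two are solved exactly by least squares. The search grids, the histogram ranges, and the normalization of the histograms by the number of normals are all hypothetical choices.

```python
import numpy as np


def sigmoid(x, center, window):
    # Clip the exponent to avoid overflow warnings far from the edge.
    z = np.clip((x - center) / window, -50.0, 50.0)
    return 1.0 / (1.0 + np.exp(-z))


def fit_edge_sigmoid(x, intensity, windows=None, n_centers=101):
    """Fit intensity ~ offset + scale * sigmoid(x; center, window); return (scale, window).

    x runs along the normal from inside the lesion to outside it, so a positive
    scale means the surrounding liver is brighter than the lesion.
    """
    if windows is None:
        windows = np.linspace(0.1, 5.0, 50)  # hypothetical search grid
    best_err, best_scale, best_window = np.inf, 0.0, 0.0
    for center in np.linspace(x.min(), x.max(), n_centers):
        for window in windows:
            design = np.column_stack([np.ones_like(x), sigmoid(x, center, window)])
            coef, *_ = np.linalg.lstsq(design, intensity, rcond=None)
            err = np.sum((design @ coef - intensity) ** 2)
            if err < best_err:
                best_err, best_scale, best_window = err, coef[1], window
    return best_scale, best_window


def edge_histogram_features(scales, windows, scale_range, window_range, bins=30):
    """Concatenate two 30-bin histograms (normalized by the number of normals) into 60 values."""
    h_scale, _ = np.histogram(scales, bins=bins, range=scale_range)
    h_window, _ = np.histogram(windows, bins=bins, range=window_range)
    return np.concatenate([h_scale, h_window]) / max(len(scales), 1)


x = np.linspace(0.0, 10.0, 41)                # hypothetical positions along one normal
profile = 10.0 + 50.0 * sigmoid(x, 5.0, 1.0)  # synthetic, noiseless edge
scale, window = fit_edge_sigmoid(x, profile)  # should recover scale near 50, window near 1
```

Fixed histogram ranges matter: they must be the same for every lesion, otherwise bin k would mean different things in different feature vectors.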
For each semantic feature, logistic regression was used to estimate the probability that the semantic feature was positive given a lesion's computational feature vector. To mitigate over-fitting due to the large number of predictors (431 computational features) relative to the number of data points (79 lesions), we also used LASSO 21, whose L1 penalty drives the weights of uninformative features to exactly zero and thus selects a sparse subset of the computational features. We measured classifier performance with leave-one-out cross-validation (LOOCV).
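The study fit LASSO with glmnet for MATLAB 22; the sketch below is not that code. It swaps in a minimal NumPy proximal-gradient (ISTA) solver for L1-penalized logistic regression and wraps it in a LOOCV loop that produces one held-out probability per lesion. The penalty `lam`, the iteration count, the helper names, and the synthetic data are hypothetical; glmnet would normally evaluate a whole path of penalty values rather than a single fixed `lam`.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -35.0, 35.0)))


def fit_l1_logistic(X, y, lam, n_iter=2000):
    """Minimize mean log-loss + lam * ||w||_1 by ISTA; the intercept b is not penalized."""
    n, d = X.shape
    # Step 1/L, where L = ||[X, 1]||_2^2 / (4n) bounds the Lipschitz constant of the gradient.
    lipschitz = np.linalg.norm(np.column_stack([X, np.ones(n)]), 2) ** 2 / (4 * n)
    step = 1.0 / lipschitz
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        residual = _sigmoid(X @ w + b) - y
        w = w - step * (X.T @ residual) / n
        b = b - step * residual.mean()
        # Soft-thresholding (the L1 proximal step) sets small weights to exactly zero.
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
    return w, b


def loocv_probabilities(X, y, lam):
    """Held-out probability for each lesion, refitting on the other n - 1 lesions."""
    n = len(y)
    probs = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        # Standardize with training-fold statistics only, so lesion i does not leak in.
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
        sd[sd == 0] = 1.0
        w, b = fit_l1_logistic((X[train] - mu) / sd, y[train], lam)
        probs[i] = _sigmoid(((X[i] - mu) / sd) @ w + b)
    return probs


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 50))    # 40 hypothetical lesions, 50 features
y = (X[:, 0] > 0).astype(float)  # synthetic label driven by feature 0 only
probs = loocv_probabilities(X, y, lam=0.05)
```

Because every lesion's probability comes from a model that never saw it, `probs` can be scored directly with ROC and misclassification metrics.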
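Receiver operating characteristic (ROC) curves were computed from each classifier's predicted probability that a semantic feature was present, and the resulting areas under the curves (AUC) were used to quantify the predictive value of these classifiers. Classification accuracy at a probability threshold t was measured by:

$$\text{Accuracy}(t) = \frac{TP(t) + TN(t)}{TP(t) + TN(t) + FP(t) + FN(t)}$$

where TP, TN, FP, and FN are the numbers of true positive, true negative, false positive, and false negative predictions when the predicted probabilities are thresholded at t.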
Thresholds were determined empirically over the range 0 to 1 and then used to calculate the misclassification rate (MCR), i.e., one minus the classification accuracy, of each classifier for each semantic feature.
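Both metrics can be computed from the held-out probabilities with a short, dependency-free sketch. The AUC here is the pairwise (Mann-Whitney) form: the probability that a randomly chosen positive lesion scores higher than a randomly chosen negative one, with ties counted as one half. The text does not say how a single threshold was picked from the sweep; `best_threshold` assumes, hypothetically, that the threshold with the lowest MCR on a 0.01 grid was used.

```python
def auc(scores, labels):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly, ties = 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def misclassification_rate(scores, labels, threshold):
    """MCR = 1 - accuracy = (FP + FN) / N, predicting positive when score >= threshold."""
    errors = sum((s >= threshold) != (l == 1) for s, l in zip(scores, labels))
    return errors / len(labels)


def best_threshold(scores, labels, n_steps=100):
    """Sweep thresholds over [0, 1]; return the (threshold, MCR) pair with the lowest MCR."""
    grid = [i / n_steps for i in range(n_steps + 1)]
    return min(((t, misclassification_rate(scores, labels, t)) for t in grid),
               key=lambda pair: pair[1])


scores = [0.9, 0.8, 0.3, 0.2]  # toy held-out probabilities
labels = [1, 0, 1, 0]
print(auc(scores, labels))  # 0.75
```

The pairwise AUC is threshold-free, whereas the MCR depends on the chosen cut-off; reporting both separates ranking quality from the quality of a specific operating point.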
All programming was performed in MATLAB. LASSO classification was done using the glmnet package for MATLAB 22.
Figure 1 shows (a) the area under the ROC curve (AUC) and (b) the misclassification rate (MCR) of LASSO and logistic regression. Table 3 shows the means and standard deviations of the AUC and MCR.
We used a two-sample t-test to compare the AUC and MCR of the classifiers. The difference in classifier performance was statistically significant for both AUC and MCR ($p_{AUC} = 7.2 \times 10^{-10}$ and $p_{MCR} = 6.2 \times 10^{-4}$). Both metrics indicated that LASSO was the stronger classifier.
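The text does not state whether equal variances were assumed; the sketch below assumes the classic pooled-variance Student test and computes only the t statistic and its degrees of freedom from two lists of per-semantic-feature scores. The two-sided p-value then follows from the t distribution with that many degrees of freedom, for example via `scipy.stats.ttest_ind`, whose default `equal_var=True` computes the same pooled statistic. The input lists are toy numbers, not the study's results.

```python
import math
from statistics import mean, variance


def pooled_t_test(a, b):
    """Two-sample Student t statistic with pooled variance, plus degrees of freedom."""
    na, nb = len(a), len(b)
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


t, df = pooled_t_test([1, 2, 3], [4, 5, 6])
print(round(t, 3), df)  # -3.674 4
```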
Using the LASSO classifier, several semantic features were well predicted by the computational features. Five had an AUC greater than 0.95: water density, homogeneous retention, non-enhancing, heterogeneous, and homogeneous.
Conversely, four semantic features were shown to have an AUC under 0.6: multiple lesions 1–5, round, normal perilesional tissue, and ovoid.
Several results concerning which semantic features were predictable are noteworthy. Although we had developed several computational features to quantify shape, two of the most difficult semantic features to predict were round and ovoid. One possible reason for this discrepancy between computational and semantic features is that round and ovoid are inherently subjective terms, so human variability in applying these descriptors may not be captured computationally. This explanation supports further use of computational features for characterizing lesion shape, since our methods are deterministic and free of reader variability.
Another interesting result is our system's ability to predict semantic features that seem impossible to infer from a single lesion. For example, "multiple lesions > 10" was very well predicted even though analysis was carried out on only one lesion. Such results may be possible because an underlying disease can explain both a lesion's morphology and the presence of multiple lesions. Thus, the number of lesions can be predicted indirectly from the physical characteristics of a single lesion alone.
Computational features were fitted to each semantic feature vector using the LASSO model. Each fit produced a 431-dimensional vector of weights for the computational features, and features with large-magnitude weights were deemed most informative. The L1-norm regularization in the model imposes sparsity, so most features receive zero weight. To quantify model complexity, we fit a LASSO model to each semantic feature and counted the number of non-zero coefficients, which corresponds to the number of relevant computational features for that semantic feature. On average, each semantic feature used only 12.6 (± 4.3) computational features. Moreover, of the entire set of 431 computational features, only 126 had a non-zero weight for any semantic feature.
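Given the fitted weights stacked into a matrix with one row per semantic feature and one column per computational feature (30 × 431 in the study), both sparsity summaries reduce to counting non-zero entries. This is a sketch with a hypothetical 2 × 4 matrix; the text does not say whether the reported spread is a population or a sample standard deviation, and the code uses NumPy's default population form (`ddof=0`).

```python
import numpy as np


def summarize_sparsity(W):
    """Return (mean, std) of non-zero weights per row and the number of columns used anywhere."""
    nnz_per_term = np.count_nonzero(W, axis=1)         # features used by each semantic model
    used_anywhere = int(np.any(W != 0, axis=0).sum())  # features used by at least one model
    return float(nnz_per_term.mean()), float(nnz_per_term.std()), used_anywhere


W = np.array([[0.0, 0.7, 0.0, -0.2],
              [0.0, 0.0, 0.0, 0.4]])  # hypothetical LASSO weights
print(summarize_sparsity(W))  # (1.5, 0.5, 2)
```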
Computational features that consistently had high-magnitude weights were considered characteristic features of these lesions. Characteristic features were found by summing the absolute values of the weights across all semantic features. The 20 features with the highest sums of weight magnitudes were categorized according to the algorithms that produced them. Daubechies Wavelets, Edge Sharpness, Gabor Transform, and the Local Area Invariant Descriptor were found to be the most informative feature groups.
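The ranking described above can be sketched as follows, assuming a weight matrix laid out as in the previous step and a list that names the algorithm behind each computational feature. The matrix and the group labels in the example are hypothetical.

```python
from collections import Counter

import numpy as np


def characteristic_features(W, feature_groups, k=20):
    """Rank computational features by summed |weight| across all semantic features.

    W: (n_semantic, n_computational) LASSO weights.
    feature_groups: algorithm name for each computational feature (column).
    Returns the indices of the top-k features and a count of them per algorithm.
    """
    importance = np.abs(W).sum(axis=0)
    top = np.argsort(-importance, kind="stable")[:k]  # largest summed magnitude first
    return top.tolist(), Counter(feature_groups[j] for j in top)


W = np.array([[0.0, 3.0, 0.0, -1.0],
              [0.0, -2.0, 2.0, 0.0]])
groups = ["Gabor", "Daubechies", "Edge sharpness", "Edge sharpness"]
top, by_group = characteristic_features(W, groups, k=2)
print(top, dict(by_group))  # [1, 2] {'Daubechies': 1, 'Edge sharpness': 1}
```

Summing absolute values means a feature that is strongly positive for one semantic feature and strongly negative for another still counts as informative, which matches the idea of "high magnitude" rather than "high" weights.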
In this study we present a framework for predicting radiological observations of liver lesions using computational image features. We computed a wide array of computational features from CT images of liver lesions and used these features to train logistic regression and LASSO classifiers. We also experimented with other classifiers, such as k-nearest neighbors, support vector machines, and linear discriminant analysis. These methods were severely hampered by the high-dimensional, small-sample data, either failing to converge during training or performing poorly. As a result, we focused only on unregularized and regularized logistic regression. We evaluated the classification results using the area under the ROC curve and the misclassification rate as metrics. From these results we were able to establish correlations between our computational feature set and radiological observations.
Our approach can be used to evaluate the predictive value of computational features as well as to determine radiological observations that are difficult to predict from computational image features. While computational analysis is not likely to replace the trained eyes of a radiologist, this work can be used to develop decision support tools to increase accuracy and efficiency of radiological diagnosis.