We are developing a computer-aided diagnosis (CAD) system to classify malignant and benign lung nodules found on CT scans. A fully automated system was designed to segment the nodule from its surrounding structured background in a local volume of interest (VOI) and to extract image features for classification. Image segmentation was performed with a three-dimensional (3D) active contour (AC) method. A data set of 96 lung nodules (44 malignant, 52 benign) from 58 patients was used in this study. The 3D AC model is based on two-dimensional AC with the addition of three new energy components to take advantage of 3D information: (1) 3D gradient, which guides the active contour to seek the object surface, (2) 3D curvature, which imposes a smoothness constraint in the z direction, and (3) mask energy, which penalizes contours that grow beyond the pleura or thoracic wall. The search for the best energy weights in the 3D AC model was guided by a simplex optimization method. Morphological and gray-level features were extracted from the segmented nodule. The rubber band straightening transform (RBST) was applied to the shell of voxels surrounding the nodule. Texture features based on run-length statistics were extracted from the RBST image. A linear discriminant analysis classifier with stepwise feature selection was designed using a second simplex optimization to select the most effective features. Leave-one-case-out resampling was used to train and test the CAD system. The system achieved a test area under the receiver operating characteristic curve (Az) of 0.83±0.04. Our preliminary results indicate that use of the 3D AC model and the 3D texture features surrounding the nodule is a promising approach to the segmentation and classification of lung nodules with CAD. The segmentation performance of the 3D AC model trained with our data set was evaluated with 23 nodules available in the Lung Image Database Consortium (LIDC). 
The lung nodule volumes segmented by the 3D AC model for best classification were generally larger than those outlined by the LIDC radiologists using visual judgment of nodule boundaries.
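The run-length texture statistics mentioned above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: `glrlm` builds a gray-level run-length matrix for horizontal runs, and two classic statistics (short-run and long-run emphasis) are derived from it; the function names and the toy image are hypothetical.

```python
import numpy as np

def glrlm(img, levels):
    """Gray-level run-length matrix for horizontal runs (0 degrees).
    Entry [g, r-1] counts maximal runs of gray level g with length r."""
    m = np.zeros((levels, img.shape[1]), dtype=int)
    for row in img:
        run_val, run_len = row[0], 1
        for v in row[1:]:
            if v == run_val:
                run_len += 1
            else:
                m[run_val, run_len - 1] += 1
                run_val, run_len = v, 1
        m[run_val, run_len - 1] += 1          # close the last run in the row
    return m

def run_length_stats(m):
    """Short-run emphasis (SRE) and long-run emphasis (LRE)."""
    n_runs = m.sum()
    r = np.arange(1, m.shape[1] + 1)          # possible run lengths
    run_hist = m.sum(axis=0)                  # number of runs per length
    sre = (run_hist / r**2).sum() / n_runs
    lre = (run_hist * r**2).sum() / n_runs
    return sre, lre

img = np.array([[0, 0, 1, 1],
                [2, 2, 2, 2],
                [0, 1, 0, 1]])
m = glrlm(img, levels=3)
sre, lre = run_length_stats(m)
```

A texture dominated by long homogeneous runs drives LRE up and SRE down; in the actual system such statistics were computed on the rubber-band-straightened shell around the nodule.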
computer-aided diagnosis; active contour model; object segmentation; classification; texture analysis; computed tomography (CT); malignancy; pulmonary nodule
Rationale and Objectives
To investigate the effect of choosing among different metrics for estimating the size of pulmonary nodules, as a factor both in nodule characterization and in the performance of computer-aided detection systems, since the latter are always qualified with respect to a given size range of nodules.
Materials and Methods
This study used 265 whole-lung CT scans documented by the Lung Image Database Consortium using their protocol for nodule evaluation. Each inspected lesion was reviewed independently by four experienced radiologists who provided boundary markings for nodules larger than 3 mm. Four size metrics, based on the boundary markings, were considered: a uni-dimensional and two bi-dimensional measures on a single image slice and a volumetric measurement based on all the image slices. The radiologist boundaries were processed and those with four markings were analyzed to characterize the inter-radiologist variation, while those with at least one marking were used to examine the difference between the metrics.
The processing of the annotations found 127 nodules marked by all four radiologists and an extended set of 518 nodules each having at least one marking, with three-dimensional sizes ranging from 2.03 to 29.4 mm (average 7.05 mm, median 5.71 mm). A very high inter-observer variation was observed for all of these metrics: 95% of the estimated standard deviations fell in the ranges [0.49, 1.25], [0.67, 2.55], [0.78, 2.11], and [0.96, 2.69] mm for the three-dimensional, the uni-dimensional, and the two bi-dimensional size metrics, respectively. A very large difference among the metrics was also observed: the 0.95 probability-coverage region widths for the volume estimation, conditional on uni-dimensional and the two bi-dimensional size measurements of 10 mm, were 7.32, 7.72, and 6.29 mm, respectively.
The selection of data subsets for performance evaluation is highly impacted by the choice of size metric. The LIDC plans to include a single size measure for each nodule in its database. This metric is not intended as a gold standard for nodule size; rather, it is intended to facilitate the selection of unique, repeatable, size-limited nodule subsets.
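To make the distinction between the metric types concrete, here is a hedged sketch computing one plausible variant of each from a binary nodule mask. The exact definitions used by the LIDC may differ; `size_metrics`, its spacing convention, and the choice of bi-dimensional measures are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def size_metrics(mask, spacing):
    """Sketch of four nodule size metrics from a 3D binary mask.
    spacing = (dz, dy, dx) in mm.  Plausible stand-ins only:
    - volumetric: 3D equivalent diameter from the voxel volume
    - uni-dimensional: longest in-plane diameter on the largest slice
    - bi-dimensional #1: area of the largest slice
    - bi-dimensional #2: longest diameter x longest perpendicular extent"""
    dz, dy, dx = spacing
    volume = mask.sum() * dz * dy * dx                  # mm^3
    eq_diam = (6.0 * volume / np.pi) ** (1.0 / 3.0)     # mm
    areas = mask.sum(axis=(1, 2))
    sl = mask[int(areas.argmax())]                      # largest-area slice
    pts = np.argwhere(sl).astype(float) * [dy, dx]
    p, q = max(combinations(pts, 2),
               key=lambda pq: np.linalg.norm(pq[0] - pq[1]))
    max_d = np.linalg.norm(p - q)                       # longest diameter (mm)
    area = sl.sum() * dy * dx                           # mm^2
    u = (p - q) / max_d
    perp = pts @ np.array([-u[1], u[0]])                # project onto perpendicular
    bidim = max_d * (perp.max() - perp.min())           # mm^2
    return eq_diam, max_d, area, bidim

# toy nodule: a 3 x 3 x 3 voxel block at 1 mm isotropic spacing
eq_d, max_d, area, bidim = size_metrics(np.ones((3, 3, 3), int), (1.0, 1.0, 1.0))
```

Even on this toy mask the four numbers disagree substantially, which is the paper's point: subset selection by "size" depends strongly on which metric is meant.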
Quantitative image analysis; X-ray CT; Detection; Lung nodule annotation; Size metrics
Ideally, an image should be reported and interpreted in the same way (e.g., the same perceived likelihood of malignancy) or similarly by any two radiologists; however, as much research has demonstrated, this is not often the case. Various efforts have made an attempt at tackling the problem of reducing the variability in radiologists’ interpretations of images. The Lung Image Database Consortium (LIDC) has provided a database of lung nodule images and associated radiologist ratings in an effort to provide images to aid in the analysis of computer-aided tools. Likewise, the Radiological Society of North America has developed a radiological lexicon called RadLex. As such, the goal of this paper is to investigate the feasibility of associating LIDC characteristics and terminology with RadLex terminology. If matches between LIDC characteristics and RadLex terms are found, probabilistic models based on image features may be used as decision-based rules to predict if an image or lung nodule could be
characterized or classified as an associated RadLex term. This study found matches in RadLex for 25 (74%) of the 34 LIDC terms. This suggests that LIDC characteristics and their associated rating terminology may be better conceptualized or reduced to produce even more matches with RadLex. Ultimately, the goal is to identify and establish a more standardized rating system and terminology to reduce the subjective variability between radiologist annotations. A standardized rating system can then be utilized by future researchers to develop automatic annotation models and tools for computer-aided decision systems.
Chest CT; digital imaging; image data; image interpretation; imaging informatics; lung; radiographic image interpretation; computer-assisted; reporting; RadLex; semantic; LIDC
Rationale and Objectives
Computer-aided diagnostic (CAD) systems fundamentally require the opinions of expert human observers to establish “truth” for algorithm development, training, and testing. The integrity of this “truth,” however, must be established before investigators commit to this “gold standard” as the basis for their research. The purpose of this study was to develop a quality assurance (QA) model as an integral component of the “truth” collection process concerning the location and spatial extent of lung nodules observed on computed tomography (CT) scans to be included in the Lung Image Database Consortium (LIDC) public database.
Materials and Methods
One hundred CT scans were interpreted by four radiologists through a two-phase process. For the first of these reads (the “blinded read phase”), radiologists independently identified and annotated lesions, assigning each to one of three categories: “nodule ≥ 3mm,” “nodule < 3mm,” or “non-nodule ≥ 3mm.” For the second read (the “unblinded read phase”), the same radiologists independently evaluated the same CT scans but with all of the annotations from the previously performed blinded reads presented; each radiologist could add marks, edit or delete their own marks, change the lesion category of their own marks, or leave their marks unchanged. The post-unblinded-read set of marks was grouped into discrete nodules and subjected to the QA process, which consisted of (1) identification of potential errors introduced during the complete image annotation process (such as two marks on what appears to be a single lesion or an incomplete nodule contour) and (2) correction of those errors. Seven categories of potential error were defined; any nodule with a mark that satisfied the criterion for one of these categories was referred to the radiologist who assigned that mark for either correction or confirmation that the mark was intentional.
A total of 105 QA issues were identified across 45 (45.0%) of the 100 CT scans. Radiologist review resulted in modifications to 101 (96.2%) of these potential errors. Twenty-one lesions erroneously marked as lung nodules after the unblinded reads had this designation removed through the QA process.
The establishment of “truth” must incorporate a QA process to guarantee the integrity of the datasets that will provide the basis for the development, training, and testing of CAD systems.
lung nodule; computed tomography (CT); thoracic imaging; database construction; computer-aided diagnosis (CAD); annotation; quality assurance (QA)
The purpose of this study was to develop a new method for automated lung nodule detection in serial-section CT images, using the characteristics of the 3D appearance of nodules that distinguish them from vessels.
Materials and Methods
Lung nodules were detected in four steps. First, to reduce the number of regions of interest (ROIs) and the computation time, the lung regions of the CTs were segmented using Genetic Cellular Neural Networks (G-CNN). Then, for each lung region, ROIs were specified using an 8-directional search; +1 or -1 values were assigned to each voxel. The 3D ROI image was obtained by combining all the two-dimensional (2D) ROI images. A 3D template was created to find nodule-like structures in the 3D ROI image. Convolution of the 3D ROI image with the proposed template strengthens shapes that are similar to those of the template and weakens the others. Finally, fuzzy rule-based thresholding was applied and the ROIs were found. To test the system's efficiency, we used 16 cases with a total of 425 slices, which were taken from the Lung Image Database Consortium (LIDC) dataset.
The computer aided diagnosis (CAD) system achieved 100% sensitivity with 13.375 FPs per case when the nodule thickness was greater than or equal to 5.625 mm.
Our results indicate that the detection performance of our algorithm is satisfactory, and this may well improve the performance of computer-aided detection of lung nodules.
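The template-matching step described above can be illustrated with a simple FFT-based 3D cross-correlation. This is a sketch on a synthetic volume with a binary spherical template; the paper's actual template and the fuzzy thresholding stage are not reproduced, and all names here are illustrative.

```python
import numpy as np

def sphere_template(radius):
    """Binary spherical template (+1 inside, 0 outside)."""
    r = np.arange(-radius, radius + 1)
    z, y, x = np.meshgrid(r, r, r, indexing="ij")
    return (z**2 + y**2 + x**2 <= radius**2).astype(float)

def template_match(volume, template):
    """Circular cross-correlation of the volume with the template via FFT;
    high scores mark compact, roughly spherical (nodule-like) structures,
    while elongated (vessel-like) shapes score lower."""
    pad = np.zeros_like(volume, dtype=float)
    s = template.shape
    pad[:s[0], :s[1], :s[2]] = template
    # centre the template at the origin so peaks align with object centres
    pad = np.roll(pad, [-(n // 2) for n in s], axis=(0, 1, 2))
    return np.fft.ifftn(np.fft.fftn(volume) * np.conj(np.fft.fftn(pad))).real

t = sphere_template(2)
vol = np.zeros((16, 16, 16))
vol[6:11, 6:11, 6:11] = t              # spherical "nodule" centred at (8, 8, 8)
score = template_match(vol, t)
peak = np.unravel_index(score.argmax(), score.shape)
```

The correlation score peaks exactly at the nodule centre, with the peak value equal to the template overlap; in the CAD system, thresholding such a score map yields the nodule candidates.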
Computer aided lung nodule detection; ROI specification, Genetic algorithm; Cellular neural networks; Fuzzy logic, 3D template matching
Lung CAD systems require the ability to classify a variety of pulmonary structures as part of the diagnostic process. The purpose of this work was to develop a methodology for fully automated voxel-by-voxel classification of airways, fissures, nodules, and vessels from chest CT images using a single feature set and classification method. Twenty-nine thin-section CT scans were obtained from the Lung Image Database Consortium (LIDC). Multiple radiologists labeled voxels corresponding to the following structures: airways (trachea to 6th generation), major and minor lobar fissures, nodules, vessels (hilum to peripheral), and normal lung parenchyma. The labeled data were used in conjunction with a supervised machine learning approach (AdaBoost) to train a set of ensemble classifiers. Each ensemble classifier was trained to detect voxels belonging to a specific structure (either airway, fissure, nodule, vessel, or parenchyma). The feature set consisted of voxel attenuation and a small number of features based on the eigenvalues of the Hessian matrix (used to differentiate structures by shape). When each ensemble classifier was composed of 20 weak classifiers, the AUC values for the airway, fissure, nodule, vessel, and parenchyma classifiers were 0.984 ± 0.011, 0.949 ± 0.009, 0.945 ± 0.018, 0.953 ± 0.016, and 0.931 ± 0.015, respectively. The strong results suggest that this could be an effective input to higher-level anatomical based segmentation models with the potential to improve CAD performance.
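The Hessian-eigenvalue shape cue underlying this feature set can be sketched as follows. The synthetic blob and tube and the function name are illustrative; the paper's exact features and the AdaBoost stage are not reproduced here.

```python
import numpy as np

def hessian_eigenvalues(vol):
    """Per-voxel Hessian eigenvalues sorted by absolute value
    (|l1| <= |l2| <= |l3|).  For bright structures on a dark background,
    a blob (nodule-like) has three large negative eigenvalues, a tube
    (vessel- or airway-wall-like) has two, and a plate (fissure-like) one."""
    grads = np.gradient(vol)
    H = np.empty(vol.shape + (3, 3))
    for i in range(3):
        gi = np.gradient(grads[i])            # second derivatives
        for j in range(3):
            H[..., i, j] = gi[j]
    eig = np.linalg.eigvalsh(H)               # ascending eigenvalues
    order = np.argsort(np.abs(eig), axis=-1)
    return np.take_along_axis(eig, order, axis=-1)

z, y, x = np.mgrid[-8:9, -8:9, -8:9].astype(float)
blob = np.exp(-(z**2 + y**2 + x**2) / 8.0)    # nodule-like Gaussian blob
tube = np.exp(-(y**2 + x**2) / 8.0)           # vessel-like, elongated along z
lb = hessian_eigenvalues(blob)[8, 8, 8]
lt = hessian_eigenvalues(tube)[8, 8, 8]
```

At the structure centre the blob has three strongly negative eigenvalues while the tube has a near-zero eigenvalue along its axis, which is exactly the property that lets a single feature set separate nodules from vessels and fissures.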
AdaBoost; Bronchovascular segmentation; Computed tomography; Computer-aided diagnosis
In 1979 Haralick famously introduced a method for analyzing the texture of an image: a set of statistics extracted from the co-occurrence matrix. In this paper we investigate novel sets of texture descriptors extracted from the co-occurrence matrix; in addition, we compare and combine different strategies for extending these descriptors. The following approaches are compared: the standard approach proposed by Haralick, two methods that consider the co-occurrence matrix as a three-dimensional shape, a gray-level run-length set of features and the direct use of the co-occurrence matrix projected onto a lower dimensional subspace by principal component analysis. Texture descriptors are extracted from the co-occurrence matrix evaluated at multiple scales. Moreover, the descriptors are extracted not only from the entire co-occurrence matrix but also from subwindows. The resulting texture descriptors are used to train a support vector machine and ensembles. Results show that our novel extraction methods improve the performance of standard methods. We validate our approach across six medical datasets representing different image classification problems using the Wilcoxon signed rank test. The source code used for the approaches tested in this paper will be available at: http://www.dei.unipd.it/wdyn/?IDsezione=3314&IDgruppo_pass=124&preview=.
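For reference, the standard co-occurrence matrix and two of Haralick's statistics (contrast and energy) can be computed as in this minimal sketch; the toy image and function names are illustrative, and the paper's novel 3D-shape and PCA-based descriptors are not shown.

```python
import numpy as np

def glcm(img, dx, dy, levels):
    """Gray-level co-occurrence matrix for displacement (dy, dx),
    symmetrised and normalised to a joint probability table."""
    m = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            m[img[y, x], img[y + dy, x + dx]] += 1
    m += m.T                                   # symmetrise
    return m / m.sum()

def haralick_contrast_energy(p):
    """Two classic Haralick statistics from a normalised GLCM."""
    i, j = np.indices(p.shape)
    contrast = ((i - j) ** 2 * p).sum()        # local gray-level variation
    energy = (p ** 2).sum()                    # uniformity of the texture
    return contrast, energy

img = np.array([[0, 0, 1],
                [0, 1, 1],
                [2, 2, 2]])
pmat = glcm(img, dx=1, dy=0, levels=3)
contrast, energy = haralick_contrast_energy(pmat)
```

The approaches compared in the paper differ in what is done with `pmat` afterwards: scalar statistics as above, treating the matrix as a 3D shape, or projecting it onto a PCA subspace.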
A fully automated and three-dimensional (3D) segmentation method for the identification of the pulmonary parenchyma in thorax X-ray computed tomography (CT) datasets is proposed. It is meant to be used as a pre-processing step in the computer-assisted detection (CAD) system for malignant lung nodule detection that is being developed by the Medical Applications in a Grid Infrastructure Connection (MAGIC-5) Project. In this new approach the segmentation of the external airways (trachea and bronchi) is obtained by 3D region growing with wavefront simulation and suitable stop conditions, thus allowing accurate handling of the hilar region, which is notoriously difficult to segment. Particular attention was also devoted to checking and solving the problem of the apparent ‘fusion’ between the lungs, caused by partial-volume effects, while 3D morphology operations ensure the accurate inclusion of all the nodules (internal, pleural, and vascular) in the segmented volume. The new algorithm was initially developed and tested on a dataset of 130 CT scans from the Italung-CT trial, and was then applied to the ANODE09-competition images (55 scans) and to the LIDC database (84 scans), giving very satisfactory results. In particular, the lung contour was adequately located in 96% of the CT scans, with incorrect segmentation of the external airways in the remaining cases. Segmentation metrics were calculated that quantitatively express the consistency between automatic and manual segmentations: the mean overlap degree of the segmentation masks is 0.96 ± 0.02, and the mean and maximum distances between the mask borders (averaged over the whole dataset) are 0.74 ± 0.05 and 4.5 ± 1.5, respectively, which confirms that the automatic segmentations quite correctly reproduce the borders traced by the radiologist.
Moreover, no tissue containing internal and pleural nodules was removed in the segmentation process, so this method proved fit for use in the framework of a CAD system. Finally, in a comparison with a two-dimensional segmentation procedure, inter-slice smoothness was calculated, showing that the masks created by the 3D algorithm are significantly smoother than those produced by the 2D-only procedure.
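The wavefront-style 3D region growing at the heart of this segmentation can be illustrated with a breadth-first sketch on a toy volume. The stop condition here is a simple attenuation threshold, whereas the actual algorithm uses more elaborate stop conditions; the toy "CT" values and names are illustrative.

```python
from collections import deque
import numpy as np

def region_grow(vol, seed, threshold):
    """3D region growing as a breadth-first wavefront from a seed:
    6-connected voxels below `threshold` (air-like HU) join the region."""
    grown = np.zeros(vol.shape, dtype=bool)
    grown[seed] = True
    front = deque([seed])
    nbrs = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
            (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while front:
        z, y, x = front.popleft()             # advance the wavefront
        for dz, dy, dx in nbrs:
            n = (z + dz, y + dy, x + dx)
            if all(0 <= c < s for c, s in zip(n, vol.shape)) \
                    and not grown[n] and vol[n] < threshold:
                grown[n] = True
                front.append(n)
    return grown

# toy "CT": an air-filled (-1000 HU) cavity inside soft tissue (+40 HU)
vol = np.full((8, 8, 8), 40.0)
vol[2:6, 2:6, 2:6] = -1000.0
mask = region_grow(vol, seed=(3, 3, 3), threshold=-400.0)
```

Simulating the front as an explicit queue is what makes it possible to attach per-step checks, such as the stop conditions used in the hilar region.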
CAD; image segmentation; lung nodules; region growing; grid; 3D imaging; biomedical image analysis
This paper describes part of a content-based image retrieval (CBIR) system that has been developed for mammograms. Details are presented of methods implemented to derive measures of similarity based upon structural characteristics and distributions of density of the fibroglandular tissue, as well as the anatomical size and shape of the breast region as seen on the mammogram. Well-known features related to shape, size, and texture (statistics of the gray-level histogram, Haralick’s texture features, and moment-based features) were applied, as well as less-explored features based in the Radon domain and granulometric measures. The Kohonen self-organizing map (SOM) neural network was used to perform the retrieval operation. Performance evaluation was done using precision and recall curves obtained from comparison between the query and retrieved images. The proposed methodology was tested with 1,080 mammograms, including craniocaudal and mediolateral-oblique views. Precision rates obtained are in the range from 79% to 83% considering the total image set. Considering the first 50% of the retrieved images, the precision rates are in the range from 78% to 83%; the rates are in the range from 79% to 86% considering the first 25% of the retrieved images. Results obtained indicate the potential of the implemented methodology to serve as a part of a CBIR system for mammography.
Mammography; content-based image retrieval; Kohonen self-organizing map; texture features; granulometric measures; Radon transform domain; breast density
A great deal of work is being done to develop computer-assisted diagnosis and detection (CAD) technologies and systems to improve diagnostic quality for pulmonary nodules. Another way to improve the accuracy of diagnosis on new images is to recall or find images with similar features among archived historical images that already have confirmed diagnostic results; content-based image retrieval (CBIR) technology has been proposed for this purpose. In this paper, we present a method to find and select texture features of solitary pulmonary nodules (SPNs) detected by computed tomography (CT) and evaluate the performance of support vector machine (SVM)-based classifiers in differentiating benign from malignant SPNs. Seventy-seven biopsy-confirmed CT cases of SPNs were included in this study. A total of 67 features were extracted by a feature extraction procedure, and around 25 features were finally selected after 300 genetic generations. We constructed the SVM-based classifier with the selected features and evaluated its performance by comparing its classification results with the observations of six senior radiologists. The evaluation results not only showed that most of the selected features are characteristics frequently considered by radiologists and used in previously reported CAD analyses for classifying SPNs, but also indicated that some newly found features make an important contribution to differentiating benign from malignant SPNs in the SVM-based feature space. The results of this research can be used to build a highly efficient feature index for a CBIR system for CT images with pulmonary nodules.
Feature selection; content-based image retrieval; classification; CT images; lung diseases
We present an image retrieval framework based on automatic query expansion in a concept feature space by generalizing the vector space model of information retrieval. In this framework, images are represented by vectors of weighted concepts similar to the keyword-based representation used in text retrieval. To generate the concept vocabularies, a statistical model is built by utilizing Support Vector Machine (SVM)-based classification techniques. The images are represented as “bag of concepts” that comprise perceptually and/or semantically distinguishable color and texture patches from local image regions in a multi-dimensional feature space. To explore the correlation between the concepts and overcome the assumption of feature independence in this model, we propose query expansion techniques in the image domain from a new perspective based on both local and global analysis. For the local analysis, the correlations between the concepts based on the co-occurrence pattern, and the metrical constraints based on the neighborhood proximity between the concepts in encoded images, are analyzed by considering local feedback information. We also analyze the concept similarities in the collection as a whole in the form of a similarity thesaurus and propose an efficient query expansion based on the global analysis. The experimental results on a photographic collection of natural scenes and a biomedical database of different imaging modalities demonstrate the effectiveness of the proposed framework in terms of precision and recall.
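The global-analysis expansion idea can be sketched as follows: concepts similar to those in the query, according to a concept-concept similarity thesaurus built from the collection, receive added weight, which can raise the similarity to documents that encode correlated concepts. The thesaurus values, the expansion rule, and all names below are illustrative.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two concept vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def expand_query(q, thesaurus, top_k=1, weight=0.5):
    """Global-analysis query expansion sketch: for each concept present in
    the query, add weighted mass to its most similar other concepts."""
    expanded = q.astype(float).copy()
    for c in np.nonzero(q)[0]:
        sims = thesaurus[c].copy()
        sims[c] = -1.0                         # skip the concept itself
        for n in np.argsort(sims)[::-1][:top_k]:
            expanded[n] += weight * q[c] * sims[n]
    return expanded

# 3 concepts; concepts 0 and 1 co-occur strongly in the collection
thesaurus = np.array([[1.0, 0.8, 0.1],
                      [0.8, 1.0, 0.2],
                      [0.1, 0.2, 1.0]])
q = np.array([1.0, 0.0, 0.0])                  # query mentions concept 0 only
doc = np.array([0.0, 1.0, 0.0])                # document encodes concept 1
before = cosine(q, doc)
after = cosine(expand_query(q, thesaurus), doc)
```

Before expansion the query and document share no concepts and the similarity is zero; after expansion the correlated concept gives a nonzero match, which is how expansion mitigates the feature-independence assumption of the plain vector space model.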
Image Retrieval; Vector Space Model; Support Vector Machine; Relevance Feedback; Query Expansion
Rationale and Objectives
To retrospectively investigate the effect of a computer aided detection (CAD) system on radiologists’ performance for detecting small pulmonary nodules in CT examinations, with a panel of expert radiologists serving as the reference standard.
Materials and Methods
Institutional review board approval was obtained. Our data set contained 52 CT examinations collected by the Lung Image Database Consortium, and 33 from our institution. All CTs were read by multiple expert thoracic radiologists to identify the reference standard for detection. Six other thoracic radiologists read the CT examinations first without, and then with CAD. Performance was evaluated using free-response receiver operating characteristics (FROC) and the jackknife FROC analysis methods (JAFROC) for nodules above different diameter thresholds.
A total of 241 nodules, ranging in size from 3.0 to 18.6 mm (mean, 5.3 mm), were identified as the reference standard. At diameter thresholds of 3, 4, 5, and 6 mm, the CAD system had a sensitivity of 54%, 64%, 68%, and 76%, respectively, with an average of 5.6 false-positives (FPs) per scan. Without CAD, the average figures-of-merit (FOMs) for the six radiologists, obtained from JAFROC analysis, were 0.661, 0.729, 0.793 and 0.838 for the same nodule diameter thresholds, respectively. With CAD, the corresponding average FOMs improved to 0.705, 0.763, 0.810 and 0.862, respectively. The improvement achieved statistical significance for nodules at the 3 and 4 mm thresholds (p=0.002 and 0.020, respectively), and did not achieve significance at 5 and 6 mm (p=0.18 and 0.13, respectively). At a nodule diameter threshold of 3 mm, the radiologists’ average sensitivity and FP rate were 0.56 and 0.67, respectively, without CAD, and 0.67 and 0.78 with CAD.
CAD improves thoracic radiologists’ performance for detecting pulmonary nodules under 5 mm on CT examinations, which are often overlooked by visual inspection alone.
Lung Nodule; CT; Computer-aided detection
Rationale and Objectives
Integral to the mission of the National Institutes of Health–sponsored Lung Imaging Database Consortium is the accurate definition of the spatial location of pulmonary nodules. Because the majority of small lung nodules are not resected, a reference standard from histopathology is generally unavailable. Thus, it is necessary to assess the sources of variability introduced when expert radiologists use different software tools to define the spatial location of lung nodules as an alternative form of truth.
Materials and Methods
The relative differences in performance of six radiologists each applying three annotation methods to the task of defining the spatial extent of 23 different lung nodules were evaluated. The variability of radiologists’ spatial definitions for a nodule was measured using both volumes and probability maps (p-map). Results were analyzed using a linear mixed-effects model that included nested random effects.
Across the combination of all nodules, volume and p-map model parameters were found to be significant at P < .05 for all methods, all radiologists, and all second-order interactions except one. The radiologist and methods variables accounted for 15% and 3.5% of the total p-map variance, respectively, and 40.4% and 31.1% of the total volume variance, respectively.
Radiologists represent the major source of variance as compared with drawing tools, independent of the drawing metric used. Although the random noise component is larger for the p-map analysis than for volume estimation, the p-map analysis appears to have more power to detect differences in radiologist-method combinations. The standard deviation of the volume measurement task appears to be proportional to nodule volume.
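A probability map of the kind analyzed above is simply the per-voxel fraction of readers whose contours include the voxel; thresholding it yields consensus regions. A minimal sketch with three hypothetical 2D reader masks:

```python
import numpy as np

def p_map(masks):
    """Probability map: per-voxel fraction of readers who included the
    voxel.  Thresholding at k/n gives the k-reader consensus region."""
    return np.mean(np.stack(masks).astype(float), axis=0)

# three hypothetical readers' binary masks for one slice of a nodule
r1 = np.zeros((5, 5)); r1[1:4, 1:4] = 1
r2 = np.zeros((5, 5)); r2[1:4, 1:4] = 1
r3 = np.zeros((5, 5)); r3[2:5, 2:5] = 1        # this reader drew a shifted contour
pm = p_map([r1, r2, r3])
consensus_all = (pm >= 1.0).sum()              # voxels marked by all readers
```

Unlike a single volume number, the p-map retains where the readers disagree, which is the extra statistical power noted in the conclusions.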
LIDC drawing experiment; lung nodule annotation; edge mask; p-map; volume; linear mixed-effects model
The aim of this study was to investigate the relationship between 16-slice spiral CT perfusion imaging and both tumor angiogenesis and VEGF (vascular endothelial growth factor) expression in patients with benign and malignant pulmonary nodules, and to evaluate its value in the differential diagnosis of benign and malignant pulmonary nodules.
Sixty-four patients with benign and malignant pulmonary nodules underwent 16-slice spiral CT perfusion imaging. The CT perfusion imaging was analyzed for TDC (time density curve), perfusion parametric maps, and the respective perfusion parameters. Immunohistochemical findings of MVD (microvessel density) measurement and VEGF expression were evaluated.
The shape of the TDC of peripheral lung cancer was similar to that of inflammatory nodules. The differences in PH (peak height), PHpm/PHa (peak height ratio of pulmonary nodule to aorta), BF (blood flow), and BV (blood volume) values between peripheral lung cancer and inflammatory nodules were not statistically significant (all P > 0.05). Both showed significantly higher PH, PHpm/PHa, BF, and BV values than benign nodules (all P < 0.05). Peripheral lung cancer showed a significantly higher PS (permeability surface) value than both inflammatory and benign nodules (all P < 0.05). BV, BF, PS, MTT, PH, PHpm/PHa, and MVD did not differ significantly among the three groups of peripheral lung cancers (all P > 0.05). In the case of adenocarcinoma, the differences in BV, BF, PS, PHpm/PHa, and MVD between poorly and well differentiated tumors and between poorly and moderately differentiated tumors were statistically significant (all P < 0.05). Peripheral lung cancers with positive VEGF expression showed significantly higher PH, PHpm/PHa, BF, BV, PS, and MVD values than both peripheral lung cancers with negative VEGF expression and benign nodules with positive VEGF expression (all P < 0.05). In the case of negative VEGF expression, the PH, PHpm/PHa, and MVD of inflammatory nodules were significantly higher than those of peripheral lung cancer, and the PS of inflammatory nodules was significantly lower than that of peripheral lung cancer (all P < 0.05). The PH, PHpm/PHa, BF, and BV of benign nodules were significantly lower than those of inflammatory nodules (all P < 0.05), whereas PS and MTT (mean transit time) were not (all P > 0.05). The PH, PHpm/PHa, BV, and PS of benign nodules were significantly lower than those of peripheral lung cancer (all P < 0.05). In the case of positive VEGF expression, MVD was positively correlated with the PH, PHpm/PHa, BF, BV, and PS of peripheral lung cancer and with the PS of benign nodules (all P < 0.05).
Multi-slice spiral CT perfusion imaging closely correlated with tumor angiogenesis and reflected MVD measurement and VEGF expression. It provided not only a non-invasive method of quantitative assessment for blood flow patterns of peripheral pulmonary nodules but also an applicable diagnostic method for peripheral pulmonary nodules.
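Two of the TDC-derived parameters, PH and PHpm/PHa, can be computed directly from the time-density curves. The baseline convention (mean of the pre-contrast samples) and the toy curves below are illustrative assumptions, not the study's protocol.

```python
import numpy as np

def perfusion_params(tdc_nodule, tdc_aorta, baseline_n=3):
    """Peak height (PH) above the pre-contrast baseline, and the
    nodule-to-aorta peak-height ratio (PHpm/PHa)."""
    base_n = tdc_nodule[:baseline_n].mean()    # pre-contrast baseline, nodule
    base_a = tdc_aorta[:baseline_n].mean()     # pre-contrast baseline, aorta
    ph_n = tdc_nodule.max() - base_n
    ph_a = tdc_aorta.max() - base_a
    return ph_n, ph_n / ph_a

# toy time-density curves (attenuation in HU over time)
nodule = np.array([20.0, 20.0, 20.0, 35.0, 60.0, 55.0, 45.0])
aorta = np.array([40.0, 40.0, 40.0, 180.0, 340.0, 300.0, 250.0])
ph, ratio = perfusion_params(nodule, aorta)
```

Normalizing the nodule peak by the aortic peak, as PHpm/PHa does, compensates for differences in contrast delivery between patients.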
With the increasing use of images in disease research, education, and clinical medicine, the need for methods that effectively archive, query, and retrieve these images by their content is underscored. This paper describes the implementation of a Web-based retrieval system called SPIRS (Spine Pathology & Image Retrieval System), which permits exploration of a large biomedical database of digitized spine x-ray images and data from a national health survey using a combination of visual and textual queries.
SPIRS is a generalizable framework that consists of four components: a client applet, a gateway, an indexing and retrieval system, and a database of images and associated text data. The prototype system is demonstrated using text and imaging data collected as part of the second U.S. National Health and Nutrition Examination Survey (NHANES II). Users search the image data by providing a sketch of the vertebral outline or selecting an example vertebral image and some relevant text parameters. Pertinent pathology on the image/sketch can be annotated and weighted to indicate importance.
During the course of development, we explored different algorithms to perform functions such as segmentation, indexing, and retrieval. Each algorithm was tested individually and then implemented as part of SPIRS. To evaluate the overall system, we first tested the system’s ability to return similar vertebral shapes from the database given a query shape. Initial evaluations using visual queries only (no text) have shown that the system achieves up to 68% accuracy in finding images in the database that exhibit similar abnormality type and severity. Relevance feedback mechanisms have been shown to increase accuracy by an additional 22% after three iterations. While we primarily demonstrate this system in the context of retrieving vertebral shape, our framework has also been adapted to search a collection of 100,000 uterine cervix images to study the progression of cervical cancer.
SPIRS is automated, easily accessible, and integratable with other complementary information retrieval systems. The system supports the ability for users to intuitively query large amounts of imaging data by providing visual examples and text keywords and has beneficial implications in the areas of research, education, and patient care.
Medical informatics applications; Information storage and retrieval; Content-based image retrieval; Visual access methods; Web-based systems
We have been developing a computer-aided diagnostic (CAD) scheme for lung nodule detection in order to assist radiologists in the detection of lung cancer in thin-section computed tomography (CT) images. Our database consisted of 117 thin-section CT scans with 153 nodules, obtained from a lung cancer screening program at a Japanese university (85 scans, 91 nodules) and from clinical work at an American university (32 scans, 62 nodules). The database included nodules of different sizes (4-28 mm, mean 10.2 mm), shapes, and patterns (solid and ground-glass opacity (GGO)). Our CAD scheme consisted of modules for lung segmentation, selective nodule enhancement, initial nodule detection, feature extraction, and classification. The selective nodule enhancement filter was a key technique for significant enhancement of nodules and suppression of normal anatomic structures such as blood vessels, which are the main sources of false positives. Use of an automated rule-based classifier for reduction of false positives was another key technique; it resulted in a minimized overtraining effect and an improved classification performance. We employed a case-based four-fold cross-validation testing method for evaluation of the performance levels of our computerized detection scheme. Our CAD scheme achieved an overall sensitivity of 86% (small: 76%, medium-sized: 94%, large: 95%; solid: 86%, mixed GGO: 89%, pure GGO: 81%) with 6.6 false positives per scan; an overall sensitivity of 81% (small: 69%, medium-sized: 91%, large: 91%; solid: 79%, mixed GGO: 88%, pure GGO: 81%) with 3.3 false positives per scan; and an overall sensitivity of 75% (small: 60%, medium-sized: 88%, large: 87%; solid: 70%, mixed GGO: 87%, pure GGO: 81%) with 1.6 false positives per scan. The experimental results indicate that our CAD scheme with its two key techniques can achieve a relatively high performance for nodules presenting large variations in size, shape, and pattern.
nodule detection; computer-aided diagnosis; CAD; CT scan; rule-based classifier
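The rule-based false-positive reduction described in the abstract above can be sketched in a few lines; the feature names and thresholds below are illustrative assumptions, not the published scheme's actual rules.

```python
# Toy rule-based filter for nodule candidates. Feature names and thresholds
# are illustrative assumptions, not the published scheme's rules.

def is_likely_nodule(candidate):
    """Accept a candidate only if every rule holds; otherwise reject it as a false positive."""
    rules = [
        candidate["diameter_mm"] >= 4.0,   # database nodules range from 4 mm upward
        candidate["elongation"] <= 3.0,    # highly elongated objects are usually vessels
        candidate["contrast_hu"] >= 50.0,  # faint candidates are likely noise
    ]
    return all(rules)

candidates = [
    {"diameter_mm": 8.0, "elongation": 1.4, "contrast_hu": 120.0},  # plausible nodule
    {"diameter_mm": 6.0, "elongation": 5.2, "contrast_hu": 200.0},  # vessel-like shape
]
kept = [c for c in candidates if is_likely_nodule(c)]
print(len(kept))  # 1
```

A real implementation would derive such thresholds from training data rather than fix them by hand; the appeal of explicit rules, as the abstract notes, is a reduced overtraining effect.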
Communication of critical results from diagnostic procedures between caregivers is a Joint Commission national patient safety goal. Evaluating critical result communication often requires manual analysis of voluminous data, especially when reviewing unstructured textual results of radiologic findings. Information retrieval (IR) tools can facilitate this process by enabling automated retrieval of radiology reports that cite critical imaging findings. However, IR tools that have been developed for one disease or imaging modality often need substantial reconfiguration before they can be utilized for another disease entity.
This paper: 1) describes the process of customizing two Natural Language Processing (NLP) and Information Retrieval/Extraction applications – an open-source toolkit, A Nearly New Information Extraction system (ANNIE), and an application developed in-house, Information for Searching Content with an Ontology-Utilizing Toolkit (iSCOUT) – to illustrate the varying levels of customization required for different disease entities; and 2) evaluates each application’s performance in identifying and retrieving radiology reports citing critical imaging findings for three distinct diseases: pulmonary nodule, pneumothorax, and pulmonary embolus.
Both applications can be utilized for retrieval. iSCOUT and ANNIE had precision values between 0.90 and 0.98 and recall values between 0.79 and 0.94. ANNIE had consistently higher precision but required more customization.
Understanding the customizations involved in utilizing NLP applications for various diseases will enable users to select the most suitable tool for specific tasks.
Critical imaging findings; critical test results; document retrieval; radiology report retrieval.
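The precision and recall figures reported above follow the standard retrieval definitions, sketched here with made-up report identifiers rather than the study's data.

```python
# Standard retrieval metrics over sets of report identifiers.
# The report IDs below are invented for illustration.

def precision_recall(retrieved, relevant):
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = ["r1", "r2", "r3", "r4"]  # reports the tool returned
relevant = ["r1", "r2", "r5"]         # reports marked as citing the critical finding
p, r = precision_recall(retrieved, relevant)
print(round(p, 2), round(r, 2))  # 0.5 0.67
```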
Searching for relevant knowledge across heterogeneous geospatial databases requires extensive knowledge of the semantic meaning of images, a keen eye for visual patterns, and efficient strategies for collecting and analyzing data with minimal human intervention. In this paper, we present our recently developed content-based multimodal Geospatial Information Retrieval and Indexing System (GeoIRIS) which includes automatic feature extraction, visual content mining from large-scale image databases, and high-dimensional database indexing for fast retrieval. Using these underpinnings, we have developed techniques for complex queries that merge information from heterogeneous geospatial databases, retrievals of objects based on shape and visual characteristics, analysis of multiobject relationships for the retrieval of objects in specific spatial configurations, and semantic models to link low-level image features with high-level visual descriptors. GeoIRIS brings this diverse set of technologies together into a coherent system with an aim of allowing image analysts to more rapidly identify relevant imagery. GeoIRIS is able to answer analysts’ questions in seconds, such as “given a query image, show me database satellite images that have similar objects and spatial relationships within a certain radius of a landmark.”
Geospatial intelligence; image database; information mining
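The quoted query can be approximated as a two-stage filter-then-rank operation: restrict candidates to a radius around the landmark, then order by feature similarity. The tile coordinates, radius, and feature vectors below are invented for illustration.

```python
import math

# Illustrative two-stage geospatial query: radius filter, then feature ranking.
# Coordinates, radii, and feature vectors are invented for this example.

def query_by_radius_and_similarity(query_vec, tiles, landmark, radius_km):
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    in_range = [t for t in tiles if dist(t["center"], landmark) <= radius_km]
    # rank the surviving tiles by Euclidean distance in feature space
    return sorted(in_range, key=lambda t: math.dist(t["features"], query_vec))

tiles = [
    {"id": "t1", "center": (0.0, 1.0), "features": [0.9, 0.1]},
    {"id": "t2", "center": (0.0, 2.0), "features": [0.2, 0.8]},
    {"id": "t3", "center": (9.0, 9.0), "features": [0.9, 0.1]},  # similar but too far away
]
ranked = query_by_radius_and_similarity([1.0, 0.0], tiles, landmark=(0.0, 0.0), radius_km=5.0)
print([t["id"] for t in ranked])  # ['t1', 't2']
```

GeoIRIS answers such queries in seconds by backing this logic with high-dimensional indexing rather than a linear scan.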
High-content screening (HCS) has become a powerful tool for drug discovery. However, the discovery of drugs targeting neurons is still hampered by the inability to accurately identify and quantify the phenotypic changes of multiple neurons in a single image (termed a multi-neuron image) of a high-content screen. Therefore, it is desirable to develop an automated image analysis method for analyzing multi-neuron images.
We propose an automated analysis method with novel descriptors of neuromorphology features for analyzing HCS-based multi-neuron images, called HCS-neurons. To observe multiple phenotypic changes of neurons, we propose two kinds of descriptors: a neuron feature descriptor (NFD) comprising 13 neuromorphology features, e.g., neurite length, and generic feature descriptors (GFDs), e.g., Haralick texture. HCS-neurons can 1) automatically extract all quantitative phenotype features in both NFD and GFDs, 2) identify statistically significant phenotypic changes upon drug treatments using ANOVA and regression analysis, and 3) generate an accurate classifier to group neurons treated with different drug concentrations using a support vector machine and an intelligent feature selection method. To evaluate HCS-neurons, we treated P19 neurons with nocodazole (a microtubule-depolymerizing drug that has been shown to impair neurite development) at six concentrations ranging from 0 to 1000 ng/mL. The experimental results show that all 13 features of the NFD differed significantly across the levels of nocodazole drug concentration (NDC), and the phenotypic changes of neurites were consistent with the known effect of nocodazole in promoting neurite retraction. Three identified features, total neurite length, average neurite length, and average neurite area, achieved an independent test accuracy of 90.28% for the six-dosage classification problem. The NFD module and neuron image datasets are provided as a freely downloadable MATLAB project at http://iclab.life.nctu.edu.tw/HCS-Neurons.
Few automated methods focus on analyzing multi-neuron images collected from HCS for use in drug discovery. We provide an automated HCS-based method for generating accurate classifiers that group neurons according to their phenotypic changes upon drug treatment. The proposed HCS-neurons method is helpful in identifying and classifying chemical or biological molecules that alter the morphology of a group of neurons in HCS.
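Three of the discriminative features named above (total neurite length, average neurite length, average neurite area) reduce to simple geometry once neurites have been traced. The polyline traces and areas below are invented; real values would come from image-based tracing.

```python
import math

# Toy computation of three neuromorphology features from invented neurite
# traces; real inputs would come from automated image tracing.

def neurite_features(neurites):
    """Each neurite is a dict with a polyline 'trace' and a pixel 'area'."""
    def length(trace):
        return sum(math.dist(p, q) for p, q in zip(trace, trace[1:]))
    lengths = [length(n["trace"]) for n in neurites]
    total_len = sum(lengths)
    return {
        "total_neurite_length": total_len,
        "average_neurite_length": total_len / len(neurites),
        "average_neurite_area": sum(n["area"] for n in neurites) / len(neurites),
    }

neurites = [
    {"trace": [(0, 0), (3, 4)], "area": 20},          # length 5
    {"trace": [(0, 0), (0, 5), (5, 5)], "area": 30},  # length 10
]
feats = neurite_features(neurites)
print(feats["total_neurite_length"], feats["average_neurite_length"])  # 15.0 7.5
```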
Algorithm evaluation provides a means to characterize variability across image analysis algorithms, validate algorithms by comparison with human annotations, combine results from multiple algorithms for performance improvement, and facilitate algorithm sensitivity studies. The sizes of images and image analysis results in pathology image analysis pose significant challenges in algorithm evaluation. We present an efficient parallel spatial database approach to model, normalize, manage, and query large volumes of analytical image result data. This provides an efficient platform for algorithm evaluation. Our experiments with a set of brain tumor images demonstrate the application, scalability, and effectiveness of the platform.
The paper describes an approach and platform for evaluation of pathology image analysis algorithms. The platform facilitates algorithm evaluation through a high-performance database built on the Pathology Analytic Imaging Standards (PAIS) data model.
(1) Develop a framework to support algorithm evaluation by modeling and managing analytical results and human annotations from pathology images; (2) Create a robust data normalization tool for converting, validating, and fixing spatial data from algorithm or human annotations; (3) Develop a set of queries to support data sampling and result comparisons; (4) Achieve high-performance computation via a parallel data management infrastructure, with parallel data loading and spatial indexing optimizations in this infrastructure.
Materials and Methods:
We have considered two scenarios for algorithm evaluation: (1) algorithm comparison, where multiple result sets from different methods are compared and consolidated; and (2) algorithm validation, where algorithm results are compared with human annotations. We have developed a spatial normalization toolkit to validate and normalize spatial boundaries produced by image analysis algorithms or human annotations. The validated data were formatted based on the PAIS data model and loaded into a spatial database. To support efficient data loading, we have implemented a parallel data loading tool that takes advantage of multi-core CPUs to accelerate data injection. The spatial database manages both geometric shapes and image features or classifications, and enables spatial sampling, result comparison, and result aggregation through expressive structured query language (SQL) queries with spatial extensions. To provide scalable and efficient query support, we have employed a shared-nothing parallel database architecture, which distributes data homogeneously across multiple database partitions to take advantage of parallel computation power and implements spatial indexing to achieve high I/O throughput.
Our work proposes a high performance, parallel spatial database platform for algorithm validation and comparison. This platform was evaluated by storing, managing, and comparing analysis results from a set of brain tumor whole slide images. The tools we develop are open source and available to download.
Pathology image algorithm validation and comparison are essential to iterative algorithm development and refinement. One critical component is the support for queries involving spatial predicates and comparisons. In our work, we develop an efficient data model and parallel database approach to model, normalize, manage and query large volumes of analytical image result data. Our experiments demonstrate that the data partitioning strategy and the grid-based indexing result in good data distribution across database nodes and reduce I/O overhead in spatial join queries through parallel retrieval of relevant data and quick subsetting of datasets. The set of tools in the framework provide a full pipeline to normalize, load, manage and query analytical results for algorithm evaluation.
Algorithm validation; parallel database; pathology imaging; spatial database
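One core comparison the platform supports, overlap between an algorithm's segmented region and a human annotation, can be illustrated with a Jaccard (intersection-over-union) measure. Axis-aligned boxes stand in here for the polygon boundaries the PAIS spatial database actually stores and joins.

```python
# Hedged sketch of the result-comparison step: Jaccard overlap between an
# algorithm's region and a human annotation. Boxes are (xmin, ymin, xmax, ymax);
# real PAIS data use polygon boundaries queried via spatial SQL extensions.

def box_area(b):
    return max(0, b[2] - b[0]) * max(0, b[3] - b[1])

def jaccard(a, b):
    inter = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    ia = box_area(inter) if inter[0] < inter[2] and inter[1] < inter[3] else 0
    union = box_area(a) + box_area(b) - ia
    return ia / union if union else 0.0

algorithm_region = (0, 0, 10, 10)
human_region = (5, 5, 15, 15)
print(jaccard(algorithm_region, human_region))  # 0.14285714285714285
```

In the parallel database, such pairwise overlaps are computed as spatial join queries, with grid-based indexing used to prune pairs that cannot intersect.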
Common object request broker architecture (CORBA) is a method for invoking distributed objects across a network. There has been some activity in applying this software technology to Digital Imaging and Communications in Medicine (DICOM), but no documented demonstration of how this would actually work. We report a CORBA demonstration that is functionally equivalent and in some ways superior to the DICOM communication protocol. In addition, in and outside of medicine, there is great interest in the use of extensible markup language (XML) to provide interoperation between databases. An example implementation of the DICOM data structure in XML will also be demonstrated. Using Visibroker ORB from Inprise (Scotts Valley, CA), a test bed was developed to simulate the principal DICOM operations: store, query, and retrieve (SQR). SQR is the most common interaction between a modality device application entity (AE) such as a computed tomography (CT) scanner, and a storage component, as well as between a storage component and a workstation. The storage of a CT study by invoking one of several storage objects residing on a network was simulated and demonstrated. In addition, XML database descriptors were used to facilitate the transfer of DICOM header information between independent databases. CORBA is demonstrated to have great potential for the next version of DICOM. It can provide redundant protection against single points of failure. XML appears to be an excellent method of providing interaction between separate databases managing the DICOM information object model, and may therefore eliminate the common use of proprietary client-server databases in commercial implementations of picture archiving and communication systems (PACS).
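One plausible XML encoding of DICOM header elements, of the kind the demonstration describes, is sketched below. The tag numbers are standard DICOM tags, but the element and attribute names here are assumptions, not the demonstrated schema.

```python
import xml.etree.ElementTree as ET

# Illustrative sketch only: encoding a few DICOM header elements as XML.
# The tag numbers are real DICOM tags; the XML element/attribute naming
# scheme is an assumption, not the format used in the demonstration.

def dicom_header_to_xml(elements):
    root = ET.Element("DicomDataset")
    for (group, elem), (keyword, value) in elements.items():
        attr = ET.SubElement(root, "Attribute",
                             tag="(%04X,%04X)" % (group, elem), keyword=keyword)
        attr.text = value
    return ET.tostring(root, encoding="unicode")

header = {
    (0x0008, 0x0060): ("Modality", "CT"),
    (0x0010, 0x0010): ("PatientName", "DOE^JANE"),
}
xml_text = dicom_header_to_xml(header)
print("(0008,0060)" in xml_text and "CT" in xml_text)  # True
```

A descriptor document like this lets independent databases exchange header fields by keyword and tag without sharing a proprietary schema.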
The interactions of legumes with symbiotic nitrogen-fixing bacteria cause the formation of specialized lateral root organs called root nodules. It has been postulated that this root nodule symbiosis system has recruited factors that act in early signaling pathways (common SYM genes) partly from the ancestral mycorrhizal symbiosis. However, the origins of factors needed for root nodule organogenesis are largely unknown. NODULE INCEPTION (NIN) is a nodulation-specific gene that encodes a putative transcription factor and acts downstream of the common SYM genes. Here, we identified two Nuclear Factor-Y (NF-Y) subunit genes, LjNF-YA1 and LjNF-YB1, as transcriptional targets of NIN in Lotus japonicus. These genes are expressed in root nodule primordia and their translational products interact in plant cells, indicating that they form an NF-Y complex in root nodule primordia. The knockdown of LjNF-YA1 inhibited root nodule organogenesis, as did the loss of function of NIN. Furthermore, we found that NIN overexpression induced root nodule primordium-like structures that originated from cortical cells in the absence of bacterial symbionts. Thus, NIN is a crucial factor responsible for initiating nodulation-specific symbiotic processes. In addition, ectopic expression of either NIN or the NF-Y subunit genes caused abnormal cell division during lateral root development. This indicated that the Lotus NF-Y subunits can function to stimulate cell division. Thus, transcriptional regulation by NIN, including the activation of the NF-Y subunit genes, induces cortical cell division, which is an initial step in root nodule organogenesis. Unlike the legume-specific NIN protein, NF-Y is a major CCAAT box binding protein complex that is widespread among eukaryotes. We propose that the evolution of root nodules in legume plants was associated with changes in the function of NIN. 
NIN has acquired functions that allow it to divert pathways involved in the regulation of cell division to root nodule organogenesis.
Legumes produce nodules in roots as the endosymbiotic organs for nitrogen-fixing bacteria, collectively called rhizobia. The symbiotic relationship enables legumes to survive on soil with poor nitrogen sources. The rhizobial infection triggers cell division in the cortex to generate root nodule primordia. The root nodule symbiosis is thought to have recruited factors for the early signaling pathway from the ancestral mycorrhizal symbiosis, which usually does not involve root nodule formation. However, how the root nodule symbiosis-specific pathway feeds nodulation signals into the molecular networks by which cortical cell division is initiated has not yet been elucidated. We found that NIN, a nodulation-specific factor, induced cortical cell division without rhizobial infection. NIN acted as a transcriptional activator and targeted two genes that encode different subunits of an NF-Y CCAAT box binding protein complex, LjNF-YA1 and LjNF-YB1. Inhibition of LjNF-YA1 function prevented root nodule formation. Ectopic expression of the NF-Y subunit genes enhanced cell division in lateral root primordia, which are not related to root nodule organogenesis. The NF-Y genes are thought to regulate cell division downstream of NIN. NF-Y is a general factor widespread in eukaryotes. We propose that NIN is a mediator between nodulation-specific signals and general regulatory mechanisms associated with cell proliferation.
The ACLAME database (http://aclame.ulb.ac.be) is a collection and classification of prokaryotic mobile genetic elements (MGEs) from various sources, comprising all known phage genomes, plasmids and transposons. In addition to providing information on the full genomes and genetic entities, it aims to build a comprehensive classification of the functional modules of MGEs at the protein, gene and higher levels. This first version contains a comprehensive classification of 5069 proteins from 119 DNA bacteriophages into over 400 functional families. This classification was produced automatically using TRIBE-MCL, a graph-theory-based Markov clustering algorithm that uses sequence similarity measures as input, and then manually curated. Manual curation was aided by consulting annotations available in public databases retrieved through additional sequence similarity searches using Psi-Blast and Hidden Markov Models. The database is publicly accessible and open to expert volunteers willing to participate in its curation. Its web interface allows browsing as well as querying the classification. The main objectives are to collect and organize in a rational way the complexity inherent to MGEs, to extend and improve the inadequate annotation currently associated with MGEs and to screen known genomes for the validation and discovery of new MGEs.
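The TRIBE-MCL step can be illustrated with a minimal Markov clustering (MCL) loop: alternate expansion (matrix squaring) with inflation (entry-wise powering plus column renormalization) until the flow stabilizes, then read clusters off the attractor rows. The tiny graph below is invented; TRIBE-MCL clusters proteins from sequence similarity scores.

```python
# Minimal MCL sketch on an invented similarity graph. TRIBE-MCL applies the
# same idea to a graph of protein sequence similarity scores.

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def normalize_columns(m):
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        for i in range(n):
            m[i][j] /= s
    return m

def mcl(adj, inflation=2.0, iterations=25):
    m = normalize_columns([row[:] for row in adj])
    for _ in range(iterations):
        m = matmul(m, m)                                  # expansion
        m = [[v ** inflation for v in row] for row in m]  # inflation
        m = normalize_columns(m)
    # assign each node (column) to its dominant attractor row
    groups = {}
    for j in range(len(m)):
        attractor = max(range(len(m)), key=lambda i: m[i][j])
        groups.setdefault(attractor, []).append(j)
    return sorted(groups.values())

# two triangles of nodes joined by a single weak edge (self-loops included)
adj = [[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
       [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
       [1.0, 1.0, 1.0, 0.1, 0.0, 0.0],
       [0.0, 0.0, 0.1, 1.0, 1.0, 1.0],
       [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
       [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]]
clusters = mcl(adj)
print(clusters)  # [[0, 1, 2], [3, 4, 5]]
```

Inflation strengthens intra-cluster flow and starves the weak bridge, which is why the two triangles separate into distinct families.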
The last decade witnessed a growing interest in research on content-based image retrieval (CBIR) and related areas. Several systems for managing and retrieving images have been proposed, each one tailored to a specific application. Functionalities commonly available in CBIR systems include: storage and management of complex data, development of feature extractors to support similarity queries, development of index structures to speed up image retrieval, and design and implementation of an intuitive graphical user interface tailored to each application. To facilitate the development of new CBIR systems, we propose an image-handling extension to the relational database management system (RDBMS) PostgreSQL. This extension, called PostgreSQL-IE, is independent of the application and provides the advantage of being open source and portable. The proposed system extends the functionalities of the structured query language SQL with new functions that are able to create new feature extraction procedures, new feature vectors as combinations of previously defined features, and new access methods, as well as to compose similarity queries. PostgreSQL-IE makes available a new image data type, which permits the association of various images with a given unique image attribute. This resource makes it possible to combine visual features of different images in the same feature vector. To validate the concepts and resources available in the proposed extended RDBMS, we propose a CBIR system applied to the analysis of mammograms using PostgreSQL-IE.
Database management systems; digital mammography; digital image management; image database; image retrieval; information storage and retrieval; information system
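The similarity queries that PostgreSQL-IE adds to SQL amount to k-nearest-neighbor search over stored feature vectors, sketched here in plain Python with invented table contents and features.

```python
import math

# Plain-Python stand-in for a PostgreSQL-IE similarity query: return the k
# images whose feature vectors lie closest to the query image's vector.
# Image ids and feature values below are invented.

def knn_query(feature_table, query_vec, k):
    """feature_table: list of (image_id, feature_vector) rows."""
    ranked = sorted(feature_table, key=lambda row: math.dist(row[1], query_vec))
    return [image_id for image_id, _ in ranked[:k]]

table = [
    ("mammo_001", [0.10, 0.90]),
    ("mammo_002", [0.85, 0.20]),
    ("mammo_003", [0.88, 0.12]),
]
print(knn_query(table, [0.9, 0.1], k=2))  # ['mammo_003', 'mammo_002']
```

Inside the RDBMS, the same ranking is expressed through the extension's SQL functions, with access methods (index structures) replacing the linear scan used here.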
A statistical approach is a valuable way to describe texture primitives. The aim of this study is to design and implement a classifier framework to automatically identify thyroid nodules in ultrasound images. Using rigorous mathematical foundations, this article focuses on developing a discriminative texture analysis method based on texture variations corresponding to four biological areas (normal thyroid, thyroid nodule, subcutaneous tissues, and trachea). Our research follows three steps: automatic extraction of the most discriminative first-order statistical texture features, building a classifier that automatically optimizes and selects the valuable features, and correlating significant texture parameters with the four biological areas of interest based on pixel classification and location characteristics. Twenty ultrasound images of normal thyroid and 20 presenting thyroid nodules were used. The analysis involves both whole thyroid ultrasound images and regions of interest (ROIs). The proposed system and the classification results are validated using receiver operating characteristic (ROC) analysis, which gives a better overall view of the classification performance of the methods. The proposed approach is capable of identifying thyroid nodules with a correct classification rate of 83% when the whole image is analyzed and 91% when the ROIs are analyzed.
Thyroid ultrasound images; First-order statistical features; T test; CADi software; Pixel classification
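A few first-order statistical texture features of the kind used above (mean, variance, skewness, entropy) can be computed directly from an ROI's gray-level values. The pixel values below are invented, and the study's exact feature set is not reproduced.

```python
import math

# Hedged sketch of first-order statistical texture features computed from a
# flat list of pixel intensities. The toy ROI below is invented; it does not
# reproduce the study's feature set or data.

def first_order_features(pixels):
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(var)
    skew = sum((p - mean) ** 3 for p in pixels) / (n * std ** 3) if std else 0.0
    # entropy from the normalized gray-level histogram
    counts = {}
    for p in pixels:
        counts[p] = counts.get(p, 0) + 1
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "variance": var, "skewness": skew, "entropy": entropy}

roi = [10, 10, 12, 12, 12, 12, 40, 40]  # toy region of interest
f = first_order_features(roi)
print(round(f["mean"], 1), round(f["entropy"], 2))  # 18.5 1.5
```

Such per-region feature vectors are what the classifier then optimizes over when discriminating the four biological areas.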