Neuroblastoma (NB) is a cancer of nerve cell origin that commonly affects infants and children. According to American Cancer Society statistics, it is by far the most common cancer in infants and the third most common type of cancer in children [1]. Approximately 650 patients are diagnosed with this disease in the United States every year. As in most cancer types, histopathological examination is required to characterize the histology of the tumor for further treatment planning. The World Health Organization recommends the use of the International Neuroblastoma Pathology Classification (the Shimada system) for categorizing patients into different prognostic groups [2]. It is an age-linked classification system based on morphological characteristics of the tissue. The diagram below summarizes the relevant part of this classification system as a tree, in which the grade of neuroblastic differentiation, the degree of Schwannian stromal development, and the mitosis-karyorrhexis index (MKI) are the most salient features contributing to the final classification of the tissue as favorable or unfavorable histology.
A simplified diagram of the International Neuroblastoma Pathology Classification (the Shimada system), where UH and FH correspond to unfavorable and favorable histology, respectively.
The histopathological examination guides oncologists in making decisions on the timing and type of therapy; hence, the accuracy of the classification is important for preventing under- or overtreatment. Unfortunately, the qualitative visual examination performed by pathologists under the microscope is tedious and prone to error for several reasons. First, it is not practical to examine every region of the tissue slide under the microscope at high magnification (e.g., 40×). For NB diagnosis, pathologists typically pick some representative regions at lower magnifications (e.g., 2×, 4×) and examine only those regions. The final decision about the entire slide is based on these sampled regions. Although this approach often yields accurate decisions, it may be misleading for heterogeneous tumors. Second, the resulting diagnosis varies considerably between readers. Experience and fatigue may cause significant inter- and intra-reader variation among pathologists. A recent study by Teot et al. shows that for NB diagnosis, this variation can be up to 20% between central and institutional reviewers [4]. No study has specifically examined such variation in classifying the degree of Schwannian stromal development in NB diagnosis. However, as the classification diagram above shows, this analysis is the first, and hence a critical, step in NB prognosis.
To address these drawbacks, we are developing a computer-aided diagnosis (CAD) system for NB. The use of computers to assist physicians in evaluating medical images is not new. Several commercially available CAD systems have been shown to improve the clinical diagnosis of radiology images for modalities such as mammography and computed tomography (CT) [5]. However, research on the development of similar systems for whole-slide histopathological image analysis is relatively new. This is mostly due to the challenges in both acquiring and processing histopathological images, which are much larger than radiology images. With the recent developments in whole-slide digital scanners, research on quantitative histopathological image analysis has accelerated. By providing computational tools to extract measurable features, histopathological image analysis systems help extract more objective and more accurate diagnostic clues that might not be easily observed in the qualitative analysis performed by pathologists.
Research efforts on histopathological image analysis can be categorized into two main groups based on their objectives: 1) content-based image retrieval (CBIR) systems and 2) computer-aided diagnosis systems. CBIR systems aim to retrieve, from a database of previously diagnosed representative cases, the clinical cases similar to a query image, so that pathologists can make use of established knowledge in their decision-making process [7]. CAD systems, on the other hand, focus directly on making diagnostic classifications of the tissue (e.g., malignant or benign; grade of differentiation) [9]. However, because tissue structures and prognostic procedures vary across disease types, it is impractical to develop a universal system that would work for all disease types, even if they show similar characteristics. Recently, Petushi et al. proposed an image analysis system to determine the grade of differentiation for breast cancer [10]. Their study provides automatic segmentation and classification of distinct cell nuclei, which are used to identify the grade of differentiation as an indicator of the degree of malignancy. Similar studies have been conducted to automate the Gleason grading system for prostate cancer. Khouzani et al. proposed a multiwavelet-based image analysis approach to characterize the texture of samples associated with different grades [11]. In addition to textural features, Doyle et al. introduced architectural features based on the spatial organization of cells, as well as their morphologies [12]. Tabesh et al. incorporated color, texture, and morphometric image features at the global and histological object levels for prostate tissue grading [13].
Similarly, several research studies have been conducted on quantitative analysis of NB. Gurcan et al. proposed a method for segmenting cells in H&E-stained pathology slides using morphological reconstruction followed by hysteresis thresholding [14]. Most recently, Kong et al. proposed a classification approach using texture and color information to determine the grade of differentiation for NB diagnosis [15]. Both studies showed promising results toward an automated framework that could assist clinical practice. However, to the best of our knowledge, no research has been conducted to analyze Schwannian stromal development for NB diagnosis.
In this study, our goal is to develop a CAD system that determines the degree of Schwannian stromal development as either stroma-rich or stroma-poor from digitized whole-slide NB images. The figure below shows sample NB tissue images cropped at 40× magnification. Panels (a) and (b) correspond to stroma-rich tissue samples, which are characterized by an extensive growth of Schwannian and other supporting elements. In contrast, panels (c) and (d) show stroma-poor tissue samples, which are characterized by a diffuse growth of neuroblastic cells with various degrees of differentiation, randomly distributed by thin septa of fibrovascular tissue and neuropil meshwork.
Example images of (a,b) stroma-rich and (c,d) stroma-poor tissue.
Using sophisticated computer vision and pattern recognition techniques, we introduced a multi-resolution image analysis approach to identify image regions associated with different histopathological components (i.e., stroma-rich and stroma-poor). The proposed approach mimics the way pathologists examine tissue slides under the microscope: the image analysis starts from the lowest resolution, which corresponds to the lower magnification levels of a microscope, and uses higher-resolution representations for regions where the classification decision requires more detailed information. We proposed a texture-based approach to differentiate Schwannian stroma-rich tissue from other cytological structures. We employed rotation-invariant co-occurrence statistics and local binary patterns to characterize stroma septa with different organizations. Using representative samples, we constructed a training dataset and extracted textural features. We further employed an automated feature selection step in which the most discriminating subset of features is determined at each resolution level to improve classification performance. Finally, the classification was performed using a statistical classifier.
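The coarse-to-fine control flow described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the per-level classifiers are placeholders, and the confidence threshold `min_confidence` used to decide when a finer resolution is needed is an assumption, since the text does not specify the criterion.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical interface: a per-level classifier maps a tile image to
# (label, confidence), with confidence in [0, 1].
LevelClassifier = Callable[[object], Tuple[str, float]]


def classify_coarse_to_fine(
    pyramid: Sequence[object],
    classifiers: Sequence[LevelClassifier],
    min_confidence: float = 0.9,  # assumed stopping criterion
) -> Tuple[str, int]:
    """Classify one tile, moving to finer resolutions only when needed.

    `pyramid` holds the same tile ordered from lowest to highest resolution.
    Returns the label and the index of the level at which it was decided.
    """
    if not pyramid or len(pyramid) != len(classifiers):
        raise ValueError("need one classifier per non-empty pyramid level")

    label, level = "", 0
    for level, (image, classify) in enumerate(zip(pyramid, classifiers)):
        label, confidence = classify(image)
        if confidence >= min_confidence:
            break  # confident enough; skip the finer, costlier levels
    # If no level was confident, the finest level's label is returned.
    return label, level


# Usage with stub classifiers standing in for trained models.
coarse = lambda image: ("stroma-poor", 0.6)   # uncertain at low resolution
fine = lambda image: ("stroma-rich", 0.95)    # confident at high resolution
result = classify_coarse_to_fine(["tile@low", "tile@high"], [coarse, fine])
print(result)  # ('stroma-rich', 1)
```

In this sketch, most tiles would be resolved at the coarsest level, and only ambiguous tiles incur the cost of higher-resolution analysis.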
1.1 Image Dataset
In our study, we used 45 whole-slide tissue samples collected from Nationwide Children’s Hospital. Tissue slides were obtained retrospectively according to an Institutional Review Board (IRB) protocol. Each tissue sample was embedded in paraffin and cut at a thickness of 5 μm according to commonly used Children’s Oncology Group protocols. After staining with hematoxylin and eosin (H&E), each tissue slice was fixed on a slide and digitized using a ScanScope T2 digitizer (Aperio, San Diego, CA) at 40× magnification. The digitized whole-slide images typically have dimensions of up to 100k × 120k pixels, with a file size of up to 40 GB. Therefore, at the time of processing, the whole-slide images are decomposed into smaller non-overlapping image tiles. Tiling not only makes it practical to process the whole-slide images but also allows each image tile to be processed independently and therefore in parallel. We determined the image tile size experimentally as 896 × 896 pixels. The average size of the tissue slides used in our study is approximately 71,623 × 100,348 pixels, with an average file size of 20 ± 8 GB; hence, the average number of image tiles in a whole-slide image is approximately 8,900, which still requires significant computation time.
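As a rough check of the tile count quoted above, the following sketch lays a grid of 896 × 896 tiles over an image with the average dimensions reported in the text. It is illustrative only: the function names are hypothetical, the width-by-height order of the reported dimensions is assumed, and partial tiles at the right and bottom edges are assumed to be kept and clipped, which the text does not specify.

```python
from typing import Iterator, Tuple

TILE = 896  # tile edge length in pixels, as reported in the text


def tile_grid(width: int, height: int, tile: int = TILE) -> Tuple[int, int]:
    """Return (columns, rows) of non-overlapping tiles covering the image."""
    cols = -(-width // tile)   # ceiling division without floats
    rows = -(-height // tile)
    return cols, rows


def iter_tiles(
    width: int, height: int, tile: int = TILE
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (x, y, w, h) for each tile in row-major order.

    Edge tiles are clipped to the image bounds (an assumption).
    """
    cols, rows = tile_grid(width, height, tile)
    for r in range(rows):
        for c in range(cols):
            x, y = c * tile, r * tile
            yield x, y, min(tile, width - x), min(tile, height - y)


cols, rows = tile_grid(71_623, 100_348)
print(cols, rows, cols * rows)  # 80 112 8960
```

The result, 8,960 tiles, is consistent with the approximate figure of 8,900 given in the text.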
One representative whole-slide sample for each subtype (i.e., stroma-rich and stroma-poor) was used to generate the training image tiles, and the remaining 43 were used for independent whole-slide testing. As determined by an expert pathologist, 32 of the 43 whole-slide samples are stroma-poor and the remaining 11 are stroma-rich. Five of the stroma-rich cases correspond to ganglioneuroblastoma and six to ganglioneuroma. The 32 stroma-poor cases correspond to neuroblastoma. Their morphological characteristics in terms of mitosis-karyorrhexis index (MKI) and grade of differentiation, according to the International Neuroblastoma Pathology Classification (the Shimada system), are summarized in the table below.
Distribution of neuroblastoma cases based on MKI and the grade of differentiation. UD, PD, and D correspond to undifferentiated, poorly differentiated, and differentiating subtypes, respectively.
1.2 Computational Infrastructure
The image analysis routines were implemented in MATLAB (The MathWorks, Natick, MA), and the experiments were carried out on a 64-node PC cluster owned by the Department of Biomedical Informatics, The Ohio State University. Each compute node in the cluster is equipped with dual 2.4 GHz Opteron 250 processors and 8 GB of RAM. This parallel system uses software developed in-house, which distributes the image tiles to the nodes, applies the required MATLAB routines to each image tile locally, collects the classification outputs, and stitches them together to create the final classification map. The figure below shows the computational infrastructure used to analyze the whole-slide images in parallel.
Computational infrastructure used to process the whole-slide images in parallel.
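The distribute, classify, collect, and stitch steps described above can be sketched in a few lines. This is a simplified single-machine illustration, not the in-house cluster software: a thread pool stands in for the compute nodes, `classify_tile` is a hypothetical placeholder for the MATLAB routines, and the tile representation (grid position plus a single feature value) is assumed for brevity.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple

# Hypothetical tile record: (row, col, feature), where `feature` stands in
# for the pixel data of one image tile.
Tile = Tuple[int, int, float]


def classify_tile(tile: Tile) -> Tuple[int, int, str]:
    """Placeholder classifier; the threshold is arbitrary and illustrative."""
    row, col, feature = tile
    label = "stroma-rich" if feature >= 0.5 else "stroma-poor"
    return row, col, label


def classify_slide(
    tiles: Iterable[Tile],
    rows: int,
    cols: int,
    classify: Callable[[Tile], Tuple[int, int, str]] = classify_tile,
    workers: int = 4,
) -> List[List[str]]:
    """Classify tiles concurrently and stitch labels into a rows x cols map."""
    label_map = [["" for _ in range(cols)] for _ in range(rows)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each tile is independent, so results can be placed by position.
        for row, col, label in pool.map(classify, tiles):
            label_map[row][col] = label
    return label_map


tiles = [(0, 0, 0.9), (0, 1, 0.2), (1, 0, 0.4), (1, 1, 0.7)]
for line in classify_slide(tiles, rows=2, cols=2):
    print(line)
```

Because each tile carries its own grid position, the order in which workers finish does not affect the stitched map. For CPU-bound analysis, a process pool or a cluster scheduler would likely replace the thread pool.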
An additional extension of this software is a grid-based infrastructure in which image analysis modules (i.e., MATLAB or C/C++ files) can be uploaded to the system by multiple developers and pipelined to run on the whole-slide images stored in a common repository through a grid interface [17].