Tumor histology provides detailed insight into cellular morphology, organization, and heterogeneity. For example, tumor histological sections can be used to identify mitotic cells, cellular aneuploidy, and autoimmune responses. More importantly, if tumor morphology and architecture can be quantified on large histological datasets, this will pave the way for constructing prognostic histological databases, in the same way that genome analysis techniques have identified molecular subtypes and predictive markers. Genome-wide analysis techniques (e.g., microarray analysis) have the advantage of standardized tools for data analysis and pathway enrichment, which enable hypothesis generation about the underlying mechanisms. Histological signatures, by contrast, are hard to compute because of biological and technical variations in the stained sections; however, they offer insight into tissue composition, heterogeneity (e.g., mixed cell populations), and rare events.
Histological sections are often stained with hematoxylin and eosin (H&E), which label DNA and protein content, respectively. Traditional histological analysis is performed by a trained pathologist, who characterizes phenotypic content such as cell types, cellular organization, cell state and health, and cellular secretion. However, such manual analysis is subject to inter- and intraobserver variation [1]. In contrast, the value of quantitative histological image analysis lies in its ability to capture detailed morphometric features on a cell-by-cell basis, together with the organization of cells. Such a rich description can be linked with genomic information and survival distributions to provide an improved basis for diagnosis and therapy. Additionally, given large datasets, quantitative histological signatures can be used to identify intrinsic subtypes of a specific tumor type, complementing histological tumor grading.
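A common way to separate the hematoxylin (nuclear) and eosin signals computationally is color deconvolution: pixel intensities are converted to optical density (OD), which by the Beer-Lambert law is approximately a linear combination of per-stain absorbance vectors, and stain concentrations are recovered by inverting that mixing matrix. The minimal NumPy sketch below assumes the reference H&E vectors published by Ruifrok and Johnston; the function names, the residual third axis, and the 255 white level are illustrative assumptions, and the color decomposition used in this work may differ.

```python
import numpy as np

# Assumed reference optical-density (OD) vectors for hematoxylin and eosin,
# as published by Ruifrok and Johnston; not calibrated for any particular dataset.
HEMATOXYLIN = np.array([0.65, 0.70, 0.29])
EOSIN = np.array([0.07, 0.99, 0.11])


def stain_matrix() -> np.ndarray:
    """Rows are unit OD vectors: hematoxylin, eosin, and an orthogonal residual axis."""
    h = HEMATOXYLIN / np.linalg.norm(HEMATOXYLIN)
    e = EOSIN / np.linalg.norm(EOSIN)
    r = np.cross(h, e)  # third axis captures whatever H and E cannot explain
    r /= np.linalg.norm(r)
    return np.stack([h, e, r])


def color_deconvolve(rgb: np.ndarray, white: float = 255.0) -> np.ndarray:
    """Map an (H, W, 3) RGB image to (H, W, 3) stain concentrations [H, E, residual]."""
    rgb = np.asarray(rgb, dtype=float)
    od = -np.log(np.clip(rgb, 1.0, white) / white)  # OD is linear in stain amount
    conc = od.reshape(-1, 3) @ np.linalg.inv(stain_matrix())  # solve OD = C @ M for C
    return conc.reshape(rgb.shape)


# Usage: synthesize one pixel from known concentrations, then recover them.
true_conc = np.array([1.0, 0.2, 0.0])
pixel = 255.0 * np.exp(-(true_conc @ stain_matrix()))
print(color_deconvolve(pixel.reshape(1, 1, 3))[0, 0])  # approximately [1.0, 0.2, 0.0]
```

The hematoxylin channel of the output is the nuclear staining signal; in practice the stain vectors are often re-estimated per dataset because fixed vectors do not absorb the staining variation discussed next.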
One of the main technical barriers to processing a large collection of histological data is that color composition is subject to technical (e.g., fixation, staining) and biological (e.g., cell type, cell state) variations across tissue sections, especially when the sections are processed and scanned at different laboratories. Here, a histological tissue section refers to an image of a thin slice of tissue mounted on a microscope slide and scanned with a light microscope. From an image analysis perspective, color variations can occur both within and across tissue sections. For example, within a tissue section, some nuclei may have low chromatin content (e.g., light blue signals) while others have higher chromatin content (e.g., dark blue signals); across tissue sections, nuclear intensity in one section may be very close to the background intensity (e.g., cytoplasmic and macromolecular components) in another.
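To make the cross-section case concrete, the toy example below uses synthetic hematoxylin values (not data from this study) in which the faint nuclei of a hypothetical section B are distributed like the background of a hypothetical section A. The section names, distributions, threshold grid, and error measure (false-positive rate on background plus miss rate on nuclei) are illustrative assumptions. Under these assumptions, any single global threshold misclassifies roughly one whole class, whereas thresholds chosen per section separate both sections well.

```python
import numpy as np


def section_error(t: float, background: np.ndarray, nuclei: np.ndarray) -> float:
    """False-positive rate on background plus miss rate on nuclei at threshold t."""
    return float(np.mean(background > t) + np.mean(nuclei <= t))


rng = np.random.default_rng(0)
# Synthetic hematoxylin values (illustrative only): (background, nuclei) per section.
sections = {
    "A": (rng.normal(0.45, 0.05, 5000), rng.normal(1.00, 0.10, 1000)),  # dark background
    "B": (rng.normal(0.10, 0.03, 5000), rng.normal(0.45, 0.05, 1000)),  # faint nuclei
}
grid = np.linspace(0.0, 1.5, 301)

# Best single threshold shared by both sections.
global_err = min(
    sum(section_error(t, bg, nuc) for bg, nuc in sections.values()) for t in grid
)
# Best threshold chosen independently for each section.
local_err = sum(
    min(section_error(t, bg, nuc) for t in grid) for bg, nuc in sections.values()
)
print(f"global threshold error: {global_err:.3f}, per-section error: {local_err:.3f}")
```

Per-section thresholds still fail when light and dark nuclei coexist within one section, which is the within-section case described above.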
Our approach evolved from insights and experiments indicating that simple color decomposition and thresholding techniques miss or overestimate some of the nuclei in an image; for example, nuclei with low chromatin content are excluded. The problem is further complicated by the diversity in nuclear size and shape (the classic scale problem). It became clear that incorporating prior knowledge (e.g., manual annotation and validation by a pathologist) would be needed not only for validation, but also for constructing a model that captures the wide variation in nuclear staining, both within and across tissue sections. Accordingly, our proposed approach integrates, within a level set framework, prior knowledge characterized by Gaussian mixture models (GMMs) and nuclear staining information extracted from the original image by color decomposition. The net result is a binarized image of blobs (each a single nucleus or a clump of nuclei), which are either validated or partitioned further through geometric reasoning.
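The prior-knowledge component can be illustrated with class-conditional mixture models. The sketch below is a simplification under stated assumptions: it fits one-dimensional GMMs by expectation-maximization to hypothetical annotated hematoxylin values (two components for nuclei, so that light and dark nuclei are both represented, and one for background) and converts them into a per-pixel nuclear posterior with Bayes' rule. The feature, component counts, equal class prior, synthetic training data, and function names are all assumptions; the level set evolution and the geometric reasoning used to partition clumps are not shown.

```python
import numpy as np


def fit_gmm_1d(x, k, iters=200):
    """Fit a k-component 1-D Gaussian mixture to samples x by expectation-maximization."""
    x = np.asarray(x, dtype=float).ravel()
    means = np.quantile(x, (np.arange(k) + 0.5) / k)  # spread initial means over the data
    var = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each sample (log domain for stability).
        logp = (np.log(weights) - 0.5 * np.log(2 * np.pi * var)
                - 0.5 * (x[:, None] - means) ** 2 / var)
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    return weights, means, var


def gmm_pdf(x, params):
    """Evaluate the mixture density at each value of x."""
    weights, means, var = params
    x = np.asarray(x, dtype=float)[..., None]
    comps = weights * np.exp(-0.5 * (x - means) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return comps.sum(axis=-1)


def nuclear_posterior(x, nuc_params, bg_params, prior_nuc=0.5):
    """P(nucleus | x) from class-conditional GMMs via Bayes' rule."""
    p_nuc = prior_nuc * gmm_pdf(x, nuc_params)
    p_bg = (1.0 - prior_nuc) * gmm_pdf(x, bg_params)
    return p_nuc / np.maximum(p_nuc + p_bg, 1e-300)


# Hypothetical annotated training values (synthetic): light and dark nuclei, plus background.
rng = np.random.default_rng(1)
nuc_train = np.concatenate([rng.normal(0.40, 0.05, 500), rng.normal(1.00, 0.10, 500)])
bg_train = rng.normal(0.10, 0.03, 2000)
nuc_params = fit_gmm_1d(nuc_train, k=2)
bg_params = fit_gmm_1d(bg_train, k=1)

post = nuclear_posterior(np.array([0.10, 0.40, 1.00]), nuc_params, bg_params)
print(np.round(post, 3))
```

Because the nuclear model has a separate component for lightly stained nuclei, those pixels receive a high posterior even though they would fall below a threshold tuned to dark nuclei. A posterior map of this kind could serve as a data term for a level set; the coupling used in this work is detailed in Section III.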
The rest of this paper is organized as follows. Section II reviews previous research in this area, focusing on 1) how quantitative representations of H&E sections can be leveraged for translational medicine, and 2) how nuclear segmentation is performed to address clinical issues; Section III describes our approach in detail; Section IV presents experimental and validation results; and Section V concludes the paper.