Tissue segmentation of brain magnetic resonance images (MRIs) is one of the most important parts of clinical diagnostic tools. Pixel classification methods, in both supervised and unsupervised forms, have frequently been used for image segmentation. Supervised segmentation methods achieve high accuracy, but they need a large amount of labeled data, which is hard, expensive, and slow to obtain. Moreover, they cannot use unlabeled data to train classifiers. Unsupervised segmentation methods, on the other hand, use no prior knowledge and deliver a low level of performance. Semi-supervised learning, which uses a small amount of labeled data together with a large amount of unlabeled data, yields higher accuracy with less effort. In this paper, we propose an ensemble semi-supervised framework for segmenting brain MRI tissues that uses the results of several semi-supervised classifiers simultaneously. Selecting appropriate classifiers plays a significant role in the performance of this framework. Hence, we present two semi-supervised algorithms, expectation filtering maximization and MCo_Training, which are improved versions of the semi-supervised methods expectation maximization and Co_Training and which increase segmentation accuracy. We then use these improved classifiers together with a graph-based semi-supervised classifier as components of the ensemble framework. Experimental results show that the segmentation performance of this approach is higher than that of both supervised methods and the individual semi-supervised classifiers.
Brain magnetic resonance image tissue segmentation; ensemble semi-supervised framework; expectation filtering maximization classifier; MCo_Training classifier
Binarization is often recognized as one of the most important steps in high-level image analysis systems, particularly for object recognition. Its precision largely determines the performance of the entire system. According to many researchers, segmentation finishes when the observer’s goal is satisfied. Experience has shown that the most effective methods continue to be the iterative ones. However, a problem with these algorithms is the stopping criterion. In this work, entropy is used as the stopping criterion when segmenting an image by recursively applying mean shift filtering. In this way, a new algorithm is introduced for the binarization of medical images, in which binarization is carried out after the segmented image has been obtained. The good performance of the proposed method, that is, the good quality of the binarization, is illustrated with several experimental results. The results obtained with the new algorithm are compared with those of an algorithm previously developed by the author and collaborators, and with Otsu’s method.
image segmentation; mean shift; algorithm; entropy; Otsu’s method
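The idea of using entropy as a stopping criterion for a recursively applied filter can be illustrated with a minimal sketch. It is not the authors' algorithm: `range_mean_shift_step` is a simplified, intensity-only stand-in for mean shift filtering, and the names `bandwidth`, `eps`, and `max_iter` are assumptions introduced here for illustration.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy (in bits) of the gray-level histogram of a flat pixel list."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def range_mean_shift_step(pixels, bandwidth):
    """One simplified pass in the intensity domain only (assumed stand-in):
    each pixel moves to the rounded mean of all pixels within `bandwidth` of it."""
    out = []
    for p in pixels:
        neighbours = [q for q in pixels if abs(q - p) <= bandwidth]
        out.append(round(sum(neighbours) / len(neighbours)))
    return out


def filter_until_entropy_stable(pixels, bandwidth=20, eps=1e-3, max_iter=50):
    """Reapply the filter until the entropy change between passes drops below eps."""
    previous = shannon_entropy(pixels)
    iteration = 0
    for iteration in range(1, max_iter + 1):
        pixels = range_mean_shift_step(pixels, bandwidth)
        current = shannon_entropy(pixels)
        if abs(previous - current) < eps:
            break  # entropy has stabilised: stop iterating
        previous = current
    return pixels, iteration


if __name__ == "__main__":
    image = [10, 12, 11, 13, 200, 202, 199, 201]
    segmented, passes = filter_until_entropy_stable(image)
    print(segmented, passes)
```

Once the entropy stops changing, the filtered image contains only a few distinct gray levels, so a subsequent binarization step has a much simpler histogram to work with.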
Breast lesion segmentation in magnetic resonance (MR) images is one of the most important parts of clinical diagnostic tools. Pixel classification methods, in both supervised and unsupervised forms, have frequently been used for image segmentation. Supervised segmentation methods achieve high accuracy, but they need a large amount of labeled data, which is hard, expensive, and slow to obtain. Unsupervised segmentation methods, on the other hand, need no prior knowledge but deliver low performance. Semi-supervised learning, which uses not only a small amount of labeled data but also a large amount of unlabeled data, promises higher accuracy with less effort. In this paper, we propose a new interactive semi-supervised approach to the segmentation of suspicious lesions in breast MRI. Using a suitable classifier plays an important role in the performance of this approach; we therefore present a semi-supervised algorithm, improved self-training (IMPST), which is an improved version of the self-training method and increases segmentation accuracy. Experimental results show that the segmentation performance of this approach is higher than that of supervised and unsupervised methods such as K nearest neighbors, Bayesian, Support Vector Machine, and Fuzzy c-Means.
Breast lesions segmentation; magnetic resonance imaging; self-training; semi-supervised learning
The authors propose a CT image segmentation method based on structural analysis that is useful for objects with structural dynamic characteristics. The motivation for our research comes from the area of genetic activity. In order to reveal the roles of genes, it is necessary to create mutant mice and measure differences among them by scanning their skeletons with an X-ray CT scanner. The CT image needs to be manually segmented into individual bones. Manually segmenting many mutant mouse models is very time-consuming, so it is desirable to automate this segmentation procedure. Although numerous papers have proposed segmentation techniques, no general segmentation method for the skeletons of living creatures has been established. Against this background, the authors propose a segmentation method based on the concept of destruction analogy. To realize this concept, structural analysis is performed using the finite element method (FEM), as structurally weak areas can be expected to break under stress. The contribution of the method is its novelty, as no studies have so far used structural analysis for image segmentation. The implementation of the method involves three steps. First, finite elements are created directly from the pixels of a CT image, and candidates are selected in areas where segmentation is thought to be appropriate. Second, destruction analogy is used to find a single candidate with high strain, which is chosen as the segmentation target; the boundary conditions for FEM are also set automatically. Third, destruction analogy is implemented by replacing pixels with high strain with background pixels, and this process is iterated until the object is decomposed into two parts. CT image segmentation is demonstrated here using various types of CT imagery.
Quantitative microscopy has been extensively used in biomedical research and has provided significant insights into structure and dynamics at the cell and tissue level. The entire procedure of quantitative microscopy is comprised of specimen preparation, light absorption/reflection/emission from the specimen, microscope optical processing, optical/electrical conversion by a camera or detector, and computational processing of digitized images. Although many of the latest digital signal processing techniques have been successfully applied to compress, restore, and register digital microscope images, automated approaches for recognition and understanding of complex subcellular patterns in light microscope images have been far less widely used. In this review, we describe a systematic approach for interpreting protein subcellular distributions using various sets of Subcellular Location Features (SLF) in combination with supervised classification and unsupervised clustering methods. These methods can handle complex patterns in digital microscope images and the features can be applied for other purposes such as objectively choosing a representative image from a collection and performing statistical comparison of image sets.
fluorescence microscopy; subcellular location features; pattern recognition; protein distribution comparison; location proteomics; protein localization
The accurate measurement of cell and nuclei contours is critical for the sensitive and specific detection of changes in normal cells in several medical informatics disciplines. Within microscopy, this task is facilitated using fluorescence cell stains, and segmentation is often the first step in such approaches. Due to the complex nature of cell issues and problems inherent to microscopy, unsupervised mining approaches of clustering can be incorporated in the segmentation of cells. In this study, we have developed and evaluated the performance of multiple unsupervised data mining techniques in cell image segmentation. We adapt four distinctive, yet complementary, methods for unsupervised learning, including those based on k-means clustering, EM, Otsu’s threshold, and GMAC. Validation measures are defined, and the performance of the techniques is evaluated both quantitatively and qualitatively using synthetic and recently published real data. Experimental results demonstrate that k-means, Otsu’s threshold, and GMAC perform similarly, and have more precise segmentation results than EM. We report that EM has higher recall values and lower precision results from under-segmentation due to its Gaussian model assumption. We also demonstrate that these methods need spatial information to segment complex real cell images with a high degree of efficacy, as expected in many medical informatics applications.
Fluorescence microscope cell image; segmentation; K-means clustering; EM; threshold; GMAC.
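Of the four unsupervised methods named above, Otsu’s threshold is the simplest to state: choose the gray level that maximises the between-class variance of the resulting foreground and background. The following is a minimal sketch of that standard criterion on a flat list of integer intensities, not the study's implementation; the function name and the `levels` parameter are assumptions.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximises between-class variance,
    with class 0 = pixels <= t and class 1 = pixels > t."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in class 0 so far
    sum0 = 0  # sum of intensities in class 0 so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # class 0 is still empty
        w1 = total - w0
        if w1 == 0:
            break     # class 1 is empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


if __name__ == "__main__":
    image = [10, 12, 11, 13, 200, 202, 199, 201]
    t = otsu_threshold(image)
    print(t, [1 if p > t else 0 for p in image])
```

Because the criterion looks only at the histogram, it ignores where pixels are located, which is consistent with the abstract's remark that such methods need spatial information for complex real images.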
Template-based segmentation techniques have been developed to facilitate the accurate targeting of deep brain structures in patients with movement disorders. Three template-based brain MRI segmentation techniques were compared to determine the best strategy for segmenting the deep brain structures of patients with Parkinson’s disease.
T1-weighted and T2-weighted magnetic resonance (MR) image templates were created by averaging MR images of 57 patients with Parkinson’s disease. Twenty-four deep brain structures were manually segmented on the templates. To validate the template-based segmentation, 14 of the 24 deep brain structures from the templates were manually segmented on 10 MR scans of Parkinson’s patients as a gold standard. We compared the manual segmentations with three methods of automated segmentation: two registration-based approaches, Automatic Nonlinear Image Matching and Anatomical Labelling (ANIMAL) and Symmetric Image Normalization (SyN), and one patch-label fusion technique. The automated labels were then compared with the manual labels using a Dice-kappa metric and the center of gravity. A Friedman test was used to compare the Dice-kappa values, and paired t-tests were used to compare the centers of gravity.
The Friedman test showed a significant difference between the three methods for both thalami (p < 0.05) but not for the subthalamic nuclei. Registration with ANIMAL was better than with SyN for the left thalamus, and was better than the patch-based method for the right thalamus.
Although template-based approaches are the most widely used techniques for segmenting the basal ganglia by warping onto MR images, we found that the patch-based method provided similar results and was less time-consuming. The patch-based method may be preferable for subthalamic nucleus segmentation in patients with Parkinson’s disease.
Basal ganglia; MRI template; Patch-based method; Parkinson’s disease; Segmentation
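The two comparison measures used above, an overlap score and the center of gravity, can be sketched as follows. This is a minimal illustration that assumes the Dice-kappa metric corresponds to the standard Dice coefficient 2|A∩B| / (|A| + |B|); the function names and the flat-mask and coordinate-list data layouts are assumptions, not the study's code.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap 2|A∩B| / (|A| + |B|) for two equal-length binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        return 1.0  # both masks empty: treated as full agreement (assumed convention)
    return 2.0 * overlap / (size_a + size_b)


def center_of_gravity(voxels):
    """Mean (x, y, z) position of a non-empty list of labeled voxel coordinates."""
    n = len(voxels)
    return tuple(sum(v[axis] for v in voxels) / n for axis in range(3))


if __name__ == "__main__":
    manual = [1, 1, 1, 0, 0]
    automated = [0, 1, 1, 1, 0]
    print(dice_coefficient(manual, automated))
    print(center_of_gravity([(0, 0, 0), (2, 4, 6)]))
```

The Dice score captures how much two labels overlap, while the distance between centers of gravity captures a systematic shift, which matters when the label is used for surgical targeting.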
A biomedical entity mention in articles and other free texts is often ambiguous. For example, 13% of the gene names (aliases) might refer to more than one gene. The task of Gene Symbol Disambiguation (GSD), a special case of Word Sense Disambiguation (WSD), is to assign a unique gene identifier to every identified gene name alias in biology-related articles. Supervised and unsupervised machine learning WSD techniques have been applied in the biomedical field with promising results. Here we examine, through graph-based semi-supervised methods, how the GSD task can exploit the fact that the authors of the documents are known, which is one of the special features of biological articles.
Our key hypothesis is that a biologist refers to each particular gene by a fixed gene alias and this holds for the co-authors as well. To make use of the co-authorship information we decided to build the inverse co-author graph on MedLine abstracts. The nodes of the inverse co-author graph are articles and there is an edge between two nodes if and only if the two articles have a mutual author. We introduce here two methods using distances (based on the graph) of abstracts for the GSD task. We found that a disambiguation decision can be made in 85% of cases with an extremely high (99.5%) precision rate just by using information obtained from the inverse co-author graph. We incorporated the co-authorship information into two GSD systems in order to attain full coverage and in experiments our procedure achieved precision of 94.3%, 98.85%, 96.05% and 99.63% on the human, mouse, fly and yeast GSD evaluation sets, respectively.
Based on the promising results obtained so far we suggest that the co-authorship information and the circumstances of the articles' release (like the title of the journal, the year of publication) can be a crucial building block of any sophisticated similarity measure among biological articles and hence the methods introduced here should be useful for other biomedical natural language processing tasks (like organism or target disease detection) as well.
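The inverse co-author graph described above (articles as nodes, an edge if and only if two articles share an author) and a graph distance between abstracts can be sketched as follows. This is only an illustration: the article identifiers, author names, and function names are hypothetical, and shortest-path length is used as one plausible graph-based distance, not necessarily the one used in the study.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_inverse_coauthor_graph(article_authors):
    """article_authors: dict mapping article id -> iterable of author names.
    Returns an adjacency dict in which two articles are linked iff they share an author."""
    by_author = defaultdict(set)
    for article, authors in article_authors.items():
        for author in authors:
            by_author[author].add(article)

    graph = {article: set() for article in article_authors}
    for articles in by_author.values():
        for a, b in combinations(articles, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def graph_distance(graph, source, target):
    """Breadth-first search: edges on the shortest path, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


if __name__ == "__main__":
    # Hypothetical toy corpus.
    corpus = {"A1": ["Smith", "Lee"], "A2": ["Lee", "Kim"], "A3": ["Kim"], "A4": ["Doe"]}
    g = build_inverse_coauthor_graph(corpus)
    print(graph_distance(g, "A1", "A3"), graph_distance(g, "A1", "A4"))
```

Under the key hypothesis, an ambiguous alias in one abstract could then be resolved by looking at how the same alias was labeled in the nearest abstracts of this graph.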
The number of distinct foods consumed in a meal is of significant clinical concern in the study of obesity and other eating disorders. This paper proposes the use of information contained in chewing and swallowing sequences for meal segmentation by food types. Data collected from experiments of 17 volunteers were analyzed using two different clustering techniques. First, an unsupervised clustering technique, Affinity Propagation (AP), was used to automatically identify the number of segments within a meal. Second, performance of the unsupervised AP method was compared to a supervised learning approach based on Agglomerative Hierarchical Clustering (AHC). While the AP method was able to obtain 90% accuracy in predicting the number of food items, the AHC achieved an accuracy >95%. Experimental results suggest that the proposed models of automatic meal segmentation may be utilized as part of an integral application for objective Monitoring of Ingestive Behavior in free living conditions.
Monitoring of Ingestive Behavior; Food intake; Clustering; Affinity Propagation; Agglomerative Hierarchical Clustering
Feature selection is an important pre-processing task in the analysis of complex data. Selecting an appropriate subset of features can improve classification or clustering and lead to better understanding of the data. An important example is that of finding an informative group of genes out of thousands that appear in gene-expression analysis. Numerous supervised methods have been suggested but only a few unsupervised ones exist. Unsupervised Feature Filtering (UFF) is such a method, based on an entropy measure of Singular Value Decomposition (SVD), ranking features and selecting a group of preferred ones.
We analyze the statistical properties of UFF and present an efficient approximation for the calculation of its entropy measure. This allows us to develop a web-tool that implements the UFF algorithm. We propose novel criteria to indicate whether a considered dataset is amenable to feature selection by UFF. Relying on formalism similar to UFF we propose also an Unsupervised Detection of Outliers (UDO) method, providing a novel definition of outliers and producing a measure to rank the "outlier-degree" of an instance.
Our methods are demonstrated on gene and microRNA expression datasets, covering viral infection disease and cancer. We apply UFFizi to select genes from these datasets and discuss their biological and medical relevance.
Statistical properties extracted from the UFF algorithm can distinguish selected features from others. UFFizi is a framework that is based on the UFF algorithm and it is applicable for a wide range of diseases. The framework is also implemented as a web-tool.
The web-tool is available at: http://adios.tau.ac.il/UFFizi
Segmentation is a fundamental component of many medical image processing applications, and it has long been recognized as a challenging problem. In this paper, we report our research and development efforts on analyzing and extracting clinically meaningful regions from uterine cervix images in a large database created for the study of cervical cancer. In addition to proposing new algorithms, we also focus on developing open source tools which are in synchrony with the research objectives. These efforts have resulted in three Web-accessible tools which address three important and interrelated sub-topics in medical image segmentation: the BMT (Boundary Marking Tool), the CST (Cervigram Segmentation Tool), and MOSES (Multi-Observer Segmentation Evaluation System). The BMT is for manual segmentation, typically to collect “ground truth” image regions from medical experts. The CST is for automatic segmentation, and MOSES is for segmentation evaluation. These tools are designed to be a unified set in which data can be conveniently exchanged. They have value not only for improving the reliability and accuracy of algorithms for uterine cervix image segmentation, but also for promoting collaboration between biomedical experts and engineers, which is crucial to medical image processing applications. Although the CST is designed for the unique characteristics of cervigrams, the BMT and MOSES are very general and extensible, and can be easily adapted to other biomedical image collections.
Anomaly detection involves identifying rare data instances (anomalies) that come from a different class or distribution than the majority (which are simply called “normal” instances). Given a training set of only normal data, the semi-supervised anomaly detection task is to identify anomalies in the future. Good solutions to this task have applications in fraud and intrusion detection. The unsupervised anomaly detection task is different: Given unlabeled, mostly-normal data, identify the anomalies among them. Many real-world machine learning tasks, including many fraud and intrusion detection tasks, are unsupervised because it is impractical (or impossible) to verify all of the training data. We recently presented FRaC, a new approach for semi-supervised anomaly detection. FRaC is based on using normal instances to build an ensemble of feature models, and then identifying instances that disagree with those models as anomalous. In this paper, we investigate the behavior of FRaC experimentally and explain why FRaC is so successful. We also show that FRaC is a superior approach for the unsupervised as well as the semi-supervised anomaly detection task, compared to well-known state-of-the-art anomaly detection methods, LOF and one-class support vector machines, and to an existing feature-modeling approach.
Anomaly detection; Unsupervised learning
The ability to automatically segment an image into distinct regions is a critical aspect in many visual processing applications. Because inaccuracies often exist in automatic segmentation, manual segmentation is necessary in some application domains to correct mistakes, such as required in the reconstruction of neuronal processes from microscopic images. The goal of the automated segmentation tool is traditionally to produce the highest-quality segmentation, where quality is measured by the similarity to actual ground truth, so as to minimize the volume of manual correction necessary. Manual correction is generally orders-of-magnitude more time consuming than automated segmentation, often making handling large images intractable. Therefore, we propose a more relevant goal: minimizing the turn-around time of automated/manual segmentation while attaining a level of similarity with ground truth. It is not always necessary to inspect every aspect of an image to generate a useful segmentation. As such, we propose a strategy to guide manual segmentation to the most uncertain parts of segmentation. Our contributions include 1) a probabilistic measure that evaluates segmentation without ground truth and 2) a methodology that leverages these probabilistic measures to significantly reduce manual correction while maintaining segmentation quality.
There are many instances in genetics in which we wish to determine whether two
candidate populations are distinguishable on the basis of their genetic
structure. Examples include populations which are geographically separated,
case–control studies and quality control (when participants in a study
have been genotyped at different laboratories). This latter application is of
particular importance in the era of large scale genome wide association studies,
when collections of individuals genotyped at different locations are being
merged to provide increased power. The traditional method for detecting
structure within a population is some form of exploratory technique such as
principal components analysis. Such methods, which do not utilise our prior
knowledge of the membership of the candidate populations, are termed
unsupervised. Supervised methods, on the other hand, are
able to utilise this prior knowledge when it is available.
In this paper we demonstrate that in such cases modern supervised approaches are
a more appropriate tool for detecting genetic differences between populations.
We apply two such methods (neural networks and support vector machines) to the
classification of three populations (two from Scotland and one from Bulgaria).
The sensitivity exhibited by both these methods is considerably higher than that
attained by principal components analysis and in fact comfortably exceeds a
recently conjectured theoretical limit on the sensitivity of unsupervised
methods. In particular, our methods can distinguish between the two Scottish
populations, where principal components analysis cannot. We suggest, on the
basis of our results, that a supervised learning approach should be the method of
choice when classifying individuals into pre-defined populations, particularly
in quality control for large scale genome wide association studies.
Various statistical scores have been proposed for evaluating the significance of genes that may exhibit differential expression between two or more controlled conditions. However, in many clinical studies to detect clinical marker genes for example, the conditions have not necessarily been controlled well, thus condition labels are sometimes hard to obtain due to physical, financial, and time costs. In such a situation, we can consider an unsupervised case where labels are not available or a semi-supervised case where labels are available for a part of the whole sample set, rather than a well-studied supervised case where all samples have their labels.
We assume a latent variable model for the expression of active genes and apply the optimal discovery procedure (ODP) proposed by Storey (2005) to the model. Our latent variable model allows gene significance scores to be applied to unsupervised and semi-supervised cases. The ODP framework improves detectability by sharing the estimated parameters of null and alternative models of multiple tests over multiple genes. A theoretical consideration leads to two different interpretations of the latent variable, i.e., it only implicitly affects the alternative model through the model parameters, or it is explicitly included in the alternative model, so that the interpretations correspond to two different implementations of ODP. By comparing the two implementations through experiments with simulation data, we have found that sharing the latent variable estimation is effective for increasing the detectability of truly active genes. We also show that the unsupervised and semi-supervised rating of genes, which takes into account the samples without condition labels, can improve detection of active genes in real gene discovery problems.
The experimental results indicate that the ODP framework is effective for hypotheses including latent variables and is further improved by sharing the estimations of hidden variables over multiple tests.
Interpreting metaphor is a hard but important problem in natural language processing that has numerous applications. One way to address this task is by finding a paraphrase that can replace the metaphorically used word in a given context. This approach has been previously implemented only within supervised frameworks, relying on manually constructed lexical resources, such as WordNet. In contrast, we present a fully unsupervised metaphor interpretation method that extracts literal paraphrases for metaphorical expressions from the Web. It achieves a precision of , which is high for an unsupervised paraphrasing approach. Moreover, the method significantly outperforms both the baseline and the selectional preference-based method of Shutova employed in an unsupervised setting.
In this study, we present pituitary adenoma volumetry using the free and open source medical image computing platform for biomedical research, (3D) Slicer. Volumetric changes in cerebral pathologies such as pituitary adenomas are a critical factor in treatment decisions by physicians, and in general the volume is acquired manually: slice-by-slice segmentations are performed by hand on magnetic resonance imaging (MRI) data obtained at regular intervals. Slicer offers an alternative to this time-consuming manual process that can be significantly faster and less user-intensive. In this contribution, we compare purely manual segmentations of ten pituitary adenomas with semi-automatic segmentations in Slicer. Physicians drew the boundaries completely manually on a slice-by-slice basis and also performed a Slicer-enhanced segmentation using GrowCut, the competitive region-growing module of Slicer. Results showed that the time and user effort required for GrowCut-based segmentations were on average about thirty percent less than for the purely manual segmentations. Furthermore, we calculated the Dice Similarity Coefficient (DSC) between the manual and the Slicer-based segmentations to show that the two are comparable, yielding an average DSC of 81.97±3.39%.
Image segmentation is important with applications to several problems in biology and medicine. While extensively researched, generally, current segmentation methods perform adequately in the applications for which they were designed, but often require extensive modifications or calibrations before being used in a different application. We describe an approach that, with few modifications, can be used in a variety of image segmentation problems. The approach is based on a supervised learning strategy that utilizes intensity neighborhoods to assign each pixel in a test image its correct class based on training data. We describe methods for modeling rotations and variations in scales as well as a subset selection for training the classifiers. We show that the performance of our approach in tissue segmentation tasks in magnetic resonance and histopathology microscopy images, as well as nuclei segmentation from fluorescence microscopy images, is similar to or better than several algorithms specifically designed for each of these applications.
The simultaneous segmentation of multiple objects is an important problem in many imaging and computer vision applications. Various extensions of level set segmentation techniques to multiple objects have been proposed; however, no one method maintains object relationships, preserves topology, is computationally efficient, and provides an object-dependent internal and external force capability. In this paper, a framework for segmenting multiple objects that permits different forces to be applied to different boundaries while maintaining object topology and relationships is presented. Because of this framework, the segmentation of multiple objects each with multiple compartments is supported, and no overlaps or vacuums are generated. The computational complexity of this approach is independent of the number of objects to segment, thereby permitting the simultaneous segmentation of a large number of components. The properties of this approach and comparisons to existing methods are shown using a variety of images, both synthetic and real.
To develop and validate a multi-dimensional segmentation and filtering methodology for accurate blood flow velocity field reconstruction from phase contrast magnetic resonance imaging (PC MRI).
Materials and Methods
The proposed technique consists of two steps: 1) the boundary of the vessel is automatically segmented using the active contour approach; 2) noise embedded within the segmented vector field is selectively removed using a novel fuzzy adaptive vector median filtering (FAVMF) technique. This two-step segmentation process is tested and validated on 111 synthetically generated PC MRI slices and on 10 patients with congenital heart disease.
The active contour technique was effective for segmenting blood vessels, with a sensitivity of 93.1% and a specificity of 92.1% using manual segmentation as the reference standard. FAVMF was superior at filtering out noise vectors compared with other filters commonly used in PC MRI (p < 0.05). The peak wall shear rate calculated from the PC MRI data (248 ± 39 s⁻¹) decreased significantly to 146 ± 26 s⁻¹ after the filtering process.
The proposed two-step segmentation and filtering methodology is more accurate than a single-step segmentation process for post-processing of PC MRI data.
PC MRI; Noise Filtering; Fuzzy Systems; Vector Median Filtering; Segmentation; Active Contours
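The filtering step builds on vector median filtering, in which each velocity vector is replaced by the vector in its neighbourhood that has the smallest summed distance to all the others. The sketch below shows only the plain (non-fuzzy, non-adaptive) vector median along a one-dimensional line of samples; the fuzzy adaptive weighting of FAVMF is not reproduced, and the function names and the `radius` parameter are assumptions.

```python
import math


def vector_median(vectors):
    """Return the vector in the window whose summed Euclidean distance
    to all vectors in the window is smallest."""
    def total_distance(v):
        return sum(math.dist(v, other) for other in vectors)
    return min(vectors, key=total_distance)


def vector_median_filter(field, radius=1):
    """Apply the vector median over a sliding window along a 1-D sequence
    of velocity vectors; windows are clipped at both ends of the sequence."""
    filtered = []
    for i in range(len(field)):
        window = field[max(0, i - radius): i + radius + 1]
        filtered.append(vector_median(window))
    return filtered


if __name__ == "__main__":
    # Hypothetical velocity samples with one noise vector in the middle.
    velocities = [(1.0, 0.0), (1.1, 0.0), (-5.0, 4.0), (0.9, 0.1), (1.0, 0.0)]
    print(vector_median_filter(velocities))
```

Unlike a component-wise median, the output is always one of the input vectors, so the filter does not create velocity directions that were never measured.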
Current medical imaging systems provide excellent spatial resolution, high tissue contrast, and up to 65535 intensity levels. Thus, image processing techniques which aim to exploit the information contained in the images are necessary for using these images in computer-aided diagnosis (CAD) systems. Image segmentation may be defined as the process of parcelling the image to delimit the different neuroanatomical tissues present in the brain. In this paper we propose a segmentation technique using 3D statistical features extracted from the volume image. The presented method is based on unsupervised vector quantization and fuzzy clustering techniques and does not use any a priori information. The resulting fuzzy segmentation method addresses the problem of partial volume effect (PVE) and has been assessed using real brain images from the Internet Brain Segmentation Repository (IBSR).
Multi-atlas segmentation has been widely applied in medical image analysis. With deformable registration, this technique transfers labels from pre-labeled atlases to unknown images. When deformable registration produces errors, label fusion that combines the results produced by multiple atlases is an effective way to reduce segmentation errors. Among the existing label fusion strategies, similarity-weighted voting strategies with spatially varying weight distributions have been particularly successful. We show that label fusion based on weighted voting produces a spatial bias that under-segments structures with convex shapes. The bias can be approximated as applying spatial convolution to the ground truth spatial label probability maps, where the convolution kernel combines the distribution of residual registration errors and the function producing similarity-based voting weights. To reduce this bias, we apply a standard spatial deconvolution to the spatial probability maps obtained from weighted voting. In a brain image segmentation experiment, we demonstrate the spatial bias and show that our technique substantially reduces it.
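Similarity-weighted voting at a single voxel can be sketched as follows. This is a minimal illustration rather than the method evaluated above: the Gaussian weighting function, the `sigma` parameter, the patch representation, and the label names are all assumptions, and the deconvolution step is not shown.

```python
import math
from collections import defaultdict


def similarity_weight(target_patch, atlas_patch, sigma=10.0):
    """Hypothetical weight: Gaussian of the mean squared intensity difference."""
    msd = sum((t - a) ** 2 for t, a in zip(target_patch, atlas_patch)) / len(target_patch)
    return math.exp(-msd / (2.0 * sigma ** 2))


def weighted_vote(target_patch, atlases, sigma=10.0):
    """atlases: list of (atlas_patch, atlas_label) pairs for one voxel.
    Returns (winning label, dict of normalised label probabilities)."""
    votes = defaultdict(float)
    for patch, label in atlases:
        votes[label] += similarity_weight(target_patch, patch, sigma)
    total = sum(votes.values())
    probabilities = {label: weight / total for label, weight in votes.items()}
    winner = max(probabilities, key=probabilities.get)
    return winner, probabilities


if __name__ == "__main__":
    target = [100, 102, 98]
    registered_atlases = [
        ([100, 101, 99], "structure"),   # similar patch, votes strongly
        ([60, 65, 58], "background"),    # dissimilar patches, vote weakly
        ([62, 61, 59], "background"),
    ]
    print(weighted_vote(target, registered_atlases))
```

In this toy case the single well-matched atlas outweighs two poorly matched ones, which plain majority voting would not allow. The per-label probabilities collected over all voxels form the spatial probability maps to which a deconvolution would be applied.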
Statistical models have shown considerable promise as a basis for segmenting and interpreting cardiac images. While a variety of statistical models have been proposed to improve the segmentation results, most of them are either static models (SM), which neglect the temporal dynamics of a cardiac sequence, or generic dynamical models (GDM), which are homogeneous in time and neglect the inter-subject variability in cardiac shape and deformation. In this paper, we develop a subject-specific dynamical model (SSDM) that simultaneously handles temporal dynamics (intra-subject variability) and inter-subject variability. We also propose a dynamic prediction algorithm that can progressively identify the specific motion patterns of a new cardiac sequence based on the shapes observed in past frames. The incorporation of this SSDM into the segmentation framework is formulated in a recursive Bayesian framework. It starts with a manual segmentation of the first frame, and then segments each frame according to intensity information from the current frame as well as the prediction from past frames. In addition, to reduce error propagation in sequential segmentation, we take into account the periodic nature of cardiac motion and perform segmentation in both forward and backward directions. We perform a leave-one-out test on 32 canine sequences and 22 human sequences, and compare the experimental results with those from SM, GDM, and the Active Appearance Motion Model (AAMM). Quantitative analysis of the experimental results shows that SSDM outperforms SM, GDM, and AAMM by having better global and local consistency with manual segmentation. Moreover, we compare the segmentation results from forward and forward-backward segmentation. Quantitative evaluation shows that forward-backward segmentation suppresses the propagation of segmentation errors.
Cardiac Segmentation; Statistical Shape Model; Dynamical Model; Bayesian Method
Our aim was to develop an MRI segmentation method for brain tissues, regions, and substructures that yields improved classification accuracy. Current brain segmentation approaches fall into two complementary strategies, multi-spectral classification and multi-template label fusion, each with its own strengths and weaknesses.
We propose here a novel multi-classifier fusion algorithm with the advantages of both types of segmentation strategy. We illustrate and validate this algorithm using a group of 14 expertly hand-labeled images.
Our method generated segmentations of cortical and subcortical structures that were more similar to hand-drawn segmentations than majority vote label fusion or a recently published intensity/label fusion method.
We have presented a novel, general segmentation algorithm with the advantages of both statistical classifiers and label fusion techniques.
brain segmentation; classification; multi-classifier fusion
Correct segmentation is critical to many applications within automated microscopy image analysis. Despite the availability of advanced segmentation algorithms, variations in cell morphology, sample preparation, and acquisition settings often lead to segmentation errors. This manuscript introduces a ranked-retrieval approach using logistic regression to automate selection of accurately segmented nuclei from a set of candidate segmentations. The methodology is validated on an application of spatial gene repositioning in breast cancer cell nuclei. Gene repositioning is analyzed in patient tissue sections by labeling sequences with fluorescence in situ hybridization (FISH), followed by measurement of the relative position of each gene from the nuclear center to the nuclear periphery. This technique requires hundreds of well-segmented nuclei per sample to achieve statistical significance. Although the tissue samples in this study contain a surplus of available nuclei, automatic identification of the well-segmented subset remains a challenging task.
Logistic regression was applied to features extracted from candidate segmented nuclei, including nuclear shape, texture, context, and gene copy number, in order to rank objects according to the likelihood of being an accurately segmented nucleus. The method was demonstrated on a tissue microarray dataset of 43 breast cancer patients, comprising approximately 40,000 imaged nuclei in which the HES5 and FRA2 genes were labeled with FISH probes. Three trained reviewers independently classified nuclei into three classes of segmentation accuracy. In man vs. machine studies, the automated method outperformed the inter-observer agreement between reviewers, as measured by area under the receiver operating characteristic (ROC) curve. Robustness of gene position measurements to boundary inaccuracies was demonstrated by comparing 1086 manually and automatically segmented nuclei. Pearson correlation coefficients between the gene position measurements were above 0.9 (p < 0.05). A preliminary experiment was conducted to validate the ranked retrieval in a test to detect cancer. Independent manual measurement of gene positions agreed with automatic results in 21 out of 26 statistical comparisons against a pooled normal (benign) gene position distribution.
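The ranking step can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the single toy feature, the gradient-descent fit, and the helper names `fit_logistic`, `score`, `rank_candidates`, and `roc_auc` are all hypothetical and stand in for the shape, texture, context, and copy-number features and the fitting procedure used in the study.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit logistic regression by batch gradient descent (bias stored last)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        # Gradient of the mean negative log-likelihood.
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def score(X, w):
    """Predicted probability that each candidate is accurately segmented."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def rank_candidates(X, w):
    """Candidate indices ordered from most to least likely well segmented."""
    return np.argsort(-score(X, w))

def roc_auc(y_true, scores):
    """AUC as the chance that a positive outranks a negative (ties count half)."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Toy data: one made-up feature; label 1 marks a well-segmented nucleus.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
w = fit_logistic(X, y)
order = rank_candidates(X, w)
auc = roc_auc(y, score(X, w))
```

Retrieving the top of `order` rather than thresholding the scores is what distinguishes ranked retrieval from binary classification: the user takes as many of the best-ranked nuclei as the analysis needs.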
Accurate segmentation is necessary to automate quantitative image analysis for applications such as gene repositioning. However, due to heterogeneity within images and across different applications, no segmentation algorithm provides a satisfactory solution. Automated assessment of segmentations by ranked retrieval is capable of reducing or even eliminating the need to select segmented objects by hand and represents a significant improvement over binary classification. The method can be extended to other high-throughput applications requiring accurate detection of cells or nuclei across a range of biomedical applications.