Related Articles
Binarization is often recognized to be one of the most important steps in most high-level image analysis systems, particularly for object recognition. Its precise functioning largely determines the performance of the entire system. According to many researchers, segmentation finishes when the observer’s goal is satisfied. Experience has shown that the most effective methods continue to be the iterative ones. However, a problem with these algorithms is the stopping criterion. In this work, entropy is used as the stopping criterion when segmenting an image by recursively applying mean shift filtering. In this way, a new algorithm is introduced for the binarization of medical images, where the binarization is carried out after the segmented image has been obtained. The good performance of the proposed method, that is, the good quality of the binarization, is illustrated with several experimental results. The results obtained with this new algorithm are compared with those of another algorithm previously developed by the author and collaborators, and with Otsu’s method.
PMCID: PMC3169934
PMID: 21918602
image segmentation; mean shift; algorithm; entropy; Otsu’s method
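The abstract above uses Otsu’s method as a comparison baseline. As a minimal sketch of that baseline only (not the entropy-driven mean shift algorithm the authors propose), the following pure-Python function picks the gray level that maximizes the between-class variance of a histogram. The function names `otsu_threshold` and `binarize`, the flat pixel list, and the 256-level assumption are illustrative choices, not taken from the paper.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    Pixels with value <= t form one class; pixels with value > t form the other.
    `pixels` is a flat list of integers in the range [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # sum of intensities in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break             # upper class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor 1 / total**2.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to 1 if it lies above the threshold, else 0."""
    return [1 if p > threshold else 0 for p in pixels]
```

For a clearly bimodal input, any threshold between the two modes separates them, so the sketch returns the first such level it encounters.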
Breast lesion segmentation in magnetic resonance (MR) images is one of the most important parts of clinical diagnostic tools. Pixel classification methods have been frequently used in image segmentation, following either supervised or unsupervised approaches. Supervised segmentation methods lead to high accuracy, but they need a large amount of labeled data, which is hard, expensive, and slow to obtain. On the other hand, unsupervised segmentation methods need no prior knowledge but lead to low performance. However, semi-supervised learning, which uses not only a few labeled data but also a large amount of unlabeled data, promises higher accuracy with less effort. In this paper, we propose a new interactive semi-supervised approach to segmentation of suspicious lesions in breast MRI. Using a suitable classifier in this approach has an important role in its performance; in this paper, we present a semi-supervised algorithm, improved self-training (IMPST), which is an improved version of the self-training method and increases segmentation accuracy. Experimental results show that the performance of segmentation in this approach is higher than that of supervised and unsupervised methods such as K nearest neighbors, Bayesian, Support Vector Machine, and Fuzzy c-Means.
PMCID: PMC3342621
PMID: 22606669
Breast lesions segmentation; magnetic resonance imaging; self-training; semi-supervised learning
The authors propose a CT image segmentation method using structural analysis that is useful for objects with structural dynamic characteristics. The motivation for our research comes from the area of genetic activity. In order to reveal the roles of genes, it is necessary to create mutant mice and measure differences among them by scanning their skeletons with an X-ray CT scanner. The CT image needs to be manually segmented into pieces of the bones. It is very time-consuming to manually segment many mutant mouse models in order to reveal the roles of genes. It is desirable to make this segmentation procedure automatic. Although numerous papers in the past have proposed segmentation techniques, no general segmentation method for skeletons of living creatures has been established. Against this background, the authors propose a segmentation method based on the concept of destruction analogy. To realize this concept, structural analysis is performed using the finite element method (FEM), as structurally weak areas can be expected to break under conditions of stress. The contribution of the method is its novelty, as no studies have so far used structural analysis for image segmentation. The method's implementation involves three steps. First, finite elements are created directly from the pixels of a CT image, and then candidates are also selected in areas where segmentation is thought to be appropriate. The second step involves destruction analogy to find a single candidate with high strain chosen as the segmentation target. The boundary conditions for FEM are also set automatically. Then, destruction analogy is implemented by replacing pixels with high strain with background ones, and this process is iterated until the object is decomposed into two parts. Here, CT image segmentation is demonstrated using various types of CT imagery.
doi:10.1371/journal.pone.0031116
PMCID: PMC3289631
PMID: 22389668
Quantitative microscopy has been extensively used in biomedical research and has provided significant insights into structure and dynamics at the cell and tissue level. The entire procedure of quantitative microscopy is comprised of specimen preparation, light absorption/reflection/emission from the specimen, microscope optical processing, optical/electrical conversion by a camera or detector, and computational processing of digitized images. Although many of the latest digital signal processing techniques have been successfully applied to compress, restore, and register digital microscope images, automated approaches for recognition and understanding of complex subcellular patterns in light microscope images have been far less widely used. In this review, we describe a systematic approach for interpreting protein subcellular distributions using various sets of Subcellular Location Features (SLF) in combination with supervised classification and unsupervised clustering methods. These methods can handle complex patterns in digital microscope images and the features can be applied for other purposes such as objectively choosing a representative image from a collection and performing statistical comparison of image sets.
doi:10.1117/1.1779233
PMCID: PMC1458526
PMID: 15447010
fluorescence microscopy; subcellular location features; pattern recognition; protein distribution comparison; location proteomics; protein localization
The accurate measurement of cell and nuclei contours is critical for the sensitive and specific detection of changes in normal cells in several medical informatics disciplines. Within microscopy, this task is facilitated using fluorescence cell stains, and segmentation is often the first step in such approaches. Due to the complex nature of cell issues and problems inherent to microscopy, unsupervised mining approaches of clustering can be incorporated in the segmentation of cells. In this study, we have developed and evaluated the performance of multiple unsupervised data mining techniques in cell image segmentation. We adapt four distinctive, yet complementary, methods for unsupervised learning, including those based on k-means clustering, EM, Otsu’s threshold, and GMAC. Validation measures are defined, and the performance of the techniques is evaluated both quantitatively and qualitatively using synthetic and recently published real data. Experimental results demonstrate that k-means, Otsu’s threshold, and GMAC perform similarly, and have more precise segmentation results than EM. We report that EM has higher recall values and lower precision results from under-segmentation due to its Gaussian model assumption. We also demonstrate that these methods need spatial information to segment complex real cell images with a high degree of efficacy, as expected in many medical informatics applications.
doi:10.2174/1874431101004020041
PMCID: PMC2930152
PMID: 21116323
Fluorescence microscope cell image; segmentation; K-means clustering; EM; threshold; GMAC.
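The abstract above reports precision and recall for each clustering method. As a rough illustration of how such pixel-level measures are commonly computed (the paper's exact validation measures are not given here, so this is an assumption), the sketch below compares a predicted binary mask with a reference mask. The name `precision_recall` and the flat 0/1 list representation are hypothetical.

```python
def precision_recall(predicted, reference):
    """Pixel-wise precision and recall of a binary mask against a reference.

    Both arguments are equal-length flat sequences of 0/1 values,
    where 1 marks a foreground (cell) pixel.
    """
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)        # foreground in both
    fp = sum(1 for p, r in pairs if p and not r)    # predicted only
    fn = sum(1 for p, r in pairs if not p and r)    # reference only

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

A mask that labels too many pixels as foreground keeps recall high while precision drops, which matches the pattern the abstract describes for EM.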
Background
A biomedical entity mention in articles and other free texts is often ambiguous. For example, 13% of the gene names (aliases) might refer to more than one gene. The task of Gene Symbol Disambiguation (GSD) – a special case of Word Sense Disambiguation (WSD) – is to assign a unique gene identifier for all identified gene name aliases in biology-related articles. Supervised and unsupervised machine learning WSD techniques have been applied in the biomedical field with promising results. We examine here the utilisation potential of the fact – one of the special features of biological articles – that the authors of the documents are known through graph-based semi-supervised methods for the GSD task.
Results
Our key hypothesis is that a biologist refers to each particular gene by a fixed gene alias and this holds for the co-authors as well. To make use of the co-authorship information we decided to build the inverse co-author graph on MedLine abstracts. The nodes of the inverse co-author graph are articles and there is an edge between two nodes if and only if the two articles have a mutual author. We introduce here two methods using distances (based on the graph) of abstracts for the GSD task. We found that a disambiguation decision can be made in 85% of cases with an extremely high (99.5%) precision rate just by using information obtained from the inverse co-author graph. We incorporated the co-authorship information into two GSD systems in order to attain full coverage and in experiments our procedure achieved precision of 94.3%, 98.85%, 96.05% and 99.63% on the human, mouse, fly and yeast GSD evaluation sets, respectively.
Conclusion
Based on the promising results obtained so far we suggest that the co-authorship information and the circumstances of the articles' release (like the title of the journal, the year of publication) can be a crucial building block of any sophisticated similarity measure among biological articles and hence the methods introduced here should be useful for other biomedical natural language processing tasks (like organism or target disease detection) as well.
doi:10.1186/1471-2105-9-69
PMCID: PMC2262057
PMID: 18230174
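The inverse co-author graph described in the Results above has articles as nodes, with an edge between two articles if and only if they share an author. A minimal sketch of that construction follows. The hop-count distance computed by breadth-first search is an assumed stand-in for the graph-based distances the abstract mentions, and the function names and sample article identifiers are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_inverse_coauthor_graph(article_authors):
    """Build an adjacency map over articles.

    `article_authors` maps an article id to a set of author names.
    Two articles are adjacent iff they have at least one author in common.
    """
    articles_by_author = defaultdict(set)
    for article, authors in article_authors.items():
        for author in authors:
            articles_by_author[author].add(article)

    graph = {article: set() for article in article_authors}
    for articles in articles_by_author.values():
        for u, v in combinations(sorted(articles), 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph


def graph_distance(graph, source, target):
    """Number of edges on a shortest path, or None if unreachable (BFS)."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None
```

Under the paper's hypothesis, an ambiguous alias in one abstract would tend to carry the same gene identifier as in abstracts that lie close to it in this graph.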
The number of distinct foods consumed in a meal is of significant clinical concern in the study of obesity and other eating disorders. This paper proposes the use of information contained in chewing and swallowing sequences for meal segmentation by food types. Data collected from experiments of 17 volunteers were analyzed using two different clustering techniques. First, an unsupervised clustering technique, Affinity Propagation (AP), was used to automatically identify the number of segments within a meal. Second, performance of the unsupervised AP method was compared to a supervised learning approach based on Agglomerative Hierarchical Clustering (AHC). While the AP method was able to obtain 90% accuracy in predicting the number of food items, the AHC achieved an accuracy >95%. Experimental results suggest that the proposed models of automatic meal segmentation may be utilized as part of an integral application for objective Monitoring of Ingestive Behavior in free living conditions.
doi:10.1016/j.bspc.2011.11.004
PMCID: PMC3484889
PMID: 23125872
Monitoring of Ingestive Behavior; Food intake; Clustering; Affinity Propagation; Agglomerative Hierarchical Clustering
Background
Feature selection is an important pre-processing task in the analysis of complex data. Selecting an appropriate subset of features can improve classification or clustering and lead to better understanding of the data. An important example is that of finding an informative group of genes out of thousands that appear in gene-expression analysis. Numerous supervised methods have been suggested but only a few unsupervised ones exist. Unsupervised Feature Filtering (UFF) is such a method, based on an entropy measure of Singular Value Decomposition (SVD), ranking features and selecting a group of preferred ones.
Results
We analyze the statistical properties of UFF and present an efficient approximation for the calculation of its entropy measure. This allows us to develop a web-tool that implements the UFF algorithm. We propose novel criteria to indicate whether a considered dataset is amenable to feature selection by UFF. Relying on formalism similar to UFF we propose also an Unsupervised Detection of Outliers (UDO) method, providing a novel definition of outliers and producing a measure to rank the "outlier-degree" of an instance.
Our methods are demonstrated on gene and microRNA expression datasets, covering viral infection disease and cancer. We apply UFFizi to select genes from these datasets and discuss their biological and medical relevance.
Conclusions
Statistical properties extracted from the UFF algorithm can distinguish selected features from others. UFFizi is a framework that is based on the UFF algorithm and it is applicable for a wide range of diseases. The framework is also implemented as a web-tool.
The web-tool is available at: http://adios.tau.ac.il/UFFizi
doi:10.1186/1471-2105-11-300
PMCID: PMC2893168
PMID: 20525252
Segmentation is a fundamental component of many medical image processing applications, and it has long been recognized as a challenging problem. In this paper, we report our research and development efforts on analyzing and extracting clinically meaningful regions from uterine cervix images in a large database created for the study of cervical cancer. In addition to proposing new algorithms, we also focus on developing open source tools which are in synchrony with the research objectives. These efforts have resulted in three Web-accessible tools which address three important and interrelated sub-topics in medical image segmentation, respectively: the BMT (Boundary Marking Tool), CST (Cervigram Segmentation Tool), and MOSES (Multi-Observer Segmentation Evaluation System). The BMT is for manual segmentation, typically to collect “ground truth” image regions from medical experts. The CST is for automatic segmentation, and MOSES is for segmentation evaluation. These tools are designed to be a unified set in which data can be conveniently exchanged. They have value not only for improving the reliability and accuracy of algorithms of uterine cervix image segmentation, but also for promoting collaboration between biomedical experts and engineers, which is crucial to medical image processing applications. Although the CST is designed for the unique characteristics of cervigrams, the BMT and MOSES are very general and extensible, and can be easily adapted to other biomedical image collections.
doi:10.1016/j.compmedimag.2010.04.002
PMCID: PMC2955170
PMID: 20510585
The ability to automatically segment an image into distinct regions is a critical aspect in many visual processing applications. Because inaccuracies often exist in automatic segmentation, manual segmentation is necessary in some application domains to correct mistakes, as is required in the reconstruction of neuronal processes from microscopic images. The goal of the automated segmentation tool is traditionally to produce the highest-quality segmentation, where quality is measured by the similarity to actual ground truth, so as to minimize the volume of manual correction necessary. Manual correction is generally orders of magnitude more time-consuming than automated segmentation, often making the handling of large images intractable. Therefore, we propose a more relevant goal: minimizing the turn-around time of automated/manual segmentation while attaining a level of similarity with ground truth. It is not always necessary to inspect every aspect of an image to generate a useful segmentation. As such, we propose a strategy to guide manual segmentation to the most uncertain parts of segmentation. Our contributions include 1) a probabilistic measure that evaluates segmentation without ground truth and 2) a methodology that leverages these probabilistic measures to significantly reduce manual correction while maintaining segmentation quality.
doi:10.1371/journal.pone.0044448
PMCID: PMC3448604
PMID: 23028540
Anomaly detection involves identifying rare data instances (anomalies) that come from a different class or distribution than the majority (which are simply called “normal” instances). Given a training set of only normal data, the semi-supervised anomaly detection task is to identify anomalies in the future. Good solutions to this task have applications in fraud and intrusion detection. The unsupervised anomaly detection task is different: Given unlabeled, mostly-normal data, identify the anomalies among them. Many real-world machine learning tasks, including many fraud and intrusion detection tasks, are unsupervised because it is impractical (or impossible) to verify all of the training data. We recently presented FRaC, a new approach for semi-supervised anomaly detection. FRaC is based on using normal instances to build an ensemble of feature models, and then identifying instances that disagree with those models as anomalous. In this paper, we investigate the behavior of FRaC experimentally and explain why FRaC is so successful. We also show that FRaC is a superior approach for the unsupervised as well as the semi-supervised anomaly detection task, compared to well-known state-of-the-art anomaly detection methods, LOF and one-class support vector machines, and to an existing feature-modeling approach.
doi:10.1007/s10618-011-0234-x
PMCID: PMC3359096
PMID: 22639542
Anomaly detection; Unsupervised learning
There are many instances in genetics in which we wish to determine whether two candidate populations are distinguishable on the basis of their genetic structure. Examples include populations which are geographically separated, case–control studies and quality control (when participants in a study have been genotyped at different laboratories). This latter application is of particular importance in the era of large scale genome wide association studies, when collections of individuals genotyped at different locations are being merged to provide increased power. The traditional method for detecting structure within a population is some form of exploratory technique such as principal components analysis. Such methods, which do not utilise our prior knowledge of the membership of the candidate populations, are termed unsupervised. Supervised methods, on the other hand, are able to utilise this prior knowledge when it is available.
In this paper we demonstrate that in such cases modern supervised approaches are a more appropriate tool for detecting genetic differences between populations. We apply two such methods (neural networks and support vector machines) to the classification of three populations (two from Scotland and one from Bulgaria). The sensitivity exhibited by both these methods is considerably higher than that attained by principal components analysis and in fact comfortably exceeds a recently conjectured theoretical limit on the sensitivity of unsupervised methods. In particular, our methods can distinguish between the two Scottish populations, where principal components analysis cannot. We suggest, on the basis of our results, that a supervised learning approach should be the method of choice when classifying individuals into pre-defined populations, particularly in quality control for large scale genome wide association studies.
doi:10.1371/journal.pone.0014802
PMCID: PMC3093382
PMID: 21589856
Background
Various statistical scores have been proposed for evaluating the significance of genes that may exhibit differential expression between two or more controlled conditions. However, in many clinical studies to detect clinical marker genes for example, the conditions have not necessarily been controlled well, thus condition labels are sometimes hard to obtain due to physical, financial, and time costs. In such a situation, we can consider an unsupervised case where labels are not available or a semi-supervised case where labels are available for a part of the whole sample set, rather than a well-studied supervised case where all samples have their labels.
Results
We assume a latent variable model for the expression of active genes and apply the optimal discovery procedure (ODP) proposed by Storey (2005) to the model. Our latent variable model allows gene significance scores to be applied to unsupervised and semi-supervised cases. The ODP framework improves detectability by sharing the estimated parameters of null and alternative models of multiple tests over multiple genes. A theoretical consideration leads to two different interpretations of the latent variable, i.e., it only implicitly affects the alternative model through the model parameters, or it is explicitly included in the alternative model, so that the interpretations correspond to two different implementations of ODP. By comparing the two implementations through experiments with simulation data, we have found that sharing the latent variable estimation is effective for increasing the detectability of truly active genes. We also show that the unsupervised and semi-supervised rating of genes, which takes into account the samples without condition labels, can improve detection of active genes in real gene discovery problems.
Conclusion
The experimental results indicate that the ODP framework is effective for hypotheses including latent variables and is further improved by sharing the estimations of hidden variables over multiple tests.
doi:10.1186/1471-2105-7-414
PMCID: PMC1584253
PMID: 16981994
In this study, we present pituitary adenoma volumetry using the free and open source medical image computing platform for biomedical research: (3D) Slicer. Volumetric changes in cerebral pathologies like pituitary adenomas are a critical factor in treatment decisions by physicians, and in general the volume is acquired manually. Therefore, manual slice-by-slice segmentations in magnetic resonance imaging (MRI) data, which have been obtained at regular intervals, are performed. In contrast to this time-consuming manual slice-by-slice segmentation process, Slicer is an alternative which can be significantly faster and less user intensive. In this contribution, we compare pure manual segmentations of ten pituitary adenomas with semi-automatic segmentations under Slicer. Thus, physicians drew the boundaries completely manually on a slice-by-slice basis and performed a Slicer-enhanced segmentation using the competitive region-growing based module of Slicer named GrowCut. Results showed that the time and user effort required for GrowCut-based segmentations were on average about thirty percent less than for the pure manual segmentations. Furthermore, we calculated the Dice Similarity Coefficient (DSC) between the manual and the Slicer-based segmentations to prove that the two are comparable, yielding an average DSC of 81.97±3.39%.
doi:10.1371/journal.pone.0051788
PMCID: PMC3519899
PMID: 23240062
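The Dice Similarity Coefficient used above to compare manual and GrowCut-based segmentations is defined as DSC = 2|A ∩ B| / (|A| + |B|) for two foreground regions A and B. The sketch below computes it for two binary masks. The function name `dice_coefficient`, the flat 0/1 list representation, and the convention of returning 1.0 for two empty masks are illustrative assumptions rather than details from the study.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice Similarity Coefficient between two equal-length binary masks.

    DSC = 2 * |A intersect B| / (|A| + |B|), ranging from 0 (no overlap)
    to 1 (identical foreground regions).
    """
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)

    if size_a + size_b == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / (size_a + size_b)
```

A value reported as a percentage, as in the abstract, is this ratio multiplied by 100.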
Image segmentation is important, with applications to several problems in biology and medicine. Although the problem has been extensively researched, current segmentation methods generally perform adequately in the applications for which they were designed, but often require extensive modifications or calibrations before being used in a different application. We describe an approach that, with few modifications, can be used in a variety of image segmentation problems. The approach is based on a supervised learning strategy that utilizes intensity neighborhoods to assign each pixel in a test image its correct class based on training data. We describe methods for modeling rotations and variations in scales as well as a subset selection for training the classifiers. We show that the performance of our approach in tissue segmentation tasks in magnetic resonance and histopathology microscopy images, as well as nuclei segmentation from fluorescence microscopy images, is similar to or better than several algorithms specifically designed for each of these applications.
doi:10.1155/2011/606857
PMCID: PMC3132524
PMID: 21760767
The simultaneous segmentation of multiple objects is an important problem in many imaging and computer vision applications. Various extensions of level set segmentation techniques to multiple objects have been proposed; however, no one method maintains object relationships, preserves topology, is computationally efficient, and provides an object-dependent internal and external force capability. In this paper, a framework for segmenting multiple objects that permits different forces to be applied to different boundaries while maintaining object topology and relationships is presented. Because of this framework, the segmentation of multiple objects each with multiple compartments is supported, and no overlaps or vacuums are generated. The computational complexity of this approach is independent of the number of objects to segment, thereby permitting the simultaneous segmentation of a large number of components. The properties of this approach and comparisons to existing methods are shown using a variety of images, both synthetic and real.
doi:10.1109/CVPR.2008.4587475
PMCID: PMC3516193
PMID: 23223164
Purpose
To develop and validate a multi-dimensional segmentation and filtering methodology for accurate blood flow velocity field reconstruction from phase contrast magnetic resonance imaging (PC MRI).
Materials and Methods
The proposed technique consists of two steps: 1) the boundary of the vessel is automatically segmented using the active contour approach; 2) noise embedded within the segmented vector field is selectively removed using a novel fuzzy adaptive vector median filtering (FAVMF) technique. This two-step segmentation process is tested and validated on 111 synthetically generated PC MRI slices and on 10 patients with congenital heart disease.
Results
The active contour technique was effective for segmenting blood vessels, with a sensitivity and specificity of 93.1% and 92.1%, respectively, using manual segmentation as a reference standard. FAVMF was the superior technique in filtering out noise vectors when compared to other commonly used filters in PC MRI (p < 0.05). The peak wall shear rate calculated from the PC MRI data (248 ± 39 s−1) decreased significantly to 146 ± 26 s−1 after the filtering process.
Conclusion
The proposed two step segmentation and filtering methodology is more accurate compared to a single step segmentation process for post processing of PC MRI data.
doi:10.1002/jmri.21579
PMCID: PMC2614292
PMID: 19097101
PC MRI; Noise Filtering; Fuzzy Systems; Vector Median Filtering; Segmentation; Active Contours
Multi-atlas segmentation has been widely applied in medical image analysis. With deformable registration, this technique realizes label transfer from pre-labeled atlases to unknown images. When deformable registration produces errors, label fusion that combines results produced by multiple atlases is an effective way of reducing segmentation errors. Among the existing label fusion strategies, similarity-weighted voting strategies with spatially varying weight distributions have been particularly successful. We show that weighted-voting-based label fusion produces a spatial bias that under-segments structures with convex shapes. The bias can be approximated as applying spatial convolution to the ground truth spatial label probability maps, where the convolution kernel combines the distribution of residual registration errors and the function producing similarity-based voting weights. To reduce this bias, we apply a standard spatial deconvolution to the spatial probability maps obtained from weighted voting. In a brain image segmentation experiment, we demonstrate the spatial bias and show that our technique substantially reduces it.
doi:10.1109/CVPR.2012.6247765
PMCID: PMC3589983
PMID: 23476901
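The label fusion step that the abstract above analyzes can be illustrated with a small sketch of weighted voting with spatially varying weights: at each voxel, every registered atlas votes for its label with a voxel-specific weight, and the label with the largest summed weight wins. This covers only the voting step, not the deconvolution-based bias correction the authors propose. The name `weighted_vote` and the list-of-lists layout are hypothetical, and how the similarity-based weights are obtained is left outside the sketch.

```python
def weighted_vote(atlas_labels, atlas_weights):
    """Fuse per-atlas label maps by weighted voting at each voxel.

    `atlas_labels[i][v]` is the label atlas i proposes for voxel v.
    `atlas_weights[i][v]` is the (assumed precomputed) voting weight of
    atlas i at voxel v, e.g. derived from local image similarity.
    Returns the fused label for every voxel.
    """
    n_voxels = len(atlas_labels[0])
    fused = []
    for v in range(n_voxels):
        scores = {}
        for labels, weights in zip(atlas_labels, atlas_weights):
            label = labels[v]
            scores[label] = scores.get(label, 0.0) + weights[v]
        # Pick the label with the largest accumulated weight.
        fused.append(max(scores, key=scores.get))
    return fused
```

Because the weights vary per voxel, a single well-matched atlas can outvote a majority of poorly matched ones, which is the behaviour that distinguishes this scheme from plain majority voting.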
Statistical models have shown considerable promise as a basis for segmenting and interpreting cardiac images. While a variety of statistical models have been proposed to improve the segmentation results, most of them are either static models (SM), which neglect the temporal dynamics of a cardiac sequence, or generic dynamical models (GDM), which are homogeneous in time and neglect the inter-subject variability in cardiac shape and deformation. In this paper, we develop a subject-specific dynamical model (SSDM) that simultaneously handles temporal dynamics (intra-subject variability) and inter-subject variability. We also propose a dynamic prediction algorithm that can progressively identify the specific motion patterns of a new cardiac sequence based on the shapes observed in past frames. The incorporation of this SSDM into the segmentation framework is formulated in a recursive Bayesian framework. It starts with a manual segmentation of the first frame, and then segments each frame according to intensity information from the current frame as well as the prediction from past frames. In addition, to reduce error propagation in sequential segmentation, we take into account the periodic nature of cardiac motion and perform segmentation in both forward and backward directions. We perform a “leave-one-out” test on 32 canine sequences and 22 human sequences, and compare the experimental results with those from SM, GDM, and Active Appearance Motion Model (AAMM). Quantitative analysis of the experimental results shows that SSDM outperforms SM, GDM, and AAMM by having better global and local consistencies with manual segmentation. Moreover, we compare the segmentation results from forward and forward-backward segmentation. Quantitative evaluation shows that forward-backward segmentation suppresses the propagation of segmentation errors.
doi:10.1109/TMI.2009.2031063
PMCID: PMC2832728
PMID: 19789107
Cardiac Segmentation; Statistical Shape Model; Dynamical Model; Bayesian Method
Background
Correct segmentation is critical to many applications within automated microscopy image analysis. Despite the availability of advanced segmentation algorithms, variations in cell morphology, sample preparation, and acquisition settings often lead to segmentation errors. This manuscript introduces a ranked-retrieval approach using logistic regression to automate selection of accurately segmented nuclei from a set of candidate segmentations. The methodology is validated on an application of spatial gene repositioning in breast cancer cell nuclei. Gene repositioning is analyzed in patient tissue sections by labeling sequences with fluorescence in situ hybridization (FISH), followed by measurement of the relative position of each gene from the nuclear center to the nuclear periphery. This technique requires hundreds of well-segmented nuclei per sample to achieve statistical significance. Although the tissue samples in this study contain a surplus of available nuclei, automatic identification of the well-segmented subset remains a challenging task.
Results
Logistic regression was applied to features extracted from candidate segmented nuclei, including nuclear shape, texture, context, and gene copy number, in order to rank objects according to the likelihood of being an accurately segmented nucleus. The method was demonstrated on a tissue microarray dataset of 43 breast cancer patients, comprising approximately 40,000 imaged nuclei in which the HES5 and FRA2 genes were labeled with FISH probes. Three trained reviewers independently classified nuclei into three classes of segmentation accuracy. In man vs. machine studies, the automated method outperformed the inter-observer agreement between reviewers, as measured by area under the receiver operating characteristic (ROC) curve. Robustness of gene position measurements to boundary inaccuracies was demonstrated by comparing 1086 manually and automatically segmented nuclei. Pearson correlation coefficients between the gene position measurements were above 0.9 (p < 0.05). A preliminary experiment was conducted to validate the ranked retrieval in a test to detect cancer. Independent manual measurement of gene positions agreed with automatic results in 21 out of 26 statistical comparisons against a pooled normal (benign) gene position distribution.
Conclusions
Accurate segmentation is necessary to automate quantitative image analysis for applications such as gene repositioning. However, due to heterogeneity within images and across different applications, no segmentation algorithm provides a satisfactory solution. Automated assessment of segmentations by ranked retrieval is capable of reducing or even eliminating the need to select segmented objects by hand and represents a significant improvement over binary classification. The method can be extended to other high-throughput applications requiring accurate detection of cells or nuclei across a range of biomedical applications.
doi:10.1186/1471-2105-13-232
PMCID: PMC3484015
PMID: 22971117
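The ranked-retrieval idea in the abstract above (score each candidate nucleus with logistic regression, sort by score, and evaluate with the area under the ROC curve) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the two features, the toy data, and the function names `fit_logistic`, `score`, and `roc_auc` are assumptions made purely for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights (bias stored last) by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1])
            err = p - yi  # gradient of the log-loss with respect to the logit
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def score(w, x):
    """Estimated probability that a candidate is an accurately segmented nucleus."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])

def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical features per candidate: [shape regularity, texture irregularity].
# Label 1 = well segmented, 0 = poorly segmented (toy values, not from the paper).
X = [[0.90, 0.20], [0.80, 0.30], [0.85, 0.10],
     [0.30, 0.70], [0.40, 0.90], [0.20, 0.60]]
y = [1, 1, 1, 0, 0, 0]

w = fit_logistic(X, y)
scores = [score(w, x) for x in X]
# Ranked retrieval: best candidates first; a user would keep the top of this list.
ranking = sorted(range(len(X)), key=lambda i: scores[i], reverse=True)
auc = roc_auc(scores, y)
```

Ranking rather than hard classification lets the user take as many top-scoring nuclei as the downstream statistics require, without committing to a single decision threshold.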
We propose a simple but generally applicable approach to improving the accuracy of automatic image segmentation algorithms relative to manual segmentations. The approach is based on the hypothesis that a large fraction of the errors produced by automatic segmentation are systematic, i.e., occur consistently from subject to subject, and serves as a wrapper method around a given host segmentation method. The wrapper method attempts to learn the intensity, spatial and contextual patterns associated with systematic segmentation errors produced by the host method on training data for which manual segmentations are available. The method then attempts to correct such errors in segmentations produced by the host method on new images. One practical use of the proposed wrapper method is to adapt existing segmentation tools, without explicit modification, to imaging data and segmentation protocols that are different from those on which the tools were trained and tuned. An open-source implementation of the proposed wrapper method is provided, and can be applied to a wide range of image segmentation problems.
The wrapper method is evaluated with four host brain MRI segmentation methods: hippocampus segmentation using FreeSurfer (Fischl et al., 2002); hippocampus segmentation using multi-atlas label fusion (Artaechevarria et al., 2009); brain extraction using BET (Smith, 2002); and brain tissue segmentation using FAST (Zhang et al., 2001). The wrapper method generates 72%, 14%, 29% and 21% fewer erroneously segmented voxels than the respective host segmentation methods. In the hippocampus segmentation experiment with multi-atlas label fusion as the host method, the average Dice overlap between reference segmentations and segmentations produced by the wrapper method is 0.908 for normal controls and 0.893 for patients with mild cognitive impairment. Average Dice overlaps of 0.964, 0.905 and 0.951 are obtained for brain extraction, white matter segmentation and gray matter segmentation, respectively.
doi:10.1016/j.neuroimage.2011.01.006
PMCID: PMC3049832
PMID: 21237273
medical image segmentation; error correction; AdaBoost; hippocampal segmentation; brain extraction; brain tissue segmentation
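The Dice overlap reported in the abstract above is the standard measure 2|A ∩ B| / (|A| + |B|) between a reference mask A and a candidate mask B. A minimal Python sketch follows; the function name `dice_overlap`, the flat 0/1 mask representation, and the convention of returning 1.0 for two empty masks are assumptions for illustration, not part of the published tool.

```python
def dice_overlap(reference, candidate):
    """Dice coefficient between two binary masks given as equal-length 0/1 sequences."""
    if len(reference) != len(candidate):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for r, c in zip(reference, candidate) if r and c)
    total = sum(1 for r in reference if r) + sum(1 for c in candidate if c)
    # Two empty masks agree perfectly; this convention is an assumption.
    return 1.0 if total == 0 else 2.0 * intersection / total

# Toy example: 3 voxels labeled in each mask, 2 of them shared.
manual = [1, 1, 1, 0, 0]
automatic = [0, 1, 1, 1, 0]
d = dice_overlap(manual, automatic)
```

A value of 1 indicates identical masks and 0 indicates no shared voxels, so the reported averages above 0.9 correspond to close agreement with the reference segmentations.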
Proteomics, the large scale identification and characterization of many or all proteins expressed in a given cell type, has become a major area of biological research. In addition to information on protein sequence, structure and expression levels, knowledge of a protein’s subcellular location is essential to a complete understanding of its functions. Currently subcellular location patterns are routinely determined by visual inspection of fluorescence microscope images. We review here research aimed at creating systems for automated, systematic determination of location. These employ numerical feature extraction from images, feature reduction to identify the most useful features, and various supervised learning (classification) and unsupervised learning (clustering) methods. These methods have been shown to perform significantly better than human interpretation of the same images. When coupled with technologies for tagging large numbers of proteins and high-throughput microscope systems, the computational methods reviewed here enable the new subfield of location proteomics. This subfield will make critical contributions in two related areas. First, it will provide structured, high-resolution information on location to enable Systems Biology efforts to simulate cell behavior from the gene level on up. Second, it will provide tools for Cytomics projects aimed at characterizing the behaviors of all cell types before, during and after the onset of various diseases.
doi:10.1002/cyto.a.20280
PMCID: PMC2901544
PMID: 16752421
Subcellular Location Trees; Subcellular Location Features; Pattern Recognition; Fluorescence Microscopy; Location Proteomics; Cluster Analysis
In order to evaluate the quality of segmentations of an image and assess intra- and inter-expert variability in segmentation performance, an Expectation Maximization (EM) algorithm for Simultaneous Truth And Performance Level Estimation (STAPLE) was recently developed. This algorithm, originally presented for segmentation validation, has since been used for many applications, such as atlas construction and decision fusion. However, the manual delineation of structures of interest is a very time-consuming and burdensome task. Further, as the time required and the burden of manual delineation increase, the accuracy of the delineation decreases. Therefore, it may be desirable to ask the experts to delineate only a reduced number of structures, or the segmentation of all structures by all experts may simply not be achievable. Fusion from data with some structures not segmented by each expert should be carried out in a manner that accounts for the missing information. In other applications, locally inconsistent segmentations may drive the STAPLE algorithm into an undesirable local optimum, leading to misclassifications or misleading expert performance parameters.
We present a new algorithm that allows fusion with partial delineation and which can avoid convergence to undesirable local optima in the presence of strongly inconsistent segmentations. The algorithm extends STAPLE by incorporating prior probabilities for the expert performance parameters. This is achieved through a Maximum A Posteriori formulation, where the prior probabilities for the performance parameters are modeled by a beta distribution. We demonstrate that this new algorithm enables dramatically improved fusion from data with partial delineation by each expert in comparison to fusion with STAPLE.
PMCID: PMC3079852
PMID: 20879379
Adult; Algorithms; Brain; anatomy & histology; Data Interpretation, Statistical; Expert Systems; Female; Humans; Image Enhancement; methods; Image Interpretation, Computer-Assisted; methods; Information Storage and Retrieval; methods; Likelihood Functions; Magnetic Resonance Imaging; methods; Male; Observer Variation; Pattern Recognition, Automated; methods; Reproducibility of Results; Sensitivity and Specificity; Subtraction Technique
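To make the Maximum A Posteriori idea in the abstract above concrete, the sketch below shows how a beta prior changes the update of one expert's performance parameters in a binary setting. It uses the general fact that a Beta(alpha, beta) prior combined with weighted binomial counts gives a posterior mode of (successes + alpha - 1) / (trials + alpha + beta - 2). The function name `map_performance`, the use of `None` for voxels the expert did not delineate, and the choice to share one prior between sensitivity and specificity are assumptions for illustration and may differ from the published algorithm.

```python
def map_performance(decisions, weights, alpha, beta):
    """MAP update of one expert's sensitivity p and specificity q.

    decisions: per-voxel expert label (1, 0, or None if not delineated).
    weights:   per-voxel posterior probability that the true label is 1.
    alpha, beta: Beta prior parameters, shared by p and q in this sketch.
    """
    tp = fn = tn = fp = 0.0
    for d, w in zip(decisions, weights):
        if d is None:
            continue  # assumption: missing delineations contribute no evidence
        if d == 1:
            tp += w
            fp += 1.0 - w
        else:
            fn += w
            tn += 1.0 - w
    denom_p = tp + fn + alpha + beta - 2.0
    denom_q = tn + fp + alpha + beta - 2.0
    if denom_p <= 0.0 or denom_q <= 0.0:
        raise ValueError("no evidence and a flat prior: the update is undefined")
    p = (tp + alpha - 1.0) / denom_p
    q = (tn + alpha - 1.0) / denom_q
    return p, q

decisions = [1, 1, 0, None]      # last voxel was not delineated by this expert
weights = [0.9, 0.8, 0.1, 0.5]   # toy posterior probabilities of label 1

# alpha = beta = 1 is a flat prior and reduces to the maximum-likelihood update.
p_ml, q_ml = map_performance(decisions, weights, 1.0, 1.0)
# A prior favoring good performance pulls the estimates toward its mode.
p_map, q_map = map_performance(decisions, weights, 5.0, 1.5)
```

With few delineated voxels the prior dominates the estimate, which is the property that would keep the parameters away from degenerate values when experts supply only partial or strongly inconsistent segmentations.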
3D image reconstruction of large cellular volumes by electron tomography (ET) at high (≤5 nm) resolution can now routinely resolve organellar and compartmental membrane structures, protein coats, cytoskeletal filaments, and macromolecules. However, current image analysis methods for identifying in situ macromolecular structures within the crowded 3D ultrastructural landscape of a cell remain labor-intensive, time-consuming, and prone to user bias and/or error. This paper demonstrates the development and application of a parameter-free, 3D implementation of the bilateral edge-detection (BLE) algorithm for the rapid and accurate segmentation of cellular tomograms. The performance of the 3D BLE filter has been tested on a range of synthetic and real biological data sets and validated against current leading filters: the pseudo 3D recursive and Canny filters. The performance of the 3D BLE filter was found to be comparable to or better than that of both the 3D recursive and Canny filters while offering the significant advantage that it requires no parameter input or optimisation. Edge widths as small as 2 pixels are reproducibly detected with signal intensity and grey scale values as low as 0.72% above the mean of the background noise. The 3D BLE thus provides an efficient method for the automated segmentation of complex cellular structures across multiple scales for further downstream processing, such as cellular annotation and sub-tomogram averaging. It also provides a valuable tool for the accurate and high-throughput identification and annotation of 3D structural complexity at the subcellular level, as well as for mapping the spatial and temporal rearrangement of macromolecular assemblies in situ within cellular tomograms.
doi:10.1371/journal.pone.0033697
PMCID: PMC3315577
PMID: 22479430
Massively parallel sequencing technologies have recently flourished and dramatically cut the cost of sequencing personal human genomes. Haplotype assembly from personal genomes sequenced with these technologies is becoming a cost-effective and promising tool for the study of human disease. Computational assembly of haplotypes has proven to be very accurate, but the assembled haplotypes still contain errors. Here we present a tool, HapEdit, to assess the accuracy of assembled haplotypes and edit them manually. Using this tool, a user can break erroneous haplotype segments into smaller segments, or concatenate haplotype segments if the concatenated segments are sufficiently supported. A user can also edit bases with low quality scores. HapEdit displays haplotype assemblies so that a user can easily navigate and pinpoint a region of interest. As inputs, HapEdit currently takes reads from the Polonator, Illumina, SOLiD, 454 and Sanger sequencing technologies.
doi:10.1093/nar/gkr354
PMCID: PMC3125762
PMID: 21576217