Related Articles

1.  The Role of the Toxicologic Pathologist in the Post-Genomic Era
Journal of Toxicologic Pathology  2013;26(2):105-110.
An era can be defined as a period in time identified by distinctive character, events, or practices. We are now in the genomic era. The pre-genomic era: There was a pre-genomic era. It started many years ago with novel and seminal animal experiments, primarily directed at studying cancer. It is marked by the development of the two-year rodent cancer bioassay and the ultimate realization that alternative approaches and short-term animal models were needed to replace this resource-intensive and time-consuming method for predicting human health risk. Many alternative approaches and short-term animal models were proposed and tried but, to date, none have completely replaced our dependence upon the two-year rodent bioassay. However, the alternative approaches and models themselves have made tangible contributions to basic research, clinical medicine and to our understanding of cancer and they remain useful tools to address hypothesis-driven research questions. The pre-genomic era was a time when toxicologic pathologists played a major role in drug development, evaluating the cancer bioassay and the associated dose-setting toxicity studies, and exploring the utility of proposed alternative animal models. It was a time when there was a shortage of qualified toxicologic pathologists. The genomic era: We are in the genomic era. It is a time when the genetic underpinnings of normal biological and pathologic processes are being discovered and documented. It is a time for sequencing entire genomes and deliberately silencing relevant segments of the mouse genome to see what each segment controls and if that silencing leads to increased susceptibility to disease. What remains to be charted in this genomic era is the complex interaction of genes, gene segments, post-translational modifications of encoded proteins, and environmental factors that affect genomic expression.
In this current genomic era, the toxicologic pathologist has had to make room for a growing population of molecular biologists. In this present era newly emerging DVM and MD scientists enter the work arena with a PhD in pathology often based on some aspect of molecular biology or molecular pathology research. In molecular biology, the almost daily technological advances require one’s complete dedication to remain at the cutting edge of the science. Similarly, the practice of toxicologic pathology, like other morphological disciplines, is based largely on experience and requires dedicated daily examination of pathology material to maintain a well-trained eye capable of distilling specific information from stained tissue slides - a dedicated effort that cannot be well done as an intermezzo between other tasks. It is a rare individual that has true expertise in both molecular biology and pathology. In this genomic era, the newly emerging DVM-PhD or MD-PhD pathologist enters a marketplace without many job opportunities in contrast to the pre-genomic era. Many face an identity crisis needing to decide to become a competent pathologist or, alternatively, to become a competent molecular biologist. At the same time, more PhD molecular biologists without training in pathology are members of the research teams working in drug development and toxicology. How best can the toxicologic pathologist interact in the contemporary team approach in drug development, toxicology research and safety testing? Based on their biomedical training, toxicologic pathologists are in an ideal position to link data from the emerging technologies with their knowledge of pathobiology and toxicology. To enable this linkage and obtain the synergy it provides, the bench-level, slide-reading expert pathologist will need to have some basic understanding and appreciation of molecular biology methods and tools. 
On the other hand, it is not likely that the typical molecular biologist could competently evaluate and diagnose stained tissue slides from a toxicology study or a cancer bioassay. The post-genomic era: The post-genomic era will likely arrive around 2050, at which time entire genomes from multiple species will exist in massive databases, data from thousands of robotic high throughput chemical screenings will exist in other databases, and genetic toxicity and chemical structure-activity-relationships will reside in yet other databases. All databases will be linked and relevant information will be extracted and analyzed by appropriate algorithms following input of the latest molecular, submolecular, genetic, experimental, pathology and clinical data. Knowledge gained will permit the genetic components of many diseases to be amenable to therapeutic prevention and/or intervention. Much like computerized algorithms are currently used to forecast weather or to predict political elections, sophisticated computerized algorithms based largely on scientific data mining will categorize new drugs and chemicals relative to their health benefits versus their health risks for defined human populations and subpopulations. However, this form of a virtual toxicity study or cancer bioassay will only identify probabilities of adverse consequences from interaction of particular environmental and/or chemical/drug exposure(s) with specific genomic variables. Proof in many situations will require confirmation in intact in vivo mammalian animal models. The toxicologic pathologist in the post-genomic era will be the best-suited scientist to confirm the data mining and its probability predictions for safety or adverse consequences with the actual tissue morphological features in test species that define specific test agent pathobiology and human health risk.
PMCID: PMC3695332  PMID: 23914052
genomic era; history of toxicologic pathology; molecular biology
2.  A vocabulary for the identification and delineation of teratoma tissue components in hematoxylin and eosin-stained samples 
We propose a methodology for the design of features mimicking the visual cues used by pathologists when identifying tissues in hematoxylin and eosin (H&E)-stained samples.
H&E staining is the gold standard in clinical histology; it is cheap and universally used, producing a vast number of histopathological samples. While pathologists accurately and consistently identify tissues and their pathologies, it is a time-consuming and expensive task, establishing the need for automated algorithms for improved throughput and robustness.
We use an iterative feedback process to design a histopathology vocabulary (HV), a concise set of features that mimic the visual cues used by pathologists, e.g. “cytoplasm color” or “nucleus density”. These features are based in histology and understood by both pathologists and engineers. We compare our HV to several generic texture-feature sets in a pixel-level classification algorithm.
Results on delineating and identifying tissues in teratoma tumor samples validate our expert knowledge-based approach.
The HV can be an effective tool for identifying and delineating teratoma components from images of H&E-stained tissue samples.
PMCID: PMC4141425  PMID: 25191619
Automated histology; classification; segmentation
3.  Automated detection of regions of interest for tissue microarray experiments: an image texture analysis 
Recent research with tissue microarrays has led to rapid progress toward quantifying the expression of large sets of biomarkers in normal and diseased tissue. However, standard procedures for sampling tissue for molecular profiling have not yet been established.
This study presents a high throughput analysis of texture heterogeneity on breast tissue images for the purpose of identifying regions of interest in the tissue for molecular profiling via tissue microarray technology. Image texture of breast histology slides was described in terms of three parameters: the percentage of area occupied in an image block by chromatin (B), percentage occupied by stroma-like regions (P), and a statistical heterogeneity index H commonly used in image analysis. Texture parameters were defined and computed for each of the thousands of image blocks in our dataset using both the gray scale and color segmentation. The image blocks were then classified into three categories using the texture feature parameters in a novel statistical learning algorithm. These categories are as follows: image blocks specific to normal breast tissue, blocks specific to cancerous tissue, and those image blocks that are non-specific to normal and disease states.
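The three block-level texture parameters described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes every pixel has already been segmented into one of the hypothetical labels "chromatin", "stroma" or "other", and it substitutes the Shannon entropy of the gray-level histogram for the heterogeneity index H, which the abstract does not define.

```python
import math
from collections import Counter

def block_texture(labels, gray):
    """Return (B, P, H) for one image block.

    labels: flat list of per-pixel class labels (hypothetical segmentation output)
    gray:   flat list of per-pixel gray levels (integers)
    """
    n = len(labels)
    counts = Counter(labels)
    b = 100.0 * counts["chromatin"] / n  # percent of block area occupied by chromatin
    p = 100.0 * counts["stroma"] / n     # percent of block area occupied by stroma-like regions
    # Assumed stand-in for H: Shannon entropy (in bits) of the gray-level histogram.
    hist = Counter(gray)
    h = -sum((c / n) * math.log2(c / n) for c in hist.values())
    return b, p, h

# Toy 4-pixel block: two chromatin pixels, one stroma pixel, one other pixel.
labels = ["chromatin", "chromatin", "stroma", "other"]
gray = [10, 10, 200, 200]
B, P, H = block_texture(labels, gray)
```

Each block's (B, P, H) triple would then be passed to a classifier; the statistical learning algorithm used in the study is not specified in the abstract and is not reproduced here.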
Gray scale and color segmentation techniques led to identification of the same regions in histology slides as cancer-specific. Moreover, the image blocks identified as cancer-specific belonged to the cell-crowded regions of the whole section image slides that were marked by two pathologists as regions of interest for further histological studies.
These results indicate the high efficiency of our automated method for identifying pathologic regions of interest on histology slides. Automation of critical region identification will help minimize the inter-rater variability among different raters (pathologists) as hundreds of tumors that are used to develop an array have typically been evaluated (graded) by different pathologists. The region of interest information gathered from the whole section images will guide the excision of tissue for constructing tissue microarrays and for high throughput profiling of global gene expression.
PMCID: PMC1838905  PMID: 17349041
4.  Automated tissue characterization of in vivo atherosclerotic plaques by intravascular optical coherence tomography images 
Biomedical Optics Express  2013;4(7):1014-1030.
Intravascular optical coherence tomography (IVOCT) is rapidly becoming the method of choice for the in vivo investigation of coronary artery disease. While IVOCT visualizes atherosclerotic plaques with a resolution <20µm, image analysis in terms of tissue composition is currently performed by a time-consuming manual procedure based on the qualitative interpretation of image features. We illustrate an algorithm for the automated and systematic characterization of IVOCT atherosclerotic tissue. The proposed method consists in a supervised classification of image pixels according to textural features combined with the estimated value of the optical attenuation coefficient. IVOCT images of 64 plaques, from 49 in vivo IVOCT data sets, constituted the algorithm’s training and testing data sets. Validation was obtained by comparing automated analysis results to the manual assessment of atherosclerotic plaques. An overall pixel-wise accuracy of 81.5% with a classification feasibility of 76.5% and per-class accuracy of 89.5%, 72.1% and 79.5% for fibrotic, calcified and lipid-rich tissue respectively, was found. Moreover, measured optical properties were in agreement with previous results reported in literature. As such, an algorithm for automated tissue characterization was developed and validated using in vivo human data, suggesting that it can be applied to clinical IVOCT data. This might be an important step towards the integration of IVOCT in cardiovascular research and routine clinical practice.
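The pixel-wise validation described above, in which automated labels are compared with a manual assessment, can be sketched as below. This is a minimal illustration with made-up labels. It assumes that "per-class accuracy" means the fraction of manually labeled pixels of a class that the algorithm assigned to the same class; the abstract does not state the exact definition, and "classification feasibility" is not modeled.

```python
def pixel_accuracies(truth, pred):
    """Return (overall accuracy, {class: per-class accuracy}) for two label lists."""
    overall = sum(t == p for t, p in zip(truth, pred)) / len(truth)
    per_class = {}
    for c in sorted(set(truth)):
        idx = [i for i, t in enumerate(truth) if t == c]
        # Assumed definition: share of the manual pixels of class c that were predicted as c.
        per_class[c] = sum(pred[i] == c for i in idx) / len(idx)
    return overall, per_class

# Hypothetical manual (truth) and automated (pred) labels for four pixels.
truth = ["fibrotic", "fibrotic", "calcified", "lipid"]
pred = ["fibrotic", "lipid", "calcified", "lipid"]
overall, per_class = pixel_accuracies(truth, pred)
```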
PMCID: PMC3704084  PMID: 23847728
(100.0100) Image processing; (100.2960) Image analysis; (100.4995) Pattern recognition, metrics; (170.0170) Medical optics and biotechnology; (170.6935) Tissue characterization
5.  Automated prostate cancer detection via comprehensive multi-parametric magnetic resonance imaging texture feature models 
BMC Medical Imaging  2015;15:27.
Prostate cancer is the most common form of cancer and the second leading cause of cancer death in North America. Auto-detection of prostate cancer can play a major role in early detection of prostate cancer, which has a significant impact on patient survival rates. While multi-parametric magnetic resonance imaging (MP-MRI) has shown promise in diagnosis of prostate cancer, the existing auto-detection algorithms do not take advantage of the abundance of data available in MP-MRI to improve detection accuracy. The goal of this research was to design a radiomics-based auto-detection method for prostate cancer using MP-MRI data.
In this work, we present new MP-MRI texture feature models for radiomics-driven detection of prostate cancer. In addition to commonly used non-invasive imaging sequences in conventional MP-MRI, namely T2-weighted MRI (T2w) and diffusion-weighted imaging (DWI), our proposed MP-MRI texture feature models incorporate computed high-b DWI (CHB-DWI) and a new diffusion imaging modality called correlated diffusion imaging (CDI). Moreover, the proposed texture feature models incorporate features from individual b-value images. A comprehensive set of texture features was calculated for both the conventional MP-MRI and new MP-MRI texture feature models. We performed feature selection analysis for each individual modality and then combined best features from each modality to construct the optimized texture feature models.
The performance of the proposed MP-MRI texture feature models was evaluated via leave-one-patient-out cross-validation using a support vector machine (SVM) classifier trained on 40,975 cancerous and healthy tissue samples obtained from real clinical MP-MRI datasets. The proposed MP-MRI texture feature models outperformed the conventional model (i.e., T2w+DWI) with regard to cancer detection accuracy.
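Leave-one-patient-out cross-validation, as used above, holds out all samples from one patient in each fold, so that no patient contributes to both training and testing. A minimal sketch of the fold construction follows; the patient identifiers are hypothetical, and the SVM training step is deliberately left out.

```python
def leave_one_patient_out(patient_ids):
    """Yield (held_out_patient, train_indices, test_indices), one fold per patient."""
    for held_out in sorted(set(patient_ids)):
        train = [i for i, p in enumerate(patient_ids) if p != held_out]
        test = [i for i, p in enumerate(patient_ids) if p == held_out]
        yield held_out, train, test

# Hypothetical mapping of five tissue samples to three patients.
patient_ids = ["p1", "p1", "p2", "p3", "p3"]
splits = list(leave_one_patient_out(patient_ids))
```

Splitting by patient rather than by sample avoids the optimistic bias that arises when correlated samples from the same patient appear on both sides of a split.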
Comprehensive texture feature models were developed for improved radiomics-driven detection of prostate cancer using MP-MRI. Using a comprehensive set of texture features and a feature selection method, optimal texture feature models were constructed that improved the prostate cancer auto-detection significantly compared to conventional MP-MRI texture feature models.
PMCID: PMC4524105  PMID: 26242589
6.  Evaluation of position-estimation methods applied to CZT-based photon-counting detectors for dedicated breast CT 
Journal of Medical Imaging  2015;2(2):023501.
Semiconductor photon-counting detectors based on high atomic number, high density materials [cadmium zinc telluride (CZT)/cadmium telluride (CdTe)] for x-ray computed tomography (CT) provide advantages over conventional energy-integrating detectors, including reduced electronic and Swank noise, wider dynamic range, capability of spectral CT, and improved signal-to-noise ratio. Certain CT applications require high spatial resolution. In breast CT, for example, visualization of microcalcifications and assessment of tumor microvasculature after contrast enhancement require resolution on the order of 100 μm. A straightforward approach to increasing spatial resolution of pixellated CZT-based radiation detectors by merely decreasing the pixel size leads to two problems: (1) fabricating circuitry with small pixels becomes costly and (2) inter-pixel charge spreading can obviate any improvement in spatial resolution. We have used computer simulations to investigate position estimation algorithms that utilize charge sharing to achieve subpixel position resolution. To study these algorithms, we model a simple detector geometry with a 5×5 array of 200 μm pixels, and use a conditional probability function to model charge transport in CZT. We used COMSOL finite element method software to map the distribution of charge pulses and the Monte Carlo package PENELOPE for simulating fluorescent radiation. Performance of two x-ray interaction position estimation algorithms was evaluated: the method of maximum-likelihood estimation and a fast, practical algorithm that can be implemented in a readout application-specific integrated circuit and allows for identification of a quadrant of the pixel in which the interaction occurred. Both methods demonstrate good subpixel resolution; however, their actual efficiency is limited by the presence of fluorescent K-escape photons.
Current experimental breast CT systems typically use detectors with a pixel size of 194 μm, with 2×2 binning during the acquisition giving an effective pixel size of 388 μm. Thus, it would be expected that the position estimate accuracy reported in this study would improve detection and visualization of microcalcifications as compared to that with conventional detectors.
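The idea of using charge sharing to locate an interaction within a quadrant of a pixel can be illustrated with a toy rule. The abstract does not describe the actual algorithm, so the comparison below is purely an assumption: it guesses the quadrant of the central pixel from which horizontal and which vertical neighbor collected more shared charge.

```python
def estimate_quadrant(left, right, up, down):
    """Guess the quadrant of the central pixel from four neighbor signals.

    Assumed rule (not taken from the paper): the interaction lies on the side
    of the neighbor that received the larger share of the charge.
    """
    horizontal = "right" if right > left else "left"
    vertical = "upper" if up > down else "lower"
    return f"{vertical}-{horizontal}"

# Hypothetical neighbor signals in arbitrary units.
quadrant = estimate_quadrant(left=0.1, right=0.4, up=0.3, down=0.05)
```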
PMCID: PMC4478882  PMID: 26158095
photon-counting detector; maximum-likelihood position estimation; subpixel resolution
7.  The tissue microarray data exchange specification: Extending TMA DES to provide flexible scoring and incorporate virtual slides 
Tissue MicroArrays (TMAs) are a high throughput technology for rapid analysis of protein expression across hundreds of patient samples. Often, data relating to TMAs is specific to the clinical trial or experiment it is being used for, and not interoperable. The Tissue Microarray Data Exchange Specification (TMA DES) is a set of eXtensible Markup Language (XML)-based protocols for storing and sharing digitized Tissue Microarray data. XML data are enclosed by named tags which serve as identifiers. These tag names can be Common Data Elements (CDEs), which have a predefined meaning or semantics. By using this specification in a laboratory setting with increasing demands for digital pathology integration, we found that the data structure lacked the ability to cope with digital slide imaging in respect to web-enabled digital pathology systems and advanced scoring techniques.
Materials and Methods:
By employing user centric design, and observing behavior in relation to TMA scoring and associated data, the TMA DES format was extended to accommodate the current limitations. This was done with specific focus on developing a generic tool for handling any given scoring system, and utilizing data for multiple observations and observers.
DTDs were created to validate the extensions of the TMA DES protocol, and a test set of data containing scores for 6,708 TMA core images was generated. The XML was then read into an image processing algorithm to utilize the digital pathology data extensions, and scoring results were easily stored alongside the existing multiple pathologist scores.
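Storing scores from several observers, including an algorithm, alongside each core can be pictured with a small XML fragment. The element and attribute names below (`tma`, `core`, `score`, `observer`, `system`) are invented for illustration and are not taken from the TMA DES specification; the sketch only shows how such data could be read with Python's standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout; NOT the actual TMA DES tag set.
XML = """
<tma>
  <core id="A1">
    <score observer="pathologist_1" system="intensity">2</score>
    <score observer="pathologist_2" system="intensity">3</score>
    <score observer="algorithm" system="intensity">2</score>
  </core>
</tma>
"""

def scores_by_core(xml_text):
    """Return {core_id: {observer: score}} parsed from the XML text."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for core in root.findall("core"):
        result[core.get("id")] = {
            s.get("observer"): int(s.text) for s in core.findall("score")
        }
    return result

scores = scores_by_core(XML)
```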
By extending the TMA DES format to include digital pathology data and customizable scoring systems for TMAs, the new system facilitates the collaboration between pathologists and organizations, and can be used in automatic or manual data analysis. This allows complying systems to effectively communicate complex and varied scoring data.
PMCID: PMC3073067  PMID: 21572508
CDEs; DTD; tissue microarray; TMA DES; virtual pathology; XML
8.  Machine learning approaches to analyze histological images of tissues from radical prostatectomies 
Computerized evaluation of histological preparations of prostate tissues involves identification of tissue components such as stroma (ST), benign/normal epithelium (BN) and prostate cancer (PCa). Image classification approaches have been developed to identify and classify glandular regions in digital images of prostate tissues; however, their success has been limited by difficulties in cellular segmentation and tissue heterogeneity. We hypothesized that utilizing image pixels to generate intensity histograms of hematoxylin (H) and eosin (E) stains deconvoluted from H&E images numerically captures the architectural difference between glands and stroma. In addition, we postulated that joint histograms of local binary patterns and local variance (LBPxVAR) can be used as sensitive textural features to differentiate benign/normal tissue from cancer. Here we utilized a machine learning approach comprising a support vector machine (SVM) followed by a random forest (RF) classifier to digitally stratify prostate tissue into ST, BN and PCa areas. Two pathologists manually annotated 210 images of low- and high-grade tumors from slides that were selected from 20 radical prostatectomies and digitized at high resolution. The 210 images were split into the training (n = 19) and test (n = 191) sets. Local intensity histograms of H and E were used to train an SVM classifier to separate ST from epithelium (BN + PCa). The performance of SVM prediction was evaluated by measuring the accuracy of delineating epithelial areas. The Jaccard J = 59.5 ± 14.6 and Rand Ri = 62.0 ± 7.5 indices reported a significantly better prediction when compared to a reference method (Chen et al., Clinical Proteomics 2013, 10:18) based on the averaged values from the test set.
To distinguish BN from PCa we trained an RF classifier with LBPxVAR and local intensity histograms and obtained separate performance values for BN and PCa: JBN = 35.2 ± 24.9, OBN = 49.6 ± 32, JPCa = 49.5 ± 18.5, OPCa = 72.7 ± 14.8 and Ri = 60.6 ± 7.6 in the test set. Our pixel-based classification does not rely on the detection of lumens, which is prone to errors and has limitations in high-grade cancers, and it has the potential to aid in clinical studies in which the quantification of tumor content is necessary to prognosticate the course of the disease. The image data set with ground truth annotation is available for public use to stimulate further research in this area.
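The Jaccard and Rand indices used above to score the segmentation can be computed as in the following sketch. It uses the standard textbook definitions on toy data; the paper's values appear to be expressed as percentages, which is an inference from the reported numbers, and the authors' exact evaluation protocol is not reproduced.

```python
from itertools import combinations

def jaccard(pred, truth):
    """Jaccard index: |intersection| / |union| of two sets of pixel coordinates."""
    pred, truth = set(pred), set(truth)
    union = pred | truth
    return len(pred & truth) / len(union) if union else 1.0

def rand_index(a, b):
    """Rand index: fraction of pixel pairs on which two labelings agree
    (both place the pair in the same class, or both place it in different classes)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

# Toy masks given as (row, column) coordinates of pixels labeled as epithelium.
j = jaccard({(0, 0), (0, 1), (1, 1)}, {(0, 1), (1, 1), (1, 0)})
# Toy per-pixel labelings for four pixels.
r = rand_index([0, 0, 1, 1], [0, 0, 1, 0])
```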
PMCID: PMC5062020  PMID: 26362074
Machine learning; Image analysis; Prostate cancer; Tissue classification; Tissue quantification
9.  Development and Validation of a New Prognostic System for Patients with Hepatocellular Carcinoma 
PLoS Medicine  2016;13(4):e1002006.
Prognostic assessment in patients with hepatocellular carcinoma (HCC) remains controversial. Using the Italian Liver Cancer (ITA.LI.CA) database as a training set, we sought to develop and validate a new prognostic system for patients with HCC.
Methods and Findings
Prospectively collected databases from Italy (training cohort, n = 3,628; internal validation cohort, n = 1,555) and Taiwan (external validation cohort, n = 2,651) were used to develop the ITA.LI.CA prognostic system. We first defined ITA.LI.CA stages (0, A, B1, B2, B3, C) using only tumor characteristics (largest tumor diameter, number of nodules, intra- and extrahepatic macroscopic vascular invasion, extrahepatic metastases). A parametric multivariable survival model was then used to calculate the relative prognostic value of ITA.LI.CA tumor stage, Eastern Cooperative Oncology Group (ECOG) performance status, Child–Pugh score (CPS), and alpha-fetoprotein (AFP) in predicting individual survival. Based on the model results, an ITA.LI.CA integrated prognostic score (from 0 to 13 points) was constructed, and its prognostic power compared with that of other integrated systems (BCLC, HKLC, MESIAH, CLIP, JIS). Median follow-up was 58 mo for Italian patients (interquartile range, 26–106 mo) and 39 mo for Taiwanese patients (interquartile range, 12–61 mo).
The ITA.LI.CA integrated prognostic score showed optimal discrimination and calibration abilities in Italian patients. Observed median survival in the training and internal validation sets was 57 and 61 mo, respectively, in quartile 1 (ITA.LI.CA score ≤ 1), 43 and 38 mo in quartile 2 (ITA.LI.CA score 2–3), 23 and 23 mo in quartile 3 (ITA.LI.CA score 4–5), and 9 and 8 mo in quartile 4 (ITA.LI.CA score > 5). Observed and predicted median survival in the training and internal validation sets largely coincided. Although observed and predicted survival estimations were significantly lower (log-rank test, p < 0.001) in Italian than in Taiwanese patients, the ITA.LI.CA score maintained very high discrimination and calibration features also in the external validation cohort.
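The score-to-quartile grouping reported above can be written as a small lookup. The cut-offs and observed median survival figures are those quoted in the abstract; the function and variable names are illustrative, and the sketch does not compute the ITA.LI.CA score itself, whose point assignments are not given here.

```python
def italica_quartile(score):
    """Map an ITA.LI.CA integrated prognostic score (0 to 13) to its quartile."""
    if not 0 <= score <= 13:
        raise ValueError("ITA.LI.CA score must lie between 0 and 13")
    if score <= 1:
        return 1
    if score <= 3:
        return 2
    if score <= 5:
        return 3
    return 4

# Observed median survival in months: (training set, internal validation set).
OBSERVED_MEDIAN_SURVIVAL_MO = {1: (57, 61), 2: (43, 38), 3: (23, 23), 4: (9, 8)}

quartile = italica_quartile(4)
training_mo, validation_mo = OBSERVED_MEDIAN_SURVIVAL_MO[quartile]
```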
The concordance index (C index) of the ITA.LI.CA score in the internal and external validation cohorts was 0.71 and 0.78, respectively. The ITA.LI.CA score’s prognostic ability was significantly better (p < 0.001) than that of BCLC stage (respective C indexes of 0.64 and 0.73), CLIP score (0.68 and 0.75), JIS stage (0.67 and 0.70), MESIAH score (0.69 and 0.77), and HKLC stage (0.68 and 0.75). The main limitations of this study are its retrospective nature and the intrinsically significant differences between the Taiwanese and Italian groups.
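The concordance index compared above measures how often a higher risk score goes with shorter survival. A minimal sketch of Harrell's C for right-censored data is given below, with made-up inputs. It is a generic textbook formulation and not necessarily the exact estimator used in the study.

```python
def concordance_index(times, events, risk):
    """Harrell's C: share of comparable patient pairs ranked correctly by risk.

    times:  follow-up time for each patient
    events: 1 if death was observed, 0 if the patient was censored
    risk:   prognostic score, where higher means worse expected survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i died strictly before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in risk count as half
    return concordant / comparable

# Hypothetical follow-up (months), event indicators and risk scores for four patients.
c = concordance_index(times=[5, 10, 15, 20], events=[1, 1, 0, 1], risk=[9, 4, 6, 1])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported range of 0.64 to 0.78 in context.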
The ITA.LI.CA prognostic system includes both a tumor staging—stratifying patients with HCC into six main stages (0, A, B1, B2, B3, and C)—and a prognostic score—integrating ITA.LI.CA tumor staging, CPS, ECOG performance status, and AFP. The ITA.LI.CA prognostic system shows a strong ability to predict individual survival in European and Asian populations.
Using Italian and Taiwanese cohorts, Alessandro Vitale and colleagues develop and validate a staging system and prognostic model for hepatocellular carcinoma.
Editors' Summary
Primary liver cancer—a tumor that starts when a liver cell acquires genetic changes that allow it and its descendants to divide uncontrollably and move around the body (metastasize)—is the sixth most common cancer and the second leading cause of cancer-related deaths worldwide. Liver cancer kills more than three-quarters of a million people every year, mostly in resource-limited countries. The risk of developing hepatocellular carcinoma (HCC; the most common type of liver cancer) is highest in eastern and southeastern Asia; among wealthier nations, the risk of HCC is particularly high in Italy. HCC can be treated by surgical removal of part of the liver, liver transplantation, ablation (which uses an electric current to destroy the cancer cells), intra-arterial therapies (which deliver drugs directly into the liver), or systemic (whole body) drug therapies. However, the symptoms of HCC, which include weight loss, tiredness, and jaundice, are vague. HCC is therefore rarely diagnosed before the cancer is advanced and has a poor prognosis (likely outcome)—fewer than 5% of patients survive for five or more years after diagnosis.
Why Was This Study Done?
Cancer staging describes the severity of a cancer based on the size and extent of the original tumor and whether the tumor has metastasized. Staging helps doctors estimate the patient’s prognosis and can help them devise a treatment plan that will, hopefully, improve patients’ quality of life and may extend their life expectancy. Several staging systems have been devised for HCC, but prognostic assessment of patients with HCC is controversial. No single prognostic model (a model that allows clinicians to obtain predictions about the likely outcomes of individual patients) has been universally adopted. An ideal model is difficult to achieve as it would need to consider tumor-related, liver-function-related, and patient-related variables, all of which have different impacts on patient prognosis. Here, the researchers use a database created by the Italian Liver Cancer (ITA.LI.CA) group that includes information on more than 5,000 Italians with HCC to develop a new prognostic model to predict individual patient outcomes based on tumor-related, liver-function-related, and patient-related variables.
What Did the Researchers Do and Find?
The researchers first defined ITA.LI.CA stages for HCC using tumor characteristics only. They then used information on 3,628 patients in the ITA.LI.CA database (the “training” set) and statistical modeling to calculate the relative prognostic value of tumor staging, Eastern Cooperative Oncology Group (ECOG) performance status (an indicator of whether patients are able to look after themselves and undertake normal daily activities), liver function (measured using the Child–Pugh score), and alpha-fetoprotein level (a liver tumor marker) in the prediction of the survival of individual patients. Based on these modeling results, they constructed an ITA.LI.CA integrated prognostic score. The researchers report that the observed and predicted median (average) survival times in the training set and in an internal validation cohort of 1,555 additional patients in the ITA.LI.CA database were similar. Moreover, although the observed and predicted survival times were lower in the Italian patients than in 2,651 patients with HCC from Taiwan, the ITA.LI.CA score had high discrimination and calibration features in this external validation cohort as well (the discrimination of a prognostic model indicates its ability to separate patients into groups with different outcomes; the calibration of a prognostic model is the degree of correspondence between predicted and observed outcomes). Finally, the prognostic ability of the new ITA.LI.CA prognostic model was significantly better than that of several other prognostic scoring systems.
What Do These Findings Mean?
These findings introduce a revised staging system for HCC and an integrated prognostic score (the ITA.LI.CA prognostic score) based on this staging system, Child–Pugh score, ECOG performance status, and alpha-fetoprotein level that has a greater ability to predict survival among Italian and Taiwanese patients than previous prognostic models. Because this study was retrospective (previously recorded data, including outcomes, were used to develop the prognostic model), a prospective trial is needed to validate the ITA.LI.CA prognostic score. That is, researchers need to enroll a group of patients, determine their ITA.LI.CA prognostic scores, and then follow the patients to determine their actual outcomes. If validated in this way and in other populations, use of the ITA.LI.CA prognostic score should allow clinicians to provide more accurate prognoses for individual patients, and may be a starting point for evaluating which treatment option is best suited to each patient presenting with HCC.
Additional Information
This list of resources contains links that can be accessed when viewing the PDF on a device or via the online version of the article at
This study is further discussed in a PLOS Medicine Perspective by Neehar Parikh and Amit Singal
The US National Cancer Institute provides information about all aspects of cancer, including detailed information for patients and professionals about primary liver cancer and about cancer staging (in English and Spanish)
The American Cancer Society also provides information about liver cancer (including information on support programs and services; available in several languages)
The UK National Health Service Choices website provides information about primary liver cancer (including a video about coping with cancer) and about cancer staging
Cancer Research UK (a not-for-profit organization) provides detailed information about primary liver cancer
The British Liver Trust (a not-for-profit organization) also provides information about liver cancer, including a personal story
MedlinePlus provides links to further resources about liver cancer (in English and Spanish)
PMCID: PMC4846017  PMID: 27116206
10.  Computer-assisted assessment of the Human Epidermal Growth Factor Receptor 2 immunohistochemical assay in imaged histologic sections using a membrane isolation algorithm and quantitative analysis of positive controls 
BMC Medical Imaging  2008;8:11.
Breast cancers that overexpress the human epidermal growth factor receptor 2 (HER2) are eligible for effective biologically targeted therapies, such as trastuzumab. However, accurately determining HER2 overexpression, especially in immunohistochemically equivocal cases, remains a challenge. Manual analysis of HER2 expression is dependent on the assessment of membrane staining as well as comparisons with positive controls. In spite of the strides that have been made to standardize the assessment process, intra- and inter-observer discrepancies in scoring are not uncommon. In this manuscript we describe a pathologist-assisted, computer-based continuous scoring approach for increasing the precision and reproducibility of assessing imaged breast tissue specimens.
Computer-assisted analysis on HER2 IHC is compared with manual scoring and fluorescence in situ hybridization results on a test set of 99 digitally imaged breast cancer cases enriched with equivocally scored (2+) cases. Image features are generated based on the staining profile of the positive control tissue and pixels delineated by a newly developed Membrane Isolation Algorithm. Evaluation of results was performed using Receiver Operator Characteristic (ROC) analysis.
A computer-aided diagnostic approach has been developed using a membrane isolation algorithm and quantitative use of positive immunostaining controls. Incorporating internal positive controls into feature analysis achieved a greater Area Under the Curve (AUC) in ROC analysis than feature analysis without positive controls. Evaluation of HER2 immunostaining that utilized membrane pixels, controls, and percent area stained showed a significantly greater AUC than manual scoring, and a significantly lower false positive rate when used to evaluate immunohistochemically equivocal cases.
By incorporating both a membrane isolation algorithm and analysis of known positive controls, we developed a computer-assisted diagnostic algorithm that can reproducibly score HER2 status in IHC stained clinical breast cancer specimens. For equivocal scoring cases, this approach performed better than standard manual evaluation as assessed by ROC analysis in our test samples. Finally, there exists potential for utilizing image-analysis techniques for improving HER2 scoring at the immunohistochemically equivocal range.
PMCID: PMC2447833  PMID: 18534031
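The ROC comparison described in this abstract rests on the area under the curve (AUC). As a minimal sketch, and not the authors' implementation, the AUC can be computed as the fraction of positive/negative case pairs in which the positive case receives the higher score. The scores and labels below are hypothetical placeholders; in the study's setting the labels would correspond to a reference such as FISH status.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive case outscores a
    random negative case; tied pairs count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical continuous scores for six cases (1 = reference-positive).
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
```

The pairwise form is quadratic in the number of cases, which is acceptable for a test set of roughly a hundred cases; larger sets would usually use a rank-based computation instead.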
11.  Semi-supervised clinical text classification with Laplacian SVMs: an application to cancer case management 
Journal of biomedical informatics  2013;46(5):869-875.
To compare linear and Laplacian SVMs on a clinical text classification task; to evaluate the effect of unlabeled training data on Laplacian SVM performance.
The development of machine-learning based clinical text classifiers requires the creation of labeled training data, obtained via manual review by clinicians. Due to the effort and expense involved in labeling data, training data sets in the clinical domain are of limited size. In contrast, electronic medical record (EMR) systems contain hundreds of thousands of unlabeled notes that are not used by supervised machine learning approaches. Semi-supervised learning algorithms use both labeled and unlabeled data to train classifiers, and can outperform their supervised counterparts.
We trained support vector machines (SVMs) and Laplacian SVMs on a training reference standard of 820 abdominal CT, MRI, and Ultrasound reports labeled for the presence of potentially malignant liver lesions that require follow up (positive class prevalence 77%). The Laplacian SVM used 19,845 randomly sampled unlabeled notes in addition to the training reference standard. We evaluated SVMs and Laplacian SVMs on a test set of 520 labeled reports.
The Laplacian SVM trained on labeled and unlabeled radiology reports significantly outperformed supervised SVMs (Macro-F1 0.773 vs. 0.741, Sensitivity 0.943 vs. 0.911, Positive Predictive value 0.877 vs. 0.883). Performance improved with the number of labeled and unlabeled notes used to train the Laplacian SVM (Pearson’s ρ=0.529 for correlation between number of unlabeled notes and macro-F1 score). These results suggest that practical semi-supervised methods such as the Laplacian SVM can leverage the large, unlabeled corpora that reside within EMRs to improve clinical text classification.
PMCID: PMC3806632  PMID: 23845911
Semi-supervised learning; Support vector machine; Graph Laplacian; Natural language processing
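The abstract above reports macro-F1, sensitivity, and positive predictive value together. The sketch below shows how these three quantities relate for a binary task. It is illustrative only: the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (sensitivity, ppv, macro_f1) from binary confusion counts."""
    sensitivity = tp / (tp + fn)          # recall on the positive class
    ppv = tp / (tp + fp)                  # precision on the positive class
    f1_pos = 2 * tp / (2 * tp + fp + fn)  # F1 with "positive" as the target
    # For the negative class the roles swap: true negatives are its hits,
    # false negatives are its false alarms, false positives are its misses.
    f1_neg = 2 * tn / (2 * tn + fn + fp)
    macro_f1 = (f1_pos + f1_neg) / 2      # unweighted mean over both classes
    return sensitivity, ppv, macro_f1


# Hypothetical counts with a majority positive class.
sensitivity, ppv, macro_f1 = binary_metrics(tp=90, fp=10, fn=10, tn=40)
```

Because macro-F1 averages the per-class F1 scores without weighting by prevalence, it can sit well below sensitivity when the minority class is classified less accurately, which is one plausible reading of the gap between the reported macro-F1 and sensitivity values.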
12.  A gamma-gaussian mixture model for detection of mitotic cells in breast cancer histopathology images 
In this paper, we propose a statistical approach for mitosis detection in breast cancer histological images. The proposed algorithm models the pixel intensities in mitotic and non-mitotic regions by a Gamma-Gaussian mixture model (GGMM) and employs a context aware post-processing (CAPP) in order to reduce false positives. Experimental results demonstrate the ability of this simple, yet effective method to detect mitotic cells (MCs) in standard H & E breast cancer histology images.
Counting of MCs in breast cancer histopathology images is one of three components (the other two being tubule formation and nuclear pleomorphism) required for developing computer-assisted grading of breast cancer tissue slides. This is very challenging since the biological variability of the MCs makes their detection extremely difficult. In addition, if standard H & E is used (which stains chromatin-rich structures, such as nuclei, apoptotic cells, and MCs, dark blue), it becomes extremely difficult to detect the latter, given that the former two are densely localized in the tissue sections.
In this paper, a robust MC detection technique is developed and tested on 35 breast histopathology images, belonging to five different tissue slides.
Settings and Design:
Our approach mimics a pathologist’s approach to MC detection. The idea is (1) to isolate tumor areas from non-tumor areas (lymphoid/inflammatory/apoptotic cells), (2) search for MCs in the reduced space by statistically modeling the pixel intensities from mitotic and non-mitotic regions, and finally (3) evaluate the context of each potential MC in terms of its texture.
Materials and Methods:
Our experimental dataset consisted of 35 digitized images of breast cancer biopsy slides with paraffin embedded sections stained with H & E and scanned at ×40 using an Aperio ScanScope slide scanner.
Statistical Analysis Used:
We propose GGMM for detecting MCs in breast histology images. Image intensities are modeled as random variables sampled from one of two distributions: Gamma and Gaussian. Intensities from MCs are modeled by a Gamma distribution and those from non-mitotic regions are modeled by a Gaussian distribution. The choice of Gamma-Gaussian distribution is mainly due to the observation that the characteristics of the distribution match well with the data it models. The experimental results show that the proposed system achieves a high sensitivity of 0.82 with positive predictive value (PPV) of 0.29. Employing CAPP on these results produces a 241% increase in PPV at the cost of less than 15% decrease in sensitivity.
In this paper, we presented a GGMM for detection of MCs in breast cancer histopathological images. In addition, we introduced CAPP as a tool to increase the PPV with a minimal loss in sensitivity. We evaluated the performance of the proposed detection algorithm in terms of sensitivity and PPV over a set of 35 breast histology images selected from five different tissue slides and showed that a reasonably high value of sensitivity can be retained while increasing the PPV. Our future work will aim at increasing the PPV further by modeling the spatial appearance of regions surrounding mitotic events.
PMCID: PMC3709430  PMID: 23858386
Breast cancer grading; histopathology image analysis; mitotic cell detection; statistical modeling of mitotic cells
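The Gamma-Gaussian mixture described in this abstract assigns each pixel intensity to either a Gamma component (mitotic) or a Gaussian component (non-mitotic). The following is a minimal sketch of the posterior computation for such a two-component mixture. The mixing weight, the Gamma shape and scale, the Gaussian mean and standard deviation, and the intensity scale are all hypothetical values chosen for illustration; the abstract does not state how the authors estimated or set them.

```python
import math


def gamma_pdf(x, shape, scale):
    """Gamma density, evaluated in log space for numerical stability."""
    if x <= 0:
        return 0.0
    log_pdf = ((shape - 1) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def gaussian_pdf(x, mean, std):
    """Gaussian (normal) density."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def mitotic_posterior(x, weight, shape, scale, mean, std):
    """Posterior probability that intensity x came from the Gamma
    (mitotic) component rather than the Gaussian (non-mitotic) one."""
    g = weight * gamma_pdf(x, shape, scale)
    n = (1 - weight) * gaussian_pdf(x, mean, std)
    total = g + n
    return g / total if total > 0 else 0.0


# Hypothetical parameters: a rare Gamma component at low intensities and
# a dominant Gaussian component centred at a higher intensity.
params = dict(weight=0.1, shape=2.0, scale=1.0, mean=10.0, std=2.0)
p_low = mitotic_posterior(2.0, **params)    # intensity near the Gamma mode
p_high = mitotic_posterior(10.0, **params)  # intensity at the Gaussian mean
```

Thresholding this posterior yields candidate mitotic pixels. In the abstract's pipeline a separate context-aware post-processing step then prunes false positives, which this sketch does not attempt to reproduce.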
13.  Accuracy of genome-wide imputation of untyped markers and impacts on statistical power for association studies 
BMC Genetics  2009;10:27.
Although high-throughput genotyping arrays have made whole-genome association studies (WGAS) feasible, only a small proportion of SNPs in the human genome are actually surveyed in such studies. In addition, various SNP arrays assay different sets of SNPs, which leads to challenges in comparing results and merging data for meta-analyses. Genome-wide imputation of untyped markers allows us to address these issues in a direct fashion.
384 Caucasian American liver donors were genotyped using Illumina 650Y (Ilmn650Y) arrays, from which we also derived genotypes from the Ilmn317K array. On these data, we compared two imputation methods: MACH and BEAGLE. We imputed 2.5 million HapMap Release22 SNPs, and conducted GWAS on ~40,000 liver mRNA expression traits (eQTL analysis). In addition, 200 Caucasian American and 200 African American subjects were genotyped using the Affymetrix 500 K array plus a custom 164 K fill-in chip. We then imputed the HapMap SNPs and quantified the accuracy by randomly masking observed SNPs.
MACH and BEAGLE perform similarly with respect to imputation accuracy. The Ilmn650Y results in excellent imputation performance, and it outperforms Affx500K or Ilmn317K sets. For Caucasian Americans, 90% of the HapMap SNPs were imputed at 98% accuracy. As expected, imputation of poorly tagged SNPs (untyped SNPs in weak LD with typed markers) was not as successful. It was more challenging to impute genotypes in the African American population, given (1) shorter LD blocks and (2) admixture with Caucasian populations in this population. To address issue (2), we pooled HapMap CEU and YRI data as an imputation reference set, which greatly improved overall performance. The approximately 40,000 phenotypes scored in these populations provide a path to determine empirically how the power to detect associations is affected by the imputation procedures. That is, at a fixed false discovery rate, the number of cis-eQTL discoveries detected by various methods can be interpreted as their relative statistical power in the GWAS. In this study, we find that imputation offers modest additional power (by 4%) on top of either Ilmn317K or Ilmn650Y, much less than the power gain from Ilmn317K to Ilmn650Y (13%).
Current algorithms can accurately impute genotypes for untyped markers, which enables researchers to pool data between studies conducted using different SNP sets. While genotyping itself results in a small error rate (e.g. 0.5%), imputing genotypes is surprisingly accurate. We found that dense marker sets (e.g. Ilmn650Y) outperform sparser ones (e.g. Ilmn317K) in terms of imputation yield and accuracy. We also noticed it was harder to impute genotypes for African American samples, partially due to population admixture, although using a pooled reference boosts performance. Interestingly, GWAS carried out using imputed genotypes only slightly increased power on top of assayed SNPs. This is likely because adding more markers via imputation yields only a modest gain in genetic coverage but worsens the multiple-testing penalty. Furthermore, cis-eQTL mapping using a dense SNP set derived from imputation achieves great resolution and locates association peaks closer to causal variants than the conventional approach.
PMCID: PMC2709633  PMID: 19531258
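The abstract above quantifies imputation accuracy by randomly masking observed SNPs and checking how often the imputed call matches the hidden one. The sketch below illustrates that evaluation loop only. The `most_common_imputer` function is a deliberately naive, hypothetical stand-in and does not resemble MACH or BEAGLE, which exploit linkage disequilibrium with a reference panel; the genotype vector is likewise invented.

```python
import random
from collections import Counter


def masked_concordance(genotypes, impute, mask_fraction=0.1, seed=0):
    """Hide a random subset of observed genotypes, impute them, and
    return the fraction of hidden calls that were recovered exactly."""
    rng = random.Random(seed)
    n_mask = max(1, int(len(genotypes) * mask_fraction))
    masked_idx = rng.sample(range(len(genotypes)), n_mask)
    masked_set = set(masked_idx)
    observed = [None if i in masked_set else g
                for i, g in enumerate(genotypes)]
    imputed = impute(observed)
    matches = sum(imputed[i] == genotypes[i] for i in masked_idx)
    return matches / n_mask


def most_common_imputer(observed):
    """Toy imputer (assumption): fill each gap with the most frequent
    observed genotype. Requires at least one unmasked genotype."""
    mode = Counter(g for g in observed if g is not None).most_common(1)[0][0]
    return [mode if g is None else g for g in observed]


# Hypothetical allele-dosage calls (0/1/2) for one SNP across 100 samples.
genotypes = [0] * 90 + [1] * 8 + [2] * 2
accuracy = masked_concordance(genotypes, most_common_imputer,
                              mask_fraction=0.2, seed=1)
```

Swapping in a real imputation routine for the `impute` argument leaves the masking and scoring logic unchanged, which is the point of separating the two.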
14.  Automated segmentation of atherosclerotic histology based on pattern classification 
Histology sections provide accurate information on atherosclerotic plaque composition, and are used in various applications. To our knowledge, no automated systems for plaque component segmentation in histology sections currently exist.
Materials and Methods:
We perform pixel-wise classification of fibrous, lipid, and necrotic tissue in Elastica van Gieson-stained histology sections, using features based on color channel intensity and local image texture and structure. We compare an approach where we train on independent data to an approach where we train on one or two sections per specimen in order to segment the remaining sections. We evaluate the results on segmentation accuracy in histology, and we use the obtained histology segmentations to train plaque component classification methods in ex vivo magnetic resonance imaging (MRI) and in vivo MRI and computed tomography (CT).
In leave-one-specimen-out experiments on 176 histology slices of 13 plaques, a pixel-wise accuracy of 75.7 ± 6.8% was obtained. This increased to 77.6 ± 6.5% when two manually annotated slices of the specimen to be segmented were used for training. Rank correlations of relative component volumes with manually annotated volumes were high in this situation (ρ = 0.82-0.98). Using the obtained histology segmentations to train plaque component classification methods in ex vivo MRI and in vivo MRI and CT resulted in similar image segmentations for training on the automated histology segmentations as for training on a fully manual ground truth. The size of the lipid-rich necrotic core was significantly smaller when training on fully automated histology segmentations than when manually annotated histology sections were used. This difference was reduced and not statistically significant when one or two slices per section were manually annotated for histology segmentation.
Good histology segmentations can be obtained by automated segmentation, which show good correlations with ground truth volumes. In addition, these can be used to develop segmentation methods in other imaging modalities. Accuracy increases when one or two sections of the same specimen are used for training, which requires a limited amount of user interaction in practice.
PMCID: PMC3678743  PMID: 23766939
Histology; Segmentation; Classification; Atherosclerosis
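The leave-one-specimen-out protocol mentioned in this abstract holds out every slice of one specimen per fold, so that slices from the same plaque never appear in both training and test data. A minimal sketch of that grouping logic follows; the specimen identifiers are hypothetical and the classifier itself is omitted.

```python
def leave_one_specimen_out(specimen_ids):
    """Yield (held_out, train_idx, test_idx) for each distinct specimen,
    keeping all slices of the held-out specimen in the test fold."""
    for held_out in sorted(set(specimen_ids)):
        train = [i for i, s in enumerate(specimen_ids) if s != held_out]
        test = [i for i, s in enumerate(specimen_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical specimen label for each of six slices.
specimen_ids = ["A", "A", "B", "B", "B", "C"]
folds = list(leave_one_specimen_out(specimen_ids))
```

Grouping by specimen rather than by slice matters because neighbouring slices of one plaque are highly similar, and a slice-level split would be expected to give optimistic accuracy estimates.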
15.  Reproducibility in the automated quantitative assessment of HER2/neu for breast cancer 
With the emerging role of digital imaging in pathology and the application of automated image-based algorithms to a number of quantitative tasks, there is a need to examine factors that may affect the reproducibility of results. These factors include the imaging properties of whole slide imaging (WSI) systems and their effect on the performance of quantitative tools. This manuscript examines inter-scanner and inter-algorithm variability in the assessment of the commonly used HER2/neu tissue-based biomarker for breast cancer with emphasis on the effect of algorithm training.
Materials and Methods:
A total of 241 regions of interest from 64 breast cancer tissue glass slides were scanned using three different whole-slide imaging systems and were analyzed using two different automated image analysis algorithms, one with preset parameters and another incorporating a procedure for objective parameter optimization. Ground truth from a panel of seven pathologists was available from a previous study. Agreement analysis was used to compare the resulting HER2/neu scores.
The results of our study showed that inter-scanner agreement in the assessment of HER2/neu for breast cancer in selected fields of view, when analyzed with either of the two algorithms examined in this study, was equal to or better than the inter-observer agreement previously reported on the same set of data. Results also showed that discrepancies observed between algorithm results on data from different scanners were significantly reduced when the alternative algorithm that incorporated an objective re-training procedure was used, compared to the commercial algorithm with preset parameters.
Our study supports the use of objective procedures for algorithm training to account for differences in image properties between WSI systems.
PMCID: PMC3746414  PMID: 23967384
Quantitative immunohistochemistry; reproducibility; whole slide imaging
16.  Pancreatic Cancer Surgical Resection Margins: Molecular Assessment by Mass Spectrometry Imaging 
PLoS Medicine  2016;13(8):e1002108.
Surgical resection with microscopically negative margins remains the main curative option for pancreatic cancer; however, in practice intraoperative delineation of resection margins is challenging. Ambient mass spectrometry imaging has emerged as a powerful technique for chemical imaging and real-time diagnosis of tissue samples. We applied an approach combining desorption electrospray ionization mass spectrometry imaging (DESI-MSI) with the least absolute shrinkage and selection operator (Lasso) statistical method to diagnose pancreatic tissue sections and prospectively evaluate surgical resection margins from pancreatic cancer surgery.
Methods and Findings
Our methodology was developed and tested using 63 banked pancreatic cancer samples and 65 samples (tumor and specimen margins) collected prospectively during 32 pancreatectomies from February 27, 2013, to January 16, 2015. In total, mass spectra for 254,235 individual pixels were evaluated. When cross-validation was employed in the training set of samples, 98.1% agreement with histopathology was obtained. Using an independent set of samples, 98.6% agreement was achieved. We used a statistical approach to evaluate 177,727 mass spectra from samples with complex, mixed histology, achieving an agreement of 81%. The developed method showed agreement with frozen section evaluation of specimen margins in 24 of 32 surgical cases prospectively evaluated. In the remaining eight patients, margins were found to be positive by DESI-MSI/Lasso, but negative by frozen section analysis. The median overall survival after resection was only 10 mo for these eight patients as opposed to 26 mo for patients with negative margins by both techniques. This observation suggests that our method (as opposed to the standard method to date) was able to detect tumor involvement at the margin in patients who developed early recurrence. Nonetheless, a larger cohort of samples is needed to validate the findings described in this study. Careful evaluation of the long-term benefits to patients of the use of DESI-MSI for surgical margin evaluation is also needed to determine its value in clinical practice.
Our findings provide evidence that the molecular information obtained by DESI-MSI/Lasso from pancreatic tissue samples has the potential to transform the evaluation of surgical specimens. With further development, we believe the described methodology could be routinely used for intraoperative surgical margin assessment of pancreatic cancer.
Richard Zare and colleagues develop and test a method to evaluate surgical resection margins from pancreatic cancer surgery using mass spectrometric imaging.
Author Summary
Why Was This Study Done?
Ambient ionization mass spectrometry imaging can provide accurate diagnostic information differentiating cancerous from noncancerous tissue samples and has been recently shown to be particularly powerful in helping pathologists and surgeons determine whether cancer reaches the edge (margin) of the resection specimen in real time during surgery.
This study was performed to evaluate the feasibility and efficacy of this method during surgery for pancreatic cancer, one of the most lethal human cancers.
What Did the Researchers Do and Find?
Our methodology was developed and tested using 63 banked pancreatic cancer samples and 65 samples (tumor and specimen margins) collected prospectively during 32 pancreatectomies performed from 2013 to 2015.
We found that desorption electrospray ionization mass spectrometry imaging (DESI-MSI) allows discrimination of normal pancreatic and pancreatic cancer tissue based on diagnostic metabolic signatures and has the potential to assist in surgical decision making by informing the surgeon whether the entire tumor has been removed or not.
What Do These Findings Mean?
These findings provide novel information on molecular markers of pancreatic cancer and showcase the value of this methodology as an adjunct to the current pathologic method (frozen section analysis) for determining the completeness of cancer surgery.
The data reported in this study could be made available during actual surgery, and allow the surgeon to extend the boundaries of surgery to remove any residual tumor discovered by DESI-MSI at the margin.
PMCID: PMC5019340  PMID: 27575375
17.  Combining two open source tools for neural computation (BioPatRec and Netlab) improves movement classification for prosthetic control 
BMC Research Notes  2016;9(1):429.
Controlling a myoelectric prosthesis for upper limbs is increasingly challenging for the user as more electrodes and joints become available. Motion classification based on pattern recognition with a multi-electrode array allows multiple joints to be controlled simultaneously. Previous pattern recognition studies are difficult to compare, because individual research groups use their own data sets. To resolve this shortcoming and to facilitate comparisons, open access data sets were analysed using components of BioPatRec and Netlab pattern recognition models.
Performances of the artificial neural networks, linear models, and training program components were compared. Evaluation took place within the BioPatRec environment, a Matlab-based open source platform that provides feature extraction, processing and motion classification algorithms for prosthetic control. The algorithms were applied to myoelectric signals for individual and simultaneous classification of movements, with the aim of finding the best performing algorithm and network model. Evaluation criteria included classification accuracy and training time.
Results in both the linear and the artificial neural network models demonstrated that Netlab’s implementation using the scaled conjugate gradient training algorithm reached significantly higher accuracies than BioPatRec.
It is concluded that the best movement classification performance would be achieved through integrating Netlab training algorithms in the BioPatRec environment so that future prosthesis training can be shortened and control made more reliable. Netlab was therefore included in the newest release of BioPatRec (v4.0).
PMCID: PMC5007720  PMID: 27581624
Prosthetics; Upper limb amputation; Machine learning; Pattern recognition; Neural computation
18.  FASTQSim: platform-independent data characterization and in silico read generation for NGS datasets 
BMC Research Notes  2014;7:533.
High-throughput next generation sequencing technologies have enabled rapid characterization of clinical and environmental samples. Consequently, the largest bottleneck to actionable data has become sample processing and bioinformatics analysis, creating a need for accurate and rapid algorithms to process genetic data. Perfectly characterized in silico datasets are a useful tool for evaluating the performance of such algorithms. Background contaminating organisms are observed in sequenced mixtures of organisms. In silico samples provide exact truth. To create the best value for evaluating algorithms, in silico data should mimic actual sequencer data as closely as possible.
FASTQSim is a tool that provides the dual functionality of NGS dataset characterization and metagenomic data generation. FASTQSim is sequencing platform-independent, and computes distributions of read length, quality scores, indel rates, single point mutation rates, indel size, and similar statistics for any sequencing platform. To create training or testing datasets, FASTQSim has the ability to convert target sequences into in silico reads with specific error profiles obtained in the characterization step.
FASTQSim enables users to assess the quality of NGS datasets. The tool provides information about read length, read quality, repetitive and non-repetitive indel profiles, and single base pair substitutions. FASTQSim allows the user to simulate individual read datasets that can be used as standardized test scenarios for planning sequencing projects or for benchmarking metagenomic software. In this regard, in silico datasets generated with the FASTQSim tool hold several advantages over natural datasets: they are sequencing platform independent, extremely well characterized, and less expensive to generate. Such datasets are valuable in a number of applications, including the training of assemblers for multiple platforms, benchmarking bioinformatics algorithm performance, and creating challenge datasets for detecting genetic engineering toolmarks.
Electronic supplementary material
The online version of this article (doi:10.1186/1756-0500-7-533) contains supplementary material, which is available to authorized users.
PMCID: PMC4246604  PMID: 25123167
Simulator; Algorithm; Next generation sequencing; FASTQ
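The characterization step described in this abstract computes distributions such as read length and quality scores from a FASTQ dataset. The sketch below shows the general idea and is not FASTQSim's own code. It assumes plain four-line FASTQ records and Phred+33 quality encoding, and the function name `characterize_fastq` and the two example reads are hypothetical.

```python
from collections import Counter


def characterize_fastq(lines, phred_offset=33):
    """Return (read-length histogram, mean per-base Phred quality) for
    four-line FASTQ records supplied as an iterable of text lines."""
    length_counts = Counter()
    total_quality = 0
    total_bases = 0
    records = [line.rstrip("\n") for line in lines]
    for i in range(0, len(records) - 3, 4):
        header, seq, plus, qual = records[i:i + 4]
        if (not header.startswith("@") or not plus.startswith("+")
                or len(seq) != len(qual)):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        length_counts[len(seq)] += 1
        # Each quality character encodes a Phred score as ASCII code
        # minus the offset (33 under the assumed encoding).
        total_quality += sum(ord(c) - phred_offset for c in qual)
        total_bases += len(qual)
    mean_quality = total_quality / total_bases if total_bases else 0.0
    return length_counts, mean_quality


# Two hypothetical reads: "I" encodes Q40 and "5" encodes Q20 in Phred+33.
example = ["@r1", "ACGT", "+", "IIII",
           "@r2", "ACGTAC", "+", "5555II"]
lengths, mean_quality = characterize_fastq(example)
```

An open file handle can be passed in place of the list, since the function only iterates over lines. Indel and substitution rates, which the abstract also lists, require alignment to a reference and are outside the scope of this sketch.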
19.  Using image analysis as a tool for assessment of prognostic and predictive biomarkers for breast cancer: How reliable is it? 
Estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor-2 (HER2) are important and well-established prognostic and predictive biomarkers for breast cancers and are routinely tested on patients’ tumor samples by immunohistochemical (IHC) study. The accuracy of these test results has substantial impact on patient management. A critical factor that contributes to the result is the interpretation (scoring) of IHC. This study investigates how computerized image analysis can play a role in reliable scoring, and identifies potential pitfalls with common methods.
Materials and Methods:
Whole slide images of 33 invasive ductal carcinoma (IDC) (10 ER and 23 HER2) were scored by a pathologist under the light microscope and confirmed by another pathologist. The HER2 results were additionally confirmed by fluorescence in situ hybridization (FISH). The scoring criteria adhered to the guidelines recommended by the American Society of Clinical Oncology/College of American Pathologists. Whole slide stains were then scored by commercially available image analysis algorithms from Definiens (Munich, Germany) and Aperio Technologies (Vista, CA, USA). Each algorithm was modified specifically for each marker and tissue. The results were compared with the semi-quantitative manual scoring, which was considered the gold standard in this study.
For the HER2-positive group, each algorithm scored 23/23 cases within the range established by the pathologist. For ER, both algorithms scored 10/10 cases within range. The percentage of staining reported by each algorithm varied somewhat from the pathologist’s reading.
Commercially available computerized image analysis can be useful in the evaluation of ER and HER2 IHC results. To achieve accurate results, either manual region selection by a pathologist is necessary or an automated region selection tool must be employed. Specificity can also be gained when strict quality assurance by a pathologist is invested. Quality assurance of image analysis by pathologists is always warranted. Automated image analysis should only be used as an adjunct to the pathologist’s evaluation.
PMCID: PMC3017682  PMID: 21221174
Biomarkers; breast cancer; image analysis
20.  Prostate cancer detection: Fusion of cytological and textural features 
A computer-assisted system for histological prostate cancer diagnosis can assist pathologists in two stages: (i) to locate cancer regions in a large digitized tissue biopsy, and (ii) to assign Gleason grades to the regions detected in stage 1. Most previous studies on this topic have primarily addressed the second stage by classifying the preselected tissue regions. In this paper, we address the first stage by presenting a cancer detection approach for the whole slide tissue image. We propose a novel method to extract a cytological feature, namely the presence of cancer nuclei (nuclei with prominent nucleoli) in the tissue, and apply this feature to detect the cancer regions. Additionally, conventional image texture features which have been widely used in the literature are also considered. The performance comparison among the proposed cytological textural feature combination method, the texture-based method and the cytological feature-based method demonstrates the robustness of the extracted cytological feature. At a false positive rate of 6%, the proposed method is able to achieve a sensitivity of 78% on a dataset including six training images (each of which has approximately 4,000×7,000 pixels) and 11 whole-slide test images (each of which has approximately 5,000×23,000 pixels). All images are at 20X magnification.
PMCID: PMC3312709  PMID: 22811959
Prostate cancer; cytology; texture; histology; nuclei; nucleoli; whole slide image
21.  Mitosis detection in breast cancer pathology images by combining handcrafted and convolutional neural network features 
Journal of Medical Imaging  2014;1(3):034003.
Breast cancer (BCa) grading plays an important role in predicting disease aggressiveness and patient outcome. A key component of BCa grade is the mitotic count, which involves quantifying the number of cells in the process of dividing (i.e., undergoing mitosis) at a specific point in time. Currently, mitosis counting is done manually by a pathologist looking at multiple high power fields (HPFs) on a glass slide under a microscope, an extremely laborious and time consuming process. The development of computerized systems for automated detection of mitotic nuclei, while highly desirable, is confounded by the highly variable shape and appearance of mitoses. Existing methods use either handcrafted features that capture certain morphological, statistical, or textural attributes of mitoses or features learned with convolutional neural networks (CNN). Although handcrafted features are inspired by the domain and the particular application, the data-driven CNN models tend to be domain agnostic and attempt to learn additional feature bases that cannot be represented through any of the handcrafted features. On the other hand, CNN is computationally more complex and needs a large number of labeled training instances. Since handcrafted features attempt to model domain pertinent attributes and CNN approaches are largely supervised feature generation methods, there is an appeal in attempting to combine these two distinct classes of feature generation strategies to create an integrated set of attributes that can potentially outperform either class of feature extraction strategies individually. We present a cascaded approach for mitosis detection that intelligently combines a CNN model and handcrafted features (morphology, color, and texture features). 
By employing a light CNN model, the proposed approach is far less demanding computationally, and the cascaded strategy of combining handcrafted features and CNN-derived features enables the possibility of maximizing the performance by leveraging the disconnected feature sets. Evaluation on the public ICPR12 mitosis dataset that has 226 mitoses annotated on 35 HPFs (400× magnification) by several pathologists and 15 testing HPFs yielded an F-measure of 0.7345. Our approach is accurate, fast, and requires fewer computing resources than existing methods, making it feasible for clinical use.
PMCID: PMC4479031  PMID: 26158062
mitosis; breast cancer; convolutional neural networks; cascaded ensemble; handcrafted feature; digital pathology
22.  Learning histopathological patterns 
The aim was to demonstrate a method for automated image analysis of immunohistochemically stained tissue samples for extracting features that correlate with patient disease. We address the problem of quantifying tumor tissue and segmenting and counting cell nuclei.
Materials and Methods:
Our method utilizes a flexible segmentation method based on sparse coding trained from representative image samples. Nuclei counting is based on a nucleus model that takes size, shape, and nucleus probability into account. Nuclei clustering and overlays are resolved using a gray-weighted distance transform. We obtain a probability measure for pixels belonging to a nucleus from our segmentation procedure. Experiments are carried out on two sets of immunohistochemically stained images – one set based on the estrogen receptor (ER) and the other on antigen KI-67. For the nuclei separation we have selected 207 ER image samples from 58 tissue microarray cores corresponding to 58 patients and 136 KI-67 image samples also from 58 cores. The images are hand-annotated by marking the center position of each nucleus. For the ER data we have a total of 1006 nuclei and for the KI-67 data we have 796 nuclei. Segmentation performance was evaluated in terms of missing nuclei, falsely detected nuclei, and multiple detections. The proposed method is compared to state-of-the-art Bayesian classification.
Statistical analysis used:
The performance of the proposed method and a state-of-the-art algorithm including variations thereof is compared using the Wilcoxon rank sum test.
For both the ER experiment and the KI-67 experiment the proposed method exhibits lower error rates than the state-of-the-art method. Total error rates were 4.8% and 7.7% in the two experiments, corresponding to an average of 0.23 and 0.45 errors per image, respectively. The Wilcoxon rank sum tests show statistically significant improvements over the state-of-the-art method.
We have demonstrated a method and obtained good performance compared to state-of-the-art nuclei separation. The segmentation procedure is simple and highly flexible, and we demonstrate how, in addition to nuclei separation, it can perform precise segmentation of cancerous tissue. The complexity of the segmentation procedure is linear in the image size and the nuclei separation is linear in the number of nuclei. Additionally, the method can be parallelized to obtain high-speed computations.
PMCID: PMC3312718  PMID: 22811956
Computer-aided classification; digital histopathology images; flexible learning based segmentation; image segmentation
23.  Functional Inference of Complex Anatomical Tendinous Networks at a Macroscopic Scale via Sparse Experimentation 
PLoS Computational Biology  2012;8(11):e1002751.
In systems and computational biology, much effort is devoted to functional identification of systems and networks at the molecular or cellular scale. However, similarly important networks exist at anatomical scales such as the tendon network of human fingers: the complex array of collagen fibers that transmits and distributes muscle forces to finger joints. This network is critical to the versatility of the human hand, and its function has been debated since at least the 16th century. Here, we experimentally infer the structure (both topology and parameter values) of this network through sparse interrogation with force inputs. A population of models representing this structure co-evolves in simulation with a population of informative future force inputs via the predator-prey estimation-exploration algorithm. Model fitness depends on their ability to explain experimental data, while the fitness of future force inputs depends on causing maximal functional discrepancy among current models. We validate our approach by inferring two known synthetic Latex networks, and one anatomical tendon network harvested from a cadaver's middle finger. We find that functionally similar but structurally diverse models can exist within a narrow range of the training set and cross-validation errors. For the Latex networks, models with low training set error [<4%] and resembling the known network have the smallest cross-validation errors [∼5%]. The low training set [<4%] and cross-validation [<7.2%] errors for models for the cadaveric specimen demonstrate what, to our knowledge, is the first experimental inference of the functional structure of complex anatomical networks. This work expands current bioinformatics inference approaches by demonstrating that sparse, yet informative interrogation of biological specimens holds significant computational advantages in accurate and efficient inference over random testing, or assuming model topology and only inferring parameter values. These findings also hold clues to both our evolutionary history and the development of versatile machines.
Author Summary
In science and medicine alike, one of the critical steps in understanding how organisms work is to identify how a given individual is similar to or different from others. Only then can the specific features of an individual be distinguished from the general properties of that species. However, doing enough input-output experiments on a given organism to obtain a reliable description of its function (i.e., a model) can often harm the organism, or require too much time when testing perishable tissues or human subjects. We have met this challenge by demonstrating that our novel algorithm can accelerate the extraction of accurate functional models in complex tissues by continually tailoring each successive experiment to be more informative. We apply this new method to the problem of describing how the tendons of the fingers interact, which has puzzled scientists and clinicians since the time of Da Vinci. This new computational-experimental method now enables fresh research directions in biological and medical research by allowing the experimental extraction of accurate functional models with minimal damage to the organism. For example, it will allow a better understanding of similarities and differences among related species, and the development of personalized medical treatment.
PMCID: PMC3493461  PMID: 23144601
24.  Ranked retrieval of segmented nuclei for objective assessment of cancer gene repositioning 
BMC Bioinformatics  2012;13:232.
Correct segmentation is critical to many applications within automated microscopy image analysis. Despite the availability of advanced segmentation algorithms, variations in cell morphology, sample preparation, and acquisition settings often lead to segmentation errors. This manuscript introduces a ranked-retrieval approach using logistic regression to automate selection of accurately segmented nuclei from a set of candidate segmentations. The methodology is validated on an application of spatial gene repositioning in breast cancer cell nuclei. Gene repositioning is analyzed in patient tissue sections by labeling sequences with fluorescence in situ hybridization (FISH), followed by measurement of the relative position of each gene from the nuclear center to the nuclear periphery. This technique requires hundreds of well-segmented nuclei per sample to achieve statistical significance. Although the tissue samples in this study contain a surplus of available nuclei, automatic identification of the well-segmented subset remains a challenging task.
Logistic regression was applied to features extracted from candidate segmented nuclei, including nuclear shape, texture, context, and gene copy number, in order to rank objects according to the likelihood of being an accurately segmented nucleus. The method was demonstrated on a tissue microarray dataset of 43 breast cancer patients, comprising approximately 40,000 imaged nuclei in which the HES5 and FRA2 genes were labeled with FISH probes. Three trained reviewers independently classified nuclei into three classes of segmentation accuracy. In man vs. machine studies, the automated method outperformed the inter-observer agreement between reviewers, as measured by area under the receiver operating characteristic (ROC) curve. Robustness of gene position measurements to boundary inaccuracies was demonstrated by comparing 1086 manually and automatically segmented nuclei. Pearson correlation coefficients between the gene position measurements were above 0.9 (p < 0.05). A preliminary experiment was conducted to validate the ranked retrieval in a test to detect cancer. Independent manual measurement of gene positions agreed with automatic results in 21 out of 26 statistical comparisons against a pooled normal (benign) gene position distribution.
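The ranking step and the ROC-based comparison can be sketched as follows. This is an illustrative outline rather than the study's implementation: the feature values, the weights, the bias, and the function names are hypothetical, and in practice the weights would be fitted to reviewer-labeled nuclei instead of being fixed by hand.

```python
import math

def logistic_score(features, weights, bias):
    """Estimated probability that a candidate is an accurately segmented nucleus."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def rank_candidates(candidates, weights, bias):
    """Sort candidate segmentations from most to least likely to be accurate."""
    return sorted(
        candidates,
        key=lambda c: logistic_score(c["features"], weights, bias),
        reverse=True,
    )

def roc_auc(scores, labels):
    """Area under the ROC curve: the fraction of (positive, negative)
    pairs in which the positive scores higher, with ties counted as 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical candidates with two made-up features each.
candidates = [
    {"id": "a", "features": [0.2, 0.1]},
    {"id": "b", "features": [0.9, 0.8]},
    {"id": "c", "features": [0.5, 0.4]},
]
ranked = rank_candidates(candidates, weights=[2.0, 1.0], bias=-1.0)
print([c["id"] for c in ranked])
print(roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

Ranking lets a downstream analysis take only the top-scoring nuclei until it has enough for its statistics, instead of accepting or rejecting every candidate with a fixed threshold.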
Accurate segmentation is necessary to automate quantitative image analysis for applications such as gene repositioning. However, due to heterogeneity within images and across different applications, no segmentation algorithm provides a satisfactory solution. Automated assessment of segmentations by ranked retrieval is capable of reducing or even eliminating the need to select segmented objects by hand and represents a significant improvement over binary classification. The method can be extended to other high-throughput applications requiring accurate detection of cells or nuclei across a range of biomedical applications.
PMCID: PMC3484015  PMID: 22971117
25.  Antibody-supervised deep learning for quantification of tumor-infiltrating immune cells in hematoxylin and eosin stained breast cancer samples 
Immune cell infiltration in tumor is an emerging prognostic biomarker in breast cancer. The gold standard for quantification of immune cells in tissue sections is visual assessment through a microscope, which is subjective and semi-quantitative. In this study, we propose and evaluate an approach based on antibody-guided annotation and deep learning to quantify immune cell-rich areas in hematoxylin and eosin (H&E) stained samples.
Consecutive sections of formalin-fixed paraffin-embedded samples obtained from the primary tumor of twenty breast cancer patients were cut and stained with H&E and the pan-leukocyte CD45 antibody. The stained slides were digitally scanned, and a training set of immune cell-rich and cell-poor tissue regions was annotated in H&E whole-slide images using the CD45 expression as a guide. In analysis, the images were divided into small homogeneous regions, superpixels, from which features were extracted using a pretrained convolutional neural network (CNN) and classified with a support vector machine. The CNN approach was compared to texture-based classification and to visual assessments performed by two pathologists.
In a set of 123,442 labeled superpixels, the CNN approach achieved an F-score of 0.94 (range: 0.92–0.94) in discrimination of immune cell-rich and cell-poor regions, as compared to an F-score of 0.88 (range: 0.87–0.89) obtained with the texture-based classification. When compared to visual assessment of 200 images, the CNN approach achieved an agreement of 90% (κ = 0.79) in quantifying immune infiltration, while the inter-observer agreement between pathologists was 90% (κ = 0.78).
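As a reminder of how the two reported statistics are defined, the sketch below computes an F-score from true-positive, false-positive, and false-negative counts, and Cohen's kappa from a square agreement table. The function names are hypothetical and every count in the usage lines is invented for illustration; none of them is taken from the study.

```python
def f_score(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def cohens_kappa(table):
    """Cohen's kappa for a square table where table[i][j] counts items
    that rater A put in class i and rater B put in class j."""
    k = len(table)
    total = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / total
    expected = sum(
        sum(table[i]) * sum(row[i] for row in table) for i in range(k)
    ) / total ** 2
    return (observed - expected) / (1 - expected)

# Invented counts for 200 images rated by two raters.
table = [[70, 10],
         [10, 110]]
agreement = (table[0][0] + table[1][1]) / 200
print(f"agreement = {agreement:.2f}, kappa = {cohens_kappa(table):.2f}")
print(f"F-score = {f_score(tp=90, fp=10, fn=10):.2f}")
```

Kappa discounts the agreement expected by chance from the raters' class frequencies, which is why the same raw agreement can correspond to different kappa values.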
Our findings indicate that deep learning can be applied to quantify immune cell infiltration in breast cancer samples using only a basic morphological stain. Good discrimination of immune cell-rich areas was achieved, in close concordance with both leukocyte antigen expression and pathologists’ visual assessment.
PMCID: PMC5027738  PMID: 27688929
breast cancer; convolutional neural network; digital pathology; tumor-infiltrating immune cells; tumor microenvironment
