High-resolution imaging techniques were used to analyze the relationship between a Wnt-responsive cancer stem cell (CSC)-enriched population and the tumor vasculature using p53-null mouse mammary tumors transduced with a Wnt signaling pathway reporter. The results demonstrate that the combined strategy of monitoring the fluorescently labeled CSCs and vasculature using high-resolution imaging techniques provides a unique opportunity to study the CSC microenvironment.
Cancer stem cells (CSCs, or tumor-initiating cells) may be responsible for tumor formation in many types of cancer, including breast cancer. Using high-resolution imaging techniques, we analyzed the relationship between a Wnt-responsive, CSC-enriched population and the tumor vasculature using p53-null mouse mammary tumors transduced with a lentiviral Wnt signaling reporter. Consistent with their localization in the normal mammary gland, Wnt-responsive cells in tumors were enriched in the basal/myoepithelial population and generally located in close proximity to blood vessels. The Wnt-responsive CSCs did not colocalize with the hypoxia-inducible factor 1α-positive cells in these p53-null basal-like tumors. Average vessel diameter and vessel tortuosity were increased in p53-null mouse tumors, as well as in a human tumor xenograft as compared with the normal mammary gland. The combined strategy of monitoring the fluorescently labeled CSCs and vasculature using high-resolution imaging techniques provides a unique opportunity to study the CSC and its surrounding vasculature.
Cancer stem cells; Stem cell microenvironment; In vivo optical imaging; Microvasculature; Signal transduction; p53
This study developed a highly efficient serum-free pluripotent stem cell (PSC) neural induction medium that can induce human PSCs into primitive neural stem cells (NSCs) in 7 days, obviating the need for time-consuming, laborious embryoid body generation or rosette picking. This method of primitive NSC derivation sets the stage for the scalable production of clinically relevant neural cells for cell therapy applications in good manufacturing practice conditions.
Human pluripotent stem cells (hPSCs), including human embryonic stem cells and human induced pluripotent stem cells, are unique cell sources for disease modeling, drug discovery screens, and cell therapy applications. The first step in producing neural lineages from hPSCs is the generation of neural stem cells (NSCs). Current methods of NSC derivation involve time-consuming, labor-intensive steps of embryoid body generation or coculture with stromal cell lines, which result in low-efficiency derivation of NSCs. In this study, we report a highly efficient serum-free pluripotent stem cell neural induction medium that can induce hPSCs into primitive NSCs (pNSCs) in 7 days, obviating the need for time-consuming, laborious embryoid body generation or rosette picking. The pNSCs expressed the neural stem cell markers Pax6, Sox1, Sox2, and Nestin; were negative for Oct4; could be expanded for multiple passages; and could be differentiated into neurons, astrocytes, and oligodendrocytes, in addition to the brain region-specific neuronal subtypes GABAergic, dopaminergic, and motor neurons. Global gene expression of the transcripts of pNSCs was comparable to that of rosette-derived and human fetal-derived NSCs. This work demonstrates an efficient method to generate expandable pNSCs, which can be further differentiated into central nervous system neurons and glia with temporal, spatial, and positional cues of brain regional heterogeneity. This method of pNSC derivation sets the stage for the scalable production of clinically relevant neural cells for cell therapy applications in good manufacturing practice conditions.
Astrocytes; Cell culture; Neural stem cell; Neural induction; Neural differentiation; Oligodendrocytes; Neuron; Nestin
The vasculature inside breast cancers is one important component of the tumour microenvironment. The investigation of its spatial morphology, distribution and interactions with cancer cells, including cancer stem cells, is essential for elucidating mechanisms of tumour development and treatment response. Using confocal microscopy and fluorescent markers, we have acquired three-dimensional images of vasculature within mammary tumours and normal mammary gland of mouse models. However, it is difficult to segment and reconstruct complex vasculature accurately from the in vivo three-dimensional images owing to the existence of uneven intensity and regions with low signal-to-noise ratios (SNR). To overcome these challenges, we have developed a novel three-dimensional vasculature segmentation method based on local clustering and classification. First, images of vasculature are clustered into local regions, whose boundaries well delineate vasculature even in low SNR and uneven intensity regions. Then local regions belonging to vasculature are identified by applying a semi-supervised classification method based on three informative features of the local regions. Comparison of results using simulated and real vasculature images, from mouse mammary tumours and normal mammary gland, shows that the new method outperforms existing methods, and can be used for three-dimensional images with uneven background and low SNR to achieve accurate vasculature reconstruction.
vasculature reconstruction, tumour microenvironment, vascular imaging
Motivation: Currently there are no curative anticancer drugs, and drug resistance is often acquired after drug treatment. One reason is that cancers are complex diseases, regulated by multiple signaling pathways and crosstalk among the pathways. It is expected that drug combinations can reduce drug resistance and improve patients’ outcomes. In clinical practice, the ideal and feasible drug combinations are combinations of existing Food and Drug Administration-approved drugs or bioactive compounds that are already used on patients or have entered clinical trials and passed safety tests. These drug combinations could be used directly on patients with less concern about toxic effects. However, there is so far no effective computational approach to search for effective drug combinations among the enormous number of possibilities.
Results: In this study, we propose a novel systematic computational tool, DrugComboRanker, to prioritize synergistic drug combinations and uncover their mechanisms of action. We first build a drug functional network based on the drugs’ genomic profiles, and partition the network into numerous drug network communities by using a Bayesian non-negative matrix factorization approach. As drugs within overlapping communities share common mechanisms of action, we next uncover potential targets of drugs by applying a recommendation system on drug communities. In parallel, we build disease-specific signaling networks based on patients’ genomic profiles and interactome data. We then identify drug combinations by searching for drugs whose targets are enriched in the complementary signaling modules of the disease signaling network. The novel method was evaluated on lung adenocarcinoma and endocrine receptor positive breast cancer, and compared with other drug combination approaches. These case studies discovered a set of effective drug combinations top ranked in our prediction list, and mapped the drug targets on the disease signaling network to highlight the mechanisms of action of the drug combinations.
Availability and implementation: The program is available on request.
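The target-enrichment step described above can be illustrated with a small sketch. The abstract does not state which enrichment statistic DrugComboRanker uses, so the one-sided hypergeometric test below is an assumption, and the function name and all gene counts are hypothetical values chosen only for illustration.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_module, n_targets, n_overlap):
    """One-sided hypergeometric p-value, P(X >= n_overlap).

    n_universe: genes in the disease signaling network (assumed background)
    n_module:   genes in one signaling module
    n_targets:  predicted targets of one drug
    n_overlap:  targets that fall inside the module
    """
    upper = min(n_module, n_targets)
    total = comb(n_universe, n_targets)
    tail = sum(
        comb(n_module, k) * comb(n_universe - n_module, n_targets - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: a 200-gene network, a 20-gene module,
# and a drug with 10 predicted targets, 5 of them in the module.
p_value = hypergeom_enrichment_p(200, 20, 10, 5)
print(f"enrichment p-value: {p_value:.2e}")
```

Under this reading, a drug pair would be a candidate combination when each drug is enriched in a different, complementary module; how the original tool scores and ranks such pairs is not specified in the abstract.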
A new type of signaling network element, called cancer signaling bridges (CSB), has been shown to have the potential for systematic and fast-tracked drug repositioning. On the basis of CSBs, we developed a computational model to derive specific downstream signaling pathways that reveal previously unknown target–disease connections and new mechanisms for specific cancer subtypes. The model enables us to reposition drugs based on available patient gene expression data. We applied this model to repurpose known or shelved drugs for brain, lung, and bone metastases of breast cancer with the hypothesis that cancer subtypes have their own specific signaling mechanisms. To test the hypothesis, we identified specific CSBs for each metastasis that satisfy two criteria: (i) CSB proteins are activated by the maximal number of enriched signaling pathways specific to a given metastasis, and (ii) CSB proteins are involved in the most differentially expressed coding genes specific to each breast cancer metastasis. The identified signaling networks for the three types of breast cancer metastases contain 31, 15, and 18 proteins and are used to reposition 15, 9, and 2 drug candidates for the brain, lung, and bone metastases, respectively. We conducted both in vitro and in vivo preclinical experiments as well as analysis on patient tumor specimens to evaluate the targets and repositioned drugs. Of special note, we found that the Food and Drug Administration-approved drugs, sunitinib and dasatinib, prohibit brain metastases derived from breast cancer, addressing one particularly challenging aspect of this disease.
The way in which cells adopt different morphologies is not fully understood. Cell shape could be a continuous variable or restricted to a set of discrete forms. We developed quantitative methods to describe cell shape and show that Drosophila hemocytes in culture are a heterogeneous mixture of five discrete morphologies. In an RNAi screen of genes affecting the morphological complexity of heterogeneous populations, we found that most genes regulate the transition between discrete shapes rather than generating new morphologies. In particular, we identified a subset of genes, including the tumour suppressor PTEN, that decrease the heterogeneity of the population leading to populations enriched in rounded or elongated forms. We show that these genes have a highly conserved function as regulators of cell shape in both mouse and human metastatic melanoma cells.
Recent reports indicate that a subgroup of tumor cells named cancer stem cells (CSCs) or tumor-initiating cells (TICs) is responsible for tumor initiation, growth and drug resistance. This subgroup of tumor cells has self-renewal capacity and can differentiate into heterogeneous tumor cell populations through asymmetric proliferation. The idea of CSCs provides informative insights into tumor initiation, metastasis and treatment. However, the underlying mechanisms by which CSCs regulate tumor behaviors are unclear due to the complexity of the cancer system. To study the functions of CSCs in the complex tumor system, a few mathematical modeling studies have been proposed. However, the effect of microenvironment (mE) factors, the behaviors of CSCs, progenitor tumor cells (PCs) and differentiated tumor cells (TCs), and the impact of CSC fraction and signaling heterogeneity have not yet been adequately explored.
In this study, a novel 3D multi-scale mathematical model is proposed to investigate the behaviors of CSCs in tumor progression. The model integrates CSCs, PCs, and TCs together with a few essential mE factors. With this model, we simulated and investigated tumor development and drug response under different CSC content and heterogeneity.
The simulation results showed that the fraction of CSCs plays a critical role in driving tumor progression and drug resistance. They also showed that pure chemo-drug treatment was not successful, as it resulted in a significant increase of the CSC fraction. They further showed that the self-renewal heterogeneity of the initial CSC population is a cause of the heterogeneity of the derived tumors in terms of the CSC fraction and response to drug treatments.
The proposed 3D multi-scale model provides a new tool for investigating the behaviors of CSC in CSC-initiated tumors, which enables scientists to investigate and generate testable hypotheses about CSCs in tumor development and drug response under different microenvironments and drug perturbations.
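The effect of chemo-drug treatment on the CSC fraction can be illustrated with a deliberately simplified, non-spatial toy model. This is not the 3D multi-scale model described above: it ignores space and microenvironment factors, every rate below is an invented value, and the assumption that the drug kills CSCs less efficiently than PCs and TCs is ours, made only to illustrate the reported trend.

```python
def step(state, p_self=0.2, div_csc=0.1, div_pc=0.3, diff_pc=0.2,
         death_tc=0.05, kill=0.0, kill_csc=0.0):
    """Advance (CSC, PC, TC) counts by one time step (hypothetical rates).

    p_self:   fraction of CSC divisions that add a CSC (self-renewal);
              the remaining divisions add a progenitor cell (PC)
    kill:     drug-induced death rate of PCs and TCs
    kill_csc: drug-induced death rate of CSCs (assumed to be lower)
    """
    csc, pc, tc = state
    csc_divisions = div_csc * csc
    new_csc = csc + p_self * csc_divisions - kill_csc * csc
    new_pc = (pc + (1 - p_self) * csc_divisions
              + div_pc * pc - diff_pc * pc - kill * pc)
    new_tc = tc + diff_pc * pc - death_tc * tc - kill * tc
    return (max(new_csc, 0.0), max(new_pc, 0.0), max(new_tc, 0.0))


def simulate(state, n_steps, **rates):
    """Apply `step` repeatedly and return the final state."""
    for _ in range(n_steps):
        state = step(state, **rates)
    return state


def csc_fraction(state):
    """Fraction of all tumor cells that are CSCs."""
    return state[0] / sum(state)


# Grow a tumor from 10 CSCs, then apply a drug that mostly kills PCs and TCs.
untreated = simulate((10.0, 0.0, 0.0), 50)
treated = simulate(untreated, 10, kill=0.5, kill_csc=0.05)
print(f"CSC fraction before treatment: {csc_fraction(untreated):.3f}")
print(f"CSC fraction after treatment:  {csc_fraction(treated):.3f}")
```

In this sketch the tumor shrinks under treatment while the CSC fraction rises, because the surviving population is dominated by the less drug-sensitive CSCs.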
Recent advances in automated high-resolution fluorescence microscopy and robotic handling have made the systematic and cost-effective study of diverse morphological changes within a large population of cells possible under a variety of perturbations, e.g., drugs, compounds, metal catalysts, RNA interference (RNAi). Cell population-based studies deviate from conventional microscopy studies on a few cells, and could provide stronger statistical power for drawing experimental observations and conclusions. However, it is challenging to manually extract and quantify phenotypic changes from the large amounts of complex image data generated. Thus, bioimage informatics approaches are needed to rapidly and objectively quantify and analyze the image data. This paper provides an overview of the bioimage informatics challenges and approaches in image-based studies for drug and target discovery. The concepts and capabilities of image-based screening are first illustrated by a few practical examples investigating different kinds of phenotypic changes caused by drugs, compounds, or RNAi. The bioimage analysis approaches, including object detection, segmentation, and tracking, are then described. Subsequently, the quantitative features, phenotype identification, and multidimensional profile analysis for profiling the effects of drugs and targets are summarized. Moreover, a number of publicly available software packages for bioimage informatics are listed for further reference. It is expected that this review will help readers, including those without bioimage informatics expertise, understand the capabilities, approaches, and tools of bioimage informatics and apply them to advance their own studies.
Glioblastoma Multiforme (GBM) cells are highly invasive, infiltrating into the surrounding normal brain tissue, making it impossible to completely eradicate GBM tumors by surgery or radiation. Increasing evidence also shows that these migratory cells are highly resistant to cytotoxic reagents, but decreasing their migratory capability can re-sensitize them to chemotherapy. This evidence suggests that the migratory cell population may serve as a better therapeutic target for more effective treatment of GBM. In order to understand the regulatory mechanism underlying the motile phenotype, we carried out a genome-wide RNAi screen for genes inhibiting the migration of GBM cells. The screen identified a total of twenty-five primary hits; seven of them were confirmed by secondary screening. Further study showed that three of the genes, FLNA, KHSRP and HCFC1, also functioned in vivo, and knocking them down caused multifocal tumor in a mouse model. Interestingly, two genes, KHSRP and HCFC1, were also found to be correlated with the clinical outcome of GBM patients. These two genes have not been previously associated with cell migration.
High content neuron image processing is an important method for quantitative neurobiological studies. The main goal of this paper is to provide automatic image processing approaches for neuron images in order to study neuronal mechanisms in high content screening. In the nuclei channel, all nuclei are segmented and detected by applying a gradient vector field based watershed. The neuronal nuclei are then selected based on the soma region detected in the neurite channel. For neurite images, we propose a novel neurite centerline extraction approach using an improved line-pixel detection technique. The proposed neurite tracing method can detect curvilinear structures more accurately than existing methods. Finally, an interface called NeuriteIQ, based on the proposed algorithms, was developed to support application in high content screening.
High content screening; Microscopy image; Nuclei segmentation; Neurite outgrowth; Line-pixel detection; Branch area
Cancer cells with active drug-efflux capability are multidrug resistant and pose a significant obstacle for the efficacy of chemotherapy. Moreover, recent evidence suggests that high drug-efflux cancer cells (HDECCs) may be selectively enriched with stem-like cancer cells, which are believed to be the cause of tumor initiation and recurrence. There is a great need for therapeutic reagents that are capable of eliminating HDECCs. We developed an image-based high-content screening (HCS) system to specifically identify and analyze the HDECC population in lung cancer cells. Using the system, we screened 1,280 pharmacologically active compounds and identified twelve potent HDECC inhibitors. We show that these inhibitors are able to overcome multidrug resistance (MDR) and sensitize HDECCs to chemotherapeutic drugs, or directly reduce the tumorigenicity of lung cancer cells, possibly by affecting stem-like cancer cells. The HCS system we established provides a new approach for identifying therapeutic reagents that overcome MDR. The compounds identified by the screening may be used as adjuvants to improve the efficacy of chemotherapeutic drugs.
high drug-efflux cancer cells; multidrug resistance; high content screening; image-based assay
Automated time-lapse microscopy can visualize proliferation of large numbers of individual cells, enabling accurate measurement of the frequency of cell division and the duration of interphase and mitosis. However, extraction of quantitative information by manual inspection of time-lapse movies is too time-consuming to be useful for analysis of large experiments.
Here we present an automated time-series approach that can measure changes in the duration of mitosis and interphase in individual cells expressing fluorescent histone 2B. The approach requires analysis of only 2 features, nuclear area and average intensity. Compared to supervised learning approaches, this method reduces processing time and does not require generation of training data sets. We demonstrate that this method is as sensitive as manual analysis in identifying small changes in interphase or mitotic duration induced by drug or siRNA treatment.
This approach should facilitate automated analysis of high-throughput time-lapse data sets to identify small molecules or gene products that influence timing of cell division.
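A minimal sketch of how the two features named above, nuclear area and average intensity, could be turned into phase durations for one tracked cell. The abstract does not give the decision rule, so the median-relative thresholds used here (mitotic nuclei assumed to be smaller and brighter than the track median) are our assumption, and the measurements are made-up values.

```python
from statistics import median


def label_mitotic(area, intensity, area_ratio=0.7, intensity_ratio=1.3):
    """Flag frames as mitotic when the nucleus is small and bright.

    Thresholds are relative to the track medians (hypothetical rule).
    """
    area_median = median(area)
    intensity_median = median(intensity)
    return [
        a < area_ratio * area_median and i > intensity_ratio * intensity_median
        for a, i in zip(area, intensity)
    ]


def phase_durations(flags, minutes_per_frame):
    """Collapse per-frame flags into (phase, duration in minutes) runs."""
    runs = []
    for flag in flags:
        phase = "mitosis" if flag else "interphase"
        if runs and runs[-1][0] == phase:
            runs[-1][1] += 1
        else:
            runs.append([phase, 1])
    return [(phase, frames * minutes_per_frame) for phase, frames in runs]


# Made-up single-cell track: nuclear area (pixels) and mean H2B intensity.
area = [200, 205, 210, 120, 110, 115, 190, 200, 205]
intensity = [50, 52, 51, 90, 95, 92, 55, 50, 52]

flags = label_mitotic(area, intensity)
print(phase_durations(flags, minutes_per_frame=10))
```

Because the rule needs no labeled examples, it reflects the stated advantage over supervised learning, although real tracks would likely need smoothing before thresholding.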
We present a label-free, chemically-selective, quantitative imaging strategy to identify breast cancer and differentiate its subtypes using coherent anti-Stokes Raman scattering (CARS) microscopy. Human normal breast tissue and benign proliferative lesions, as well as in situ and invasive carcinomas, were imaged ex vivo. Simply by visualizing cellular and tissue features appearing on CARS images, cancerous lesions can be readily separated from normal tissue and benign proliferative lesions. To further distinguish cancer subtypes, quantitative disease-related features, describing the geometry and distribution of cancer cell nuclei, were extracted and applied to a computerized classification system. The results show that in situ carcinoma was successfully distinguished from invasive carcinoma, while invasive ductal carcinoma (IDC) and invasive lobular carcinoma were also distinguished from each other. Furthermore, 80% of intermediate-grade IDC and 85% of high-grade IDC were correctly distinguished from each other. The proposed quantitative CARS imaging method has the potential to enable rapid diagnosis of breast cancer.
(170.3880) Medical and biological imaging; (170.4580) Optical diagnostics for medicine; (180.4315) Nonlinear microscopy; (300.6230) Spectroscopy, coherent anti-Stokes Raman scattering microscopy
A custom-built coherent anti-Stokes Raman scattering (CARS) microscope was used to image prostatic glands and nerve structures from 17 patients undergoing radical prostatectomy. Imaging of glandular and nerve structures showed distinctive cellular features that correlated with histological stains. Segmentation of cell nuclei was performed to establish a cell feature-based model to separate normal glands from cancer glands. In this study, we used a single parameter, average cell neighbor distance based on CARS imaging, to characterize normal and cancerous glandular structures. By combining CARS with our novel classification model, we are able to characterize prostate glandular and nerve structures in a manner that potentially enables real-time, intra-operative assessment of surgical margins and neurovascular bundles. As such, this method could potentially improve outcomes following radical prostatectomy.
(180.4315) Nonlinear microscopy; (170.1610) Clinical applications; (170.3880) Medical and biological imaging; (170.4580) Optical diagnostics for medicine
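The single parameter used in the prostate study, average cell neighbor distance, can be sketched as follows. The abstract does not define which cells count as neighbors, so this example assumes the nearest neighbor of each segmented nucleus centroid; the coordinates are made-up values.

```python
from math import dist


def average_neighbor_distance(centroids):
    """Mean distance from each nucleus centroid to its nearest neighbor.

    `centroids` is a list of (x, y) positions from nucleus segmentation.
    Using the nearest neighbor is an assumption of this sketch.
    """
    if len(centroids) < 2:
        raise ValueError("need at least two nuclei")
    total = 0.0
    for i, p in enumerate(centroids):
        total += min(dist(p, q) for j, q in enumerate(centroids) if j != i)
    return total / len(centroids)


# Made-up nucleus centroids in micrometers.
nuclei = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
print(f"{average_neighbor_distance(nuclei):.3f}")
```

A classifier built on this value would then compare it against a cutoff learned from annotated normal and cancerous glands; no such cutoff is given in the abstract.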
Phase-contrast microscopy is a common approach for studying the dynamics of cell behaviors, such as cell migration. Cell segmentation is the basis of quantitative analysis of the large volumes of cellular images produced. However, the complicated morphological appearance of cells in phase-contrast microscopy images challenges existing segmentation methods. This paper proposes a new cell segmentation method for cancer cell migration studies using phase-contrast images. Instead of segmenting cells directly based on commonly used low-level features, e.g. intensity and gradient, we first identify the leading protrusions, a high-level feature, of cancer cells. Based on the identified cell leading protrusions, we introduce a front vector flow guided active contour, which guides the initial cell boundaries to the real boundaries. Experimental validation on a set of breast cancer cell images shows that the proposed method provides fast, stable, and accurate segmentation for breast cancer cells with a wide range of sizes and shapes.
Studies of the differentiation abilities of stem cells have attracted a lot of attention in recent years. Microscopy can be used to record details of the differentiation process of stem cells under different perturbations and is an important tool for studying stem cell differentiation. Since it is infeasible to quantitatively analyze a huge amount of image data manually, automated image analysis systems are urgently needed. However, the complicated morphological appearances of stem cells are challenging for existing segmentation methods. Herein, we propose a new, automated scheme for stem cell segmentation. This scheme first uses multi-scale blob and curvilinear structure detectors to delineate the skeletons of stem cells quickly and then segments the stem cells by refining the skeletons to the cell boundaries using multi-level sets. The initial experimental results indicate the effectiveness of the proposed scheme.
stem cell differentiation; blob detection; curvilinear structure detection; cell segmentation; level set
To evaluate the association between the clinical, dosimetric factors and severe acute radiation pneumonitis (SARP) in patients with locally advanced non-small cell lung cancer (LANSCLC) treated with concurrent chemotherapy and intensity-modulated radiotherapy (IMRT).
We analyzed 94 LANSCLC patients treated with concurrent chemotherapy and IMRT between May 2005 and September 2006. SARP was defined as side effects of grade 3 or higher according to the Common Terminology Criteria for Adverse Events (CTCAE) version 3.0.
The clinical and dosimetric factors were analyzed. Univariate and multivariate logistic regression analyses were performed to evaluate the relationship between clinical, dosimetric factors and SARP.
Median follow-up was 10.5 months (range 6.5-24). Of 94 patients, 11 (11.7%) developed SARP. Univariate analyses showed that the normal tissue complication probability (NTCP), mean lung dose (MLD), relative volumes of lung receiving more than a threshold dose of 5-60 Gy at increments of 5 Gy (V5-V60), chronic obstructive pulmonary disease (COPD) and Forced Expiratory Volume in the first second (FEV1) were associated with SARP (p < 0.05). In multivariate analysis, NTCP value (p = 0.001) and V10 (p = 0.015) were the most significant factors associated with SARP. The incidences of SARP in the group with NTCP > 4.2% and NTCP ≤4.2% were 43.5% and 1.4%, respectively (p < 0.01). The incidences of SARP in the group with V10 ≤50% and V10 >50% were 5.7% and 29.2%, respectively (p < 0.01).
NTCP value and V10 are useful indicators for predicting SARP in NSCLC patients treated with concurrent chemotherapy and IMRT.
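The dosimetric quantities above can be made concrete with a short sketch that computes mean lung dose (MLD) and Vx, the relative lung volume receiving more than x Gy, from a list of per-voxel doses. It assumes equal-volume voxels; the function names and dose values are hypothetical, and the only number taken from the abstract is the 50% V10 cutoff. The NTCP calculation is omitted because the abstract does not state which NTCP model was used.

```python
def mean_lung_dose(voxel_doses_gy):
    """Mean dose over lung voxels, assuming equal voxel volumes."""
    return sum(voxel_doses_gy) / len(voxel_doses_gy)


def v_dose(voxel_doses_gy, threshold_gy):
    """Percentage of lung volume receiving more than threshold_gy."""
    above = sum(1 for dose in voxel_doses_gy if dose > threshold_gy)
    return 100.0 * above / len(voxel_doses_gy)


def v10_risk_group(voxel_doses_gy, cutoff_percent=50.0):
    """Group a plan by the V10 cutoff reported in the abstract (50%)."""
    v10 = v_dose(voxel_doses_gy, 10.0)
    return "V10 > 50%" if v10 > cutoff_percent else "V10 <= 50%"


# Made-up lung voxel doses in Gy for one treatment plan.
doses = [2, 8, 12, 20, 30, 5, 15, 9, 40, 1]
print(f"MLD = {mean_lung_dose(doses):.1f} Gy")
print(f"V10 = {v_dose(doses, 10.0):.1f}%")
print(v10_risk_group(doses))
```

In practice these values are read from the dose-volume histogram exported by the treatment planning system rather than recomputed from raw voxels.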
DNA copy number aberration (CNA) is very important in the pathogenesis of tumors and other diseases. For example, CNAs may result in suppression of anti-oncogenes and activation of oncogenes, which would cause certain types of cancers. High density single nucleotide polymorphism (SNP) array data are widely used for CNA detection. However, it is nontrivial to detect CNAs automatically because the signals obtained from high density SNP arrays often have a low signal-to-noise ratio (SNR), which might be caused by whole genome amplification, mixtures of normal and tumor cells, experimental noise or other technical limitations. As the SNR decreases, many false CNA regions are detected and true CNA regions are missed. Thus, more sophisticated statistical models are needed to make CNA detection from low-SNR signals more robust and reliable.
This paper presents a conditional random pattern (CRP) model for CNA detection in which many contextual cues are exploited to suppress the noise and improve CNA detection accuracy. Both simulated and real data are used to evaluate the proposed model, and the validation results show that the CRP model is more robust and reliable in the presence of noise for CNA detection using high density SNP array data, compared to a number of widely used software packages.
The proposed conditional random pattern (CRP) model could effectively detect the CNA regions in the presence of noise.
Calcium ions (Ca2+) play a fundamental role in a variety of physiological functions in many cell types by acting as a second messenger. Variation of intracellular Ca2+ concentration ([Ca2+]i) is often observed when the cell is stimulated. However, it is a challenging task to automatically quantify [Ca2+]i in a population of cells. In this study, we present a workflow including specific algorithms for automated intracellular calcium signal analysis using high-content, time-lapse cellular images. The experimental validations indicate the effectiveness of the proposed workflow and algorithms. We applied the workflow to analyze the intracellular calcium signals induced by different concentrations of H2O2 in cell lines transfected with presenilin-1 (PS-1), which is known to be closely related to familial Alzheimer's disease (FAD). The analysis results imply an important role of mutant PS-1, but not normal human PS-1 or mutant human amyloid precursor protein (APP), in enhancing intracellular calcium signaling induced by H2O2.
High content image analysis; Oxidative stress; Calcium oscillation; Familial Alzheimer's disease; Mutant presenilin 1
Automated cell segmentation and tracking are critical for quantitative analysis of cell cycle behavior using time-lapse fluorescence microscopy. However, the complex, dynamic cell cycle behavior poses new challenges to the existing image segmentation and tracking methods. This paper presents a fully automated tracking method for quantitative cell cycle analysis. In the proposed tracking method, we introduce a neighboring graph to characterize the spatial distribution of neighboring nuclei, and a novel dissimilarity measure is designed based on the spatial distribution, nuclei morphological appearance, migration, and intensity information. Then, we employ integer programming and a division matching strategy, together with the novel dissimilarity measure, to track cell nuclei. We applied this new tracking method to HeLa cancer cells over several cell cycles, and the validation results showed high accuracy for segmentation and tracking, at 99.5% and 90.0%, respectively. The tracking method has been implemented in the cell-cycle analysis software package, DCELLIQ, which is freely available.
Anti-cancer drug screening; cell cycle analysis; segmentation and tracking; time-lapse fluorescence microscopy
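The idea of matching nuclei between frames by minimizing a combined dissimilarity can be sketched as a small assignment problem. This is not the DCELLIQ implementation: the weighted-sum dissimilarity, its weights, and the exhaustive search over one-to-one assignments are our simplifying assumptions, and the sketch omits the neighboring graph and the division matching described above.

```python
from itertools import permutations
from math import dist


def dissimilarity(a, b, w_move=1.0, w_area=20.0, w_intensity=20.0):
    """Weighted sum of displacement and relative feature changes.

    The weights are hypothetical; each nucleus is a dict with keys
    "xy" (centroid), "area", and "intensity".
    """
    move = dist(a["xy"], b["xy"])
    area = abs(a["area"] - b["area"]) / max(a["area"], b["area"])
    inten = (abs(a["intensity"] - b["intensity"])
             / max(a["intensity"], b["intensity"]))
    return w_move * move + w_area * area + w_intensity * inten


def match_nuclei(prev, curr):
    """Return, for each nucleus in `prev`, the index of its match in `curr`.

    Exhaustive search over permutations: only suitable for tiny examples
    with the same number of nuclei in both frames.
    """
    if len(prev) != len(curr):
        raise ValueError("this sketch assumes equal nucleus counts")
    best_cost, best_perm = float("inf"), ()
    for perm in permutations(range(len(curr))):
        cost = sum(dissimilarity(prev[i], curr[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(best_perm)


# Made-up nuclei in two consecutive frames (listed in different order).
frame_t = [{"xy": (10, 10), "area": 100, "intensity": 50},
           {"xy": (40, 12), "area": 150, "intensity": 80}]
frame_t1 = [{"xy": (41, 13), "area": 148, "intensity": 79},
            {"xy": (12, 11), "area": 102, "intensity": 51}]
print(match_nuclei(frame_t, frame_t1))
```

For realistic frame sizes, the exhaustive search would be replaced by an integer programming or assignment solver, as the abstract indicates.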
Optical microscopy is becoming an important technique in drug discovery and life science research. The approaches used to analyze optical microscopy images are generally classified into two categories: automatic and manual approaches. However, the existing automatic systems are rather limited in dealing with large volumes of time-lapse microscopy images because of the complexity of cell behaviors and morphological variance. On the other hand, manual approaches are very time-consuming. In this paper, we propose an automated, quantitative analysis system that can be used to segment, track, and quantify cell cycle behaviors of a large population of cell nuclei effectively and efficiently. We use adaptive thresholding and a watershed algorithm for cell nuclei segmentation, followed by a fragment merging method that combines two scoring models based on trend and no-trend features. Using the context information of time-lapse data, the phases of cell nuclei are identified accurately via a Markov model. Experimental results show that the proposed system is effective for nuclei segmentation and phase identification.
Cell phase identification; continuous Markov model; nuclei segmentation; time-lapse fluorescence microscopy; tracking
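How temporal context can correct per-frame phase calls may be illustrated with a standard discrete hidden Markov model decoded by the Viterbi algorithm. This is a stand-in, not the continuous Markov model of the paper: the two-state simplification and all probabilities below are invented for illustration.

```python
from math import log


def viterbi(obs_probs, states, start_p, trans_p):
    """Most likely state sequence for a discrete hidden Markov model.

    obs_probs: one dict per frame mapping state -> P(observation | state),
               e.g. the output of a per-frame phase classifier.
    """
    tiny = 1e-12  # avoids log(0)
    score = {s: log(start_p[s] + tiny) + log(obs_probs[0][s] + tiny)
             for s in states}
    back_pointers = []
    for frame in obs_probs[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best_prev = max(
                states, key=lambda p: score[p] + log(trans_p[p][s] + tiny))
            new_score[s] = (score[best_prev]
                            + log(trans_p[best_prev][s] + tiny)
                            + log(frame[s] + tiny))
            pointers[s] = best_prev
        score = new_score
        back_pointers.append(pointers)
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back_pointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


states = ["interphase", "mitosis"]
start_p = {"interphase": 0.9, "mitosis": 0.1}
trans_p = {"interphase": {"interphase": 0.9, "mitosis": 0.1},
           "mitosis": {"interphase": 0.3, "mitosis": 0.7}}
# Per-frame likelihoods as (interphase, mitosis); frame 3 is a noisy call.
likelihoods = [(0.9, 0.1), (0.8, 0.2), (0.4, 0.6), (0.9, 0.1),
               (0.2, 0.8), (0.1, 0.9), (0.2, 0.8), (0.9, 0.1)]
obs = [{"interphase": i, "mitosis": m} for i, m in likelihoods]
phases = viterbi(obs, states, start_p, trans_p)
print(phases)
```

In this example the isolated mitosis-like call in the third frame is smoothed to interphase, while the sustained run in frames five to seven is kept as mitosis.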
Motivation: Loss of heterozygosity (LOH) is one of the most important mechanisms in tumor evolution. LOH can be detected from the genotypes of the tumor samples with or without paired normal samples. In paired sample cases, LOH detection for informative single nucleotide polymorphisms (SNPs) is straightforward if there is no genotyping error. However, genotyping errors are unavoidable, and about 70% of SNPs are non-informative, so their LOH status can only be inferred from the neighboring informative SNPs.
Results: This article presents a novel LOH inference and segmentation algorithm based on the conditional random pattern (CRP) model. The new model explicitly considers the distance between two neighboring SNPs, as well as the genotyping error rate and the heterozygous rate. This new method is tested on the simulated and real data of the Affymetrix Human Mapping 500K SNP arrays. The experimental results show that the CRP method outperforms the conventional methods based on the hidden Markov model (HMM).
Availability: Software is available upon request.
Supplementary information: Supplementary data are available at Bioinformatics online.
With recent advances in fluorescence microscopy imaging techniques and methods of gene knock down by RNA interference (RNAi), genome-scale high-content screening (HCS) has emerged as a powerful approach to systematically identify all parts of complex biological processes. However, a critical barrier preventing this approach from reaching its potential is the lack of efficient and robust methods for automating RNAi image analysis and quantitatively evaluating gene knock down effects on huge volumes of HCS data. Facing such opportunities and challenges, we have started investigating automatic methods towards the development of a fully automatic RNAi-HCS system. Particularly important are reliable approaches to cellular phenotype classification and image-based gene function estimation.
We have developed an HCS analysis platform that consists of two main components: fluorescence image analysis and image scoring. For image analysis, we used a two-step enhanced watershed method to extract cellular boundaries from HCS images. Segmented cells were classified into several predefined phenotypes based on morphological and appearance features. Using statistical characteristics of the identified phenotypes as a quantitative description of the image, a score was generated that reflects gene function. Our scoring model integrates fuzzy gene class estimation and single regression models. The final functional score of an image was derived using the weighted combination of the inference from several support vector-based regression models. We validated our phenotype classification method and scoring system on our cellular phenotype and gene database with expert ground truth labeling.
We built a database of high-content, 3-channel fluorescence microscopy images of Drosophila Kc167 cultured cells that were treated with RNAi to perturb gene function. The proposed informatics system for microscopy image analysis was tested on this database. Both main components, automated phenotype classification and image scoring, were evaluated. The robustness and efficiency of our system were validated in quantitatively predicting the biological relevance of genes.
High-content screening; Image score inference
The recent emergence of high-throughput automated image acquisition technologies has forever changed how cell biologists collect and analyze data. Historically, the interpretation of cellular phenotypes in different experimental conditions has depended on the expert opinions of well-trained biologists. Such qualitative analysis is particularly effective in detecting subtle, but important, deviations in phenotypes. However, while the rapid and continuing development of automated microscope-based technologies now facilitates the acquisition of images of trillions of cells in thousands of diverse experimental conditions, such as in the context of RNA interference (RNAi) or small-molecule screens, the massive size of these datasets precludes human analysis. Thus, the development of automated methods that identify novel and biologically relevant phenotypes online is one of the major challenges in high-throughput image-based screening. Ideally, phenotype discovery methods should be designed to utilize prior or existing information and to tackle three challenging tasks: recovering pre-defined, biologically meaningful phenotypes; differentiating novel phenotypes from known ones; and distinguishing novel phenotypes from each other. Arbitrarily extracted information causes biased analysis, while combining the complete existing datasets with each new image is intractable in high-throughput screens.
Here we present the design and implementation of a novel and robust online phenotype discovery method with broad applicability that can be used in diverse experimental contexts, especially high-throughput RNAi screens. This method features phenotype modelling and iterative cluster merging using improved gap statistics. A Gaussian Mixture Model (GMM) is employed to estimate the distribution of each existing phenotype, and is then used as the reference distribution in the gap statistics. The method is broadly applicable to many types of image-based datasets derived from a wide spectrum of experimental conditions and is suitable for adaptively processing new images that are continuously added to existing datasets. Validations were carried out on different datasets, including a published RNAi screen using Drosophila embryos [Additional files 1, 2], a dataset for cell cycle phase identification using HeLa cells [Additional files 1, 3, 4] and a synthetic dataset using polygons. Our method tackled the three aforementioned tasks effectively, with an accuracy range of 85%–90%. When our method is implemented in the context of a Drosophila genome-scale RNAi image-based screen of cultured cells aimed at identifying the contribution of individual genes towards the regulation of cell shape, it efficiently discovers meaningful new phenotypes and provides novel biological insight. We also propose a two-step procedure to modify the novelty detection method based on one-class SVM, so that it can be used for online phenotype discovery. We compared the SVM-based method with our method under different conditions using various datasets, and our method consistently outperformed the SVM-based method in at least two of the three tasks by 2% to 5%. These results demonstrate that our method can be used to better identify novel phenotypes in image-based datasets from a wide range of conditions and organisms.
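The role of a fitted reference distribution in the gap statistic can be illustrated with a simplified sketch. It is not the improved gap statistic of the method itself: it assumes one-dimensional features, uses a single Gaussian as a one-component stand-in for the GMM, and compares only one cluster against two. All function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def within_ss(values):
    """Within-cluster sum of squared deviations from the mean."""
    center = statistics.mean(values)
    return sum((v - center) ** 2 for v in values)


def best_two_cluster_ss(values):
    """Smallest total within-cluster sum of squares over all two-way
    splits of the sorted 1-D data (optimal 2-means clusters are contiguous)."""
    xs = sorted(values)
    return min(within_ss(xs[:i]) + within_ss(xs[i:]) for i in range(1, len(xs)))


def log_w(values, k):
    """log of the within-cluster dispersion for k = 1 or k = 2 clusters."""
    return math.log(within_ss(values) if k == 1 else best_two_cluster_ss(values))


def gap(data, reference_sets, k):
    """Gap(k): mean log dispersion of the reference sets minus that of the data."""
    expected = sum(log_w(ref, k) for ref in reference_sets) / len(reference_sets)
    return expected - log_w(data, k)


def gap_gain(existing, candidate, n_refs=20, seed=0):
    """Gap(2) - Gap(1) for the merged data, with reference samples drawn from
    a Gaussian fitted to the existing phenotype (assumption: stand-in for a GMM).
    A clearly positive value suggests keeping the candidate as a separate phenotype."""
    rng = random.Random(seed)
    mu = statistics.mean(existing)
    sigma = statistics.stdev(existing)
    merged = existing + candidate
    refs = [[rng.gauss(mu, sigma) for _ in merged] for _ in range(n_refs)]
    return gap(merged, refs, 2) - gap(merged, refs, 1)


# Synthetic 1-D feature values (illustrative only, not data from the screens).
rng = random.Random(1)
existing = [rng.gauss(0.0, 1.0) for _ in range(100)]
shifted = [rng.gauss(10.0, 1.0) for _ in range(100)]   # far from the known phenotype
similar = [rng.gauss(0.0, 1.0) for _ in range(100)]    # resembles the known phenotype

print("shifted candidate:", gap_gain(existing, shifted))
print("similar candidate:", gap_gain(existing, similar))
```

In this sketch a candidate group that resembles the existing phenotype yields a gain near zero, which favors merging, whereas a well-separated group yields a clearly positive gain. Where to place the decision threshold is left open here.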
We demonstrate that our method can detect various novel phenotypes effectively in complex datasets. Experimental results also confirm that our method performs consistently under different orders of image input, under variations in starting conditions (including the number and composition of existing phenotypes), and on datasets from different screens. Our findings indicate that the proposed method is suitable for online phenotype discovery in diverse high-throughput image-based genetic and chemical screens.