High-content screening (HCS) has recently been combined with RNA interference (RNAi) to become an essential image-based high-throughput method for studying genes and biological networks through analysis of RNAi-induced cellular phenotypes. However, a genome-wide RNAi-HCS screen typically generates tens of thousands of images, most of which remain uncategorized due to the inadequacies of existing HCS image analysis tools. Browsing such a prohibitively large RNAi-HCS image database still requires highly trained scientists, who can produce only a handful of qualitative results regarding cellular morphological phenotypes. For this reason we have developed intelligent interfaces to facilitate the application of HCS technology in biomedical research. Our new interfaces give biologists the computational power not only to explore large-scale RNAi-HCS image databases effectively and efficiently, but also to apply their knowledge and experience to interactive mining of cellular phenotypes using Content-Based Image Retrieval (CBIR) with Relevance Feedback (RF) techniques.
Identifying and validating novel phenotypes from images arriving online is a major challenge in high-content RNA interference (RNAi) screening. Newly discovered phenotypes should be visually distinct from existing ones and make biological sense. We propose an online phenotype discovery method featuring adaptive phenotype modeling and iterative cluster merging using improved gap statistics. Clustering results based on compactness criteria and Gaussian mixture models (GMMs) for existing phenotypes iteratively modify each other through multiple hypothesis testing and model optimization based on minimum classification error (MCE). The method discovers new phenotypes adaptively when applied to both synthetic datasets and RNAi high-content screening (HCS) images with ground-truth labels.
online phenotype discovery; RNA interference; high content screen; gap statistics; minimum classification error
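The core of the adaptive modeling step can be illustrated with a small sketch: existing phenotypes are summarized by Gaussian models, and cells that no existing model explains well become candidates for a new phenotype. This is a minimal, single-Gaussian-per-phenotype simplification of the GMMs described above; the function names and the likelihood threshold are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian at feature vector x."""
    d = len(mean)
    diff = x - mean
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + diff @ inv @ diff)

def flag_novel_cells(X, phenotypes, log_threshold=-10.0):
    """Cells whose best log-likelihood under every existing phenotype
    model falls below the threshold become new-phenotype candidates.
    `phenotypes` is a list of (mean, cov) pairs -- one Gaussian per
    known phenotype (a single-component stand-in for a full GMM)."""
    novel = []
    for i, x in enumerate(X):
        best = max(gaussian_logpdf(x, m, c) for m, c in phenotypes)
        if best < log_threshold:
            novel.append(i)
    return novel
```

The flagged cells would then be clustered (with, e.g., a gap-statistic criterion choosing the number of clusters) to propose new phenotype classes, which in turn refine the models.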
Gene perturbation experiments in combination with fluorescence time-lapse cell imaging are a powerful tool in reverse genetics. High content applications require tools for the automated processing of the large amounts of data. These tools include in general several image processing steps, the extraction of morphological descriptors, and the grouping of cells into phenotype classes according to their descriptors. This phenotyping can be applied in a supervised or an unsupervised manner. Unsupervised methods are suitable for the discovery of formerly unknown phenotypes, which are expected to occur in high-throughput RNAi time-lapse screens.
We developed an unsupervised phenotyping approach based on Hidden Markov Models (HMMs) with multivariate Gaussian emissions for the detection of knockdown-specific phenotypes in RNAi time-lapse movies. The automated detection of abnormal cell morphologies allows us to assign a phenotypic fingerprint to each gene knockdown. By applying our method to the Mitocheck database, we show that a phenotypic fingerprint is indicative of a gene’s function.
Our fully unsupervised HMM-based phenotyping is able to automatically identify cell morphologies that are specific for a certain knockdown. Beyond the identification of genes whose knockdown affects cell morphology, phenotypic fingerprints can be used to find modules of functionally related genes.
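The decoding step behind such HMM phenotyping can be sketched as follows: given per-frame morphology scores, log-space Viterbi recovers the most likely sequence of hidden states (e.g. normal vs. abnormal morphology). This is a generic one-dimensional sketch, not the multivariate implementation described above; all names and parameters are illustrative.

```python
import numpy as np

def viterbi_gaussian(obs, log_A, log_pi, means, stds):
    """Most likely hidden-state path for a 1-D observation sequence
    under an HMM with Gaussian emissions (log-space Viterbi).
    log_A: log transition matrix, log_pi: log initial distribution."""
    n_states = len(means)
    T = len(obs)
    # Per-frame Gaussian emission log-likelihoods, shape (T, n_states).
    log_b = np.array([[-0.5 * np.log(2 * np.pi * s ** 2)
                       - (o - m) ** 2 / (2 * s ** 2)
                       for m, s in zip(means, stds)] for o in obs])
    delta = np.zeros((T, n_states))
    back = np.zeros((T, n_states), dtype=int)
    delta[0] = log_pi + log_b[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A      # (from_state, to_state)
        back[t] = np.argmax(scores, axis=0)
        delta[t] = scores[back[t], np.arange(n_states)] + log_b[t]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With state 0 as "normal" (mean 0) and state 1 as "abnormal" (mean 5), a sticky transition matrix keeps the decoded path from flickering between states on noisy frames.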
Automated, image-based high-content screening is a fundamental tool for discovery in biological science. Modern robotic fluorescence microscopes can capture thousands of images from massively parallel experiments such as RNA interference (RNAi) or small-molecule screens. Efficient computational methods for automatic cellular phenotype identification are therefore required to deal with these large image sets. In this paper we investigated an efficient method for extracting quantitative features from images by combining second-order statistics, or Haralick features, with the curvelet transform. A random subspace based classifier ensemble with the multilayer perceptron (MLP) as the base classifier was then used for classification. Haralick features estimate image properties related to second-order statistics from the grey-level co-occurrence matrix (GLCM), which has been used extensively in image processing. The curvelet transform provides a sparser representation of the image than the wavelet transform, offering a high degree of directionality and anisotropy that is particularly appropriate for images rich in edges and curves. Combining Haralick and curvelet features can further increase classification accuracy by exploiting their complementary information. We then investigated the applicability of the random subspace (RS) ensemble method for phenotype classification based on microscopy images. Each base classifier is trained on an RS-sampled subset of the original feature set, and the ensemble assigns a class label by majority voting.
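A minimal sketch of the second-order statistics mentioned above: build a normalized grey-level co-occurrence matrix for one pixel offset and derive a few classic Haralick measures from it. Real pipelines average several offsets and directions and compute many more statistics; this sketch is illustrative only.

```python
import numpy as np

def glcm(image, levels, dx=1, dy=0):
    """Grey-level co-occurrence matrix for one pixel offset,
    normalized to a joint probability distribution. `image` must
    contain integer grey levels in [0, levels)."""
    P = np.zeros((levels, levels))
    h, w = image.shape
    for y in range(h - dy):
        for x in range(w - dx):
            P[image[y, x], image[y + dy, x + dx]] += 1
    return P / P.sum()

def haralick_subset(P):
    """Three classic second-order statistics from a normalized GLCM."""
    i, j = np.indices(P.shape)
    contrast = np.sum(P * (i - j) ** 2)
    energy = np.sum(P ** 2)                 # a.k.a. angular second moment
    homogeneity = np.sum(P / (1.0 + np.abs(i - j)))
    return contrast, energy, homogeneity
```

A perfectly uniform image concentrates all co-occurrence mass on the diagonal, giving zero contrast and maximal energy and homogeneity; textured phenotypes spread mass off the diagonal.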
Experimental results on phenotype recognition from three benchmark image sets (HeLa, CHO and RNAi) show the effectiveness of the proposed approach. The combined feature outperforms either individual feature set in classification accuracy, and the ensemble model produces better classification performance than its component neural networks. On the HeLa, CHO and RNAi image sets, the random subspace ensemble achieves classification rates of 91.20%, 98.86% and 91.03% respectively, compared with published results of 84%, 93% and 82% from WND-CHARM, a multi-purpose image classifier that applies wavelet transforms among other feature extraction methods. We also investigated the estimation of ensemble parameters and found that a moderate feature-subset dimensionality and a small ensemble size already bring satisfactory performance improvements.
The multiscale and multidirectional character of the curvelet transform is well suited to describing microscopy images. We demonstrate empirically that curvelet-based features are clearly preferable to wavelet-based features for bioimage description, and that the random subspace ensemble of MLPs outperforms a number of commonly applied multi-class classifiers in this phenotype recognition task.
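The random subspace idea itself is compact enough to sketch: each base model is trained on a random subset of the feature dimensions, and the ensemble classifies by majority vote. Here a nearest-centroid rule stands in for the MLP base classifier to keep the example dependency-free; all names and parameters are assumptions.

```python
import numpy as np

def train_rs_ensemble(X, y, n_models=11, subdim=4, seed=0):
    """Random subspace ensemble: each base model sees only a random
    subset of the feature dimensions. Nearest-centroid stands in for
    the MLP base classifier of the paper."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    models = []
    for _ in range(n_models):
        dims = rng.choice(X.shape[1], size=subdim, replace=False)
        centroids = {c: X[y == c][:, dims].mean(axis=0) for c in classes}
        models.append((dims, centroids))
    return models

def predict_rs_ensemble(models, x):
    """Each base model votes on its own feature subset; majority wins."""
    votes = []
    for dims, centroids in models:
        sub = x[dims]
        votes.append(min(centroids,
                         key=lambda c: np.linalg.norm(sub - centroids[c])))
    vals, counts = np.unique(votes, return_counts=True)
    return vals[np.argmax(counts)]
```

Because each base model ignores most dimensions, the ensemble decorrelates individual errors, which is why a small ensemble over moderate-sized subspaces can already outperform a single classifier trained on all features.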
The concept of drug discovery through stem cell biology rests on two technological developments whose emergence now coincides. The first is automated cell microscopy with concurrent advances in image acquisition and analysis, known as high content screening (HCS). The second is patient-derived stem cells for modeling the cell biology of brain diseases. HCS has developed from the requirements of the pharmaceutical industry for high throughput assays to screen thousands of chemical compounds in the search for new drugs. HCS combines new fluorescent probes with automated microscopy and computational power to quantify the effects of compounds on cell functions. Stem cell biology has advanced greatly since the discovery of genetic reprogramming of somatic cells into induced pluripotent stem cells (iPSCs). There is now a rush of papers describing their generation from patients with various diseases of the nervous system. Although the majority of these have been genetic diseases, iPSCs have been generated from patients with complex diseases (schizophrenia and sporadic Parkinson’s disease). Some genetic diseases are also modeled in embryonic stem cells (ESCs) generated from blastocysts rejected during in vitro fertilization. Neural stem cells have been isolated from the post-mortem brain of Alzheimer’s patients, and neural stem cells generated from biopsies of the patients’ olfactory organ provide another approach. These “olfactory neurosphere-derived” cells demonstrate robust disease-specific phenotypes in patients with schizophrenia and Parkinson’s disease. HCS is already in use to find small molecules for the generation and differentiation of ESCs and iPSCs. The challenges for using stem cells for drug discovery are to develop robust stem cell culture methods that meet the rigorous requirements for repeatable, consistent quantities of defined cell types at the industrial scale necessary for HCS.
embryonic stem cells; induced pluripotent stem cells; olfactory stem cells; olfactory neurosphere-derived cells; high content screening
Neuroactive small molecules are indispensable tools for treating mental illnesses and dissecting nervous system function. However, it has been difficult to discover novel neuroactive drugs. Here, we describe a high-throughput (HT) behavior-based approach to neuroactive small molecule discovery in the zebrafish. We use automated screening assays to evaluate thousands of chemical compounds and find that diverse classes of neuroactive molecules cause distinct patterns of behavior. These 'behavioral barcodes' can be used to rapidly identify novel psychotropic chemicals and to predict their molecular targets. For example, we identify novel acetylcholinesterase and monoamine oxidase inhibitors using phenotypic comparisons and computational techniques. By combining HT screening technologies with behavioral phenotyping in vivo, behavior-based chemical screens may accelerate the pace of neuroactive drug discovery and provide small-molecule tools for understanding vertebrate behavior.
The diversity of metazoan cell shapes is influenced by the dynamic cytoskeletal network. With the advent of RNA-interference (RNAi) technology, it is now possible to screen systematically for genes controlling specific cell-biological processes, including those required to generate distinct morphologies.
We adapted existing RNAi technology in Drosophila cell culture for use in high-throughput screens to enable a comprehensive genetic dissection of cell morphogenesis. To identify genes responsible for the characteristic shape of two morphologically distinct cell lines, we performed RNAi screens in each line with a set of double-stranded RNAs (dsRNAs) targeting 994 predicted cell shape regulators. Using automated fluorescence microscopy to visualize actin filaments, microtubules and DNA, we detected morphological phenotypes for 160 genes, one-third of which have not been previously characterized in vivo. Genes with similar phenotypes corresponded to known components of pathways controlling cytoskeletal organization and cell shape, leading us to propose similar functions for previously uncharacterized genes. Furthermore, we were able to uncover genes acting within a specific pathway using a co-RNAi screen to identify dsRNA suppressors of a cell shape change induced by Pten dsRNA.
Using RNAi, we identified genes that influence cytoskeletal organization and morphology in two distinct cell types. Some genes exhibited similar RNAi phenotypes in both cell types, while others appeared to have cell-type-specific functions, in part reflecting the different mechanisms used to generate a round or a flat cell morphology.
Automated microscopes have enabled the unprecedented collection of images at a rate that precludes visual inspection. Automated image analysis is required to identify interesting samples and extract quantitative information for high content screening (HCS). However, researchers are impeded by the lack of metrics and software tools to identify image-based aberrations that pollute data, limiting an experiment's quality. We have developed and validated approaches to identify those image acquisition artifacts that prevent optimal extraction of knowledge from high-throughput microscopy experiments. We have implemented these as a versatile, open-source toolbox of algorithms and metrics readily usable by biologists to improve data quality in a wide variety of biological experiments.
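Two of the simplest metrics for image-based aberrations of this kind can be sketched directly: a focus measure (variance of a Laplacian) that drops for blurred, out-of-focus fields, and the fraction of saturated pixels, which flags clipped intensities. These are generic illustrations of the idea, not the toolbox's actual metrics.

```python
import numpy as np

def focus_score(img):
    """Variance of a 5-point Laplacian approximation; low values
    suggest a blurred, out-of-focus field (one of many focus metrics)."""
    lap = (-4.0 * img[1:-1, 1:-1]
           + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return float(lap.var())

def saturation_fraction(img, max_value=255):
    """Fraction of pixels at the detector ceiling; a high fraction
    indicates clipped intensities that corrupt downstream features."""
    return float(np.mean(img >= max_value))
```

In a QC pass, fields whose focus score falls below, or saturation fraction rises above, empirically chosen thresholds would be excluded before feature extraction.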
Cancer constitutes a heterogeneous cellular system with a high level of spatio-temporal complexity. Recent discoveries by systems biologists have provided emerging evidence that cellular responses to anti-cancer modalities are stochastic in nature. To uncover the intricacies of cell-to-cell variability and its relevance to cancer therapy, new analytical screening technologies are needed. The last decade has brought forth spectacular innovations in the field of cytometry and single cell cytomics, opening new avenues for systems oncology and high-throughput real-time drug screening routines. The up-and-coming microfluidic Lab-on-a-Chip (LOC) technology and micro-total analysis systems (μTAS) are arguably the most promising platforms to address the inherent complexity of cellular systems with massive experimental parallelization and 4D analysis on a single cell level. The vast miniaturization of LOC systems and multiplexing enables innovative strategies to reduce drug screening expenditures while increasing throughput and content of information from a given sample. Small cell numbers and operational reagent volumes are sufficient for microfluidic analyzers and, as such, they enable next generation high-throughput and high-content screening of anti-cancer drugs on patient-derived specimens. Herein we highlight selected advancements in this emerging field of bioengineering and provide a snapshot of developments relevant to anti-cancer drug screening routines.
Microfluidics; Lab-on-a-chip; Cytometry; Cytomics; Cancer; Anti-cancer drugs; Cancer therapy; Drug screening
Advances in microscopy automation and image analysis have given biologists the tools to attempt large-scale systems-level experiments on biological systems using microscope image readout. Fluorescence microscopy has become a standard tool for assaying gene function in RNAi knockdown screens and protein localization studies in eukaryotic systems. Similar high-throughput studies can be attempted in prokaryotes, though working at the diffraction limit poses challenges and targeting essential genes in a high-throughput way can be difficult. Here we discuss efforts to make live-cell fluorescence microscopy experiments using genetically encoded fluorescent reporters an automated, high-throughput and quantitative endeavor amenable to systems-level experiments in bacteria. We emphasize a quantitative data reduction approach, using simulation to help develop biologically relevant cell measurements that completely characterize the cell image. We give an example of how this type of data can be directly exploited by statistical learning algorithms to discover functional pathways.
The diffraction limit makes high-throughput fluorescence microscopy more challenging in prokaryotes, but approaches such as quantitative data reduction now allow systems-level analysis of bacteria by this technique.
FLIGHT (http://flight.icr.ac.uk/) is an online resource compiling data from high-throughput Drosophila in vivo and in vitro RNAi screens. FLIGHT includes details of RNAi reagents and their predicted off-target effects, alongside RNAi screen hits, scores and phenotypes, including images from high-content screens. The latest release of FLIGHT is designed to enable users to upload, analyze, integrate and share their own RNAi screens. Users can perform multiple normalizations, view quality control plots, detect and assign screen hits and compare hits from multiple screens using a variety of methods including hierarchical clustering. FLIGHT integrates RNAi screen data with microarray gene expression as well as genomic annotations and genetic/physical interaction datasets to provide a single interface for RNAi screen analysis and datamining in Drosophila.
RNAi; database; integration; bioinformatics; phenotype
High-throughput screening technologies enable biologists to generate candidate genes faster than they can be studied by experimental approaches in the laboratory, given time and cost constraints. Thus, it has become increasingly important to prioritize candidate genes for experiments. To accomplish this, researchers need to apply selection requirements based on their knowledge, which necessitates qualitative integration of heterogeneous data sources and filtration using multiple criteria. A similar approach can also be applied to putative candidate gene relationships. While automation can assist in this routine and imperative procedure, flexibility in data sources and criteria must not be sacrificed. A tool that optimizes the trade-off between automation and flexibility, simultaneously filtering and qualitatively integrating data, is needed to prioritize candidate genes and generate composite networks from heterogeneous data sources.
We developed the Java application EnRICH (Extraction and Ranking using Integration and Criteria Heuristics) to address this need. Here we present a case study in which we used EnRICH to integrate and filter multiple candidate gene lists in order to identify potential retinal disease genes. As a result of this procedure, a candidate pool of several hundred genes was narrowed down to five candidate genes, of which four are confirmed retinal disease genes and one is associated with a retinal disease state.
We developed a platform-independent tool that is able to qualitatively integrate multiple heterogeneous datasets and use different selection criteria to filter each of them, provided the datasets are tables that have distinct identifiers (required) and attributes (optional). With the flexibility to specify data sources and filtering criteria, EnRICH automatically prioritizes candidate genes or gene relationships for biologists based on their specific requirements. Here, we also demonstrate that this tool can be effectively and easily used to apply highly specific user-defined criteria and can efficiently identify high quality candidate genes from relatively sparse datasets.
Qualitative integration; High-throughput data; Heterogeneous data; Network; Network visualization; Candidate prioritization
Functional genomic screens apply knowledge gained from the sequencing of the human genome toward rapid methods of identifying genes involved in cellular function based on a specific phenotype. This approach has been made possible through the use of advances in both molecular biology and automation. The utility of this approach has been further enhanced through the application of image-based high content screening, an automated microscopy and quantitative image analysis platform. These approaches can significantly enhance acquisition of novel targets for drug discovery.
Both the utility and potential issues associated with functional genomic screening approaches are discussed along with examples that illustrate both. The considerations for high content screening applied to functional genomics are also presented.
Functional genomic and high content screening are extremely useful in the identification of new drug targets. However, the technical, experimental, and computational parameters have an enormous influence on the results. Thus, although new targets are identified, caution should be applied toward interpretation of screening data in isolation. Genomic screens should be viewed as an integral component of a target identification campaign that requires both the acquisition of orthogonal data, as well as a rigorous validation strategy.
Genome-wide siRNA screening; High content screening; Target identification; Target deconvolution
Recent advances in automation technologies have enabled the use of flow cytometry for high throughput screening, generating large complex data sets often in clinical trials or drug discovery settings. However, data management and data analysis methods have not advanced sufficiently far from the initial small-scale studies to support modeling in the presence of multiple covariates.
We developed a set of flexible open source computational tools in the R package flowCore to facilitate the analysis of these complex data. A key component is a set of data structures that supports the application of similar operations to a collection of samples or a clinical cohort. In addition, our software constitutes a shared and extensible research platform that enables collaboration between bioinformaticians, computer scientists, statisticians, biologists and clinicians. This platform will foster the development of novel analytic methods for flow cytometry.
The software has been applied to the analysis of various data sets, and its data structures have proven highly efficient in capturing and organizing the analytic workflow. Finally, a number of additional Bioconductor packages successfully build on the infrastructure provided by flowCore, opening new avenues for flow data analysis.
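flowCore itself is an R/Bioconductor package, but its central design idea — one container that applies the same operation uniformly across a cohort of samples, so an analysis step is written once and run everywhere — can be sketched in Python. The class and method names below (including the `fsApply` analogue) are illustrative, not flowCore's API.

```python
import numpy as np

class SampleSet:
    """Minimal analogue of a flow-cytometry sample collection: a
    container that applies one operation to every sample, mirroring
    the collection-level data structures described above.
    (Illustrative Python sketch; flowCore itself is an R package.)"""

    def __init__(self, frames):
        # sample name -> event matrix (cells x channels)
        self.frames = dict(frames)

    def fsApply(self, func):
        """Apply `func` to each sample's event matrix, keeping names."""
        return SampleSet({name: func(f) for name, f in self.frames.items()})

    def summary(self):
        """Per-sample (n_events, n_channels) shapes."""
        return {name: f.shape for name, f in self.frames.items()}
```

A typical cohort-wide step, such as an asinh transform, then becomes a one-liner: `cohort.fsApply(lambda f: np.arcsinh(f / 150.0))`.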
The data generated during the course of a biological experiment or study can be massive, and its management becomes critical to the success of the investigation. The accumulation and analysis of such large datasets often becomes tedious for biologists and lab technicians. Most current phenotype data acquisition and management systems do not cater to the specialized needs of large-scale data analysis. The successful application of genomic tools and strategies to introduce desired traits in plants requires extensive and precise phenotyping of plant populations or gene bank material, necessitating an efficient data acquisition system.
Here we describe newly developed software, "PHENOME", for high-throughput phenotyping, which allows researchers to accumulate, categorize and manage large volumes of phenotypic data. In this study, a large number of individual tomato plants were phenotyped with the "PHENOME" application using a Personal Digital Assistant (PDA) with a built-in barcode scanner, in concert with a customized database designed for handling large populations.
The phenotyping of large populations of plants, both in the laboratory and in the field, is managed very efficiently using a PDA. The data are transferred to specialized databases where they can be further analyzed and catalogued. "PHENOME" aids the collection and analysis of data from large-scale mutagenesis, assessment of quantitative trait loci (QTLs), development of mapping populations, sampling of several individuals in one or more ecological niches, and similar studies.
More accurate and precise phenotyping strategies are necessary to empower high-resolution linkage mapping and genome-wide association studies and for training genomic selection models in plant improvement. Within this framework, the objective of modern phenotyping is to increase the accuracy, precision and throughput of phenotypic estimation at all levels of biological organization while reducing costs and minimizing labor through automation, remote sensing, improved data integration and experimental design. Much like the efforts to optimize genotyping during the 1980s and 1990s, designing effective phenotyping initiatives today requires multi-faceted collaborations between biologists, computer scientists, statisticians and engineers. Robust phenotyping systems are needed to characterize the full suite of genetic factors that contribute to quantitative phenotypic variation across cells, organs and tissues, developmental stages, years, environments, species and research programs. Next-generation phenotyping generates significantly more data than previously and requires novel data management, access and storage systems, increased use of ontologies to facilitate data integration, and new statistical tools for enhancing experimental design and extracting biologically meaningful signal from environmental and experimental noise. To ensure relevance, the implementation of efficient and informative phenotyping experiments also requires familiarity with diverse germplasm resources, population structures, and target populations of environments. Today, phenotyping is quickly emerging as the major operational bottleneck limiting the power of genetic analysis and genomic prediction. The challenge for the next generation of quantitative geneticists and plant breeders is not only to understand the genetic basis of complex trait variation, but also to use that knowledge to efficiently synthesize twenty-first century crop varieties.
In the last few years high-throughput analysis methods have become state-of-the-art in the life sciences. One of the latest developments is automated greenhouse systems for high-throughput plant phenotyping. Such systems allow the non-destructive screening of plants over a period of time by means of image acquisition techniques. During such screening different images of each plant are recorded and must be analysed by applying sophisticated image analysis algorithms.
This paper presents an image analysis pipeline (HTPheno) for high-throughput plant phenotyping. HTPheno is implemented as a plugin for ImageJ, an open source image processing software. It provides the possibility to analyse colour images of plants which are taken in two different views (top view and side view) during a screening. Within the analysis different phenotypical parameters for each plant such as height, width and projected shoot area of the plants are calculated for the duration of the screening. HTPheno is applied to analyse two barley cultivars.
HTPheno, an open source image analysis pipeline, supplies a flexible and adaptable ImageJ plugin which can be used for automated image analysis in high-throughput plant phenotyping and therefore to derive new biological insights, such as determination of fitness.
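The projected shoot area, height and width parameters mentioned above can be approximated with a very crude colour-based segmentation, sketched below. HTPheno's actual pipeline is calibrated and considerably more robust; the green-dominance threshold here is an assumption for illustration only.

```python
import numpy as np

def projected_shoot_area(rgb, green_margin=20):
    """Count pixels whose green channel dominates red and blue by a
    margin -- a crude plant/background segmentation standing in for a
    calibrated pipeline. Returns (area_in_pixels, boolean mask)."""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    mask = (g > r + green_margin) & (g > b + green_margin)
    return int(mask.sum()), mask

def plant_height_width(mask):
    """Plant height and width in pixels from the mask's bounding box."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return 0, 0
    return int(ys.max() - ys.min() + 1), int(xs.max() - xs.min() + 1)
```

Applied to each top- and side-view image over the screening period, such measurements yield the per-plant growth curves from which traits are derived.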
The ability of embryonic stem (ES) cells to generate any of the around 220 cell types of the adult body has fascinated scientists ever since their discovery. The capacity to re-program fully differentiated cells into induced pluripotent stem (iPS) cells has further stimulated the interest in ES cell research. Fueled by this interest, intense research has provided new insights into the biology of ES cells in the recent past. The development of large-scale and high throughput RNAi technologies has made it possible to sample the role of every gene in maintaining ES cell identity. Here, we review the RNAi screens performed in ES cells to date and discuss the challenges associated with these large-scale experiments. Furthermore, we provide a perspective on how to streamline the molecular characterization following the initial phenotypic description utilizing bacterial artificial chromosome (BAC) transgenesis.
RNA interference; siRNA; shRNA; esiRNA; Genome-wide screen; Bacterial artificial chromosome; TransgeneOmics
Motivation: High-throughput perturbation screens measure the phenotypes of thousands of biological samples under various conditions. The phenotypes measured in the screens are subject to substantial biological and technical variation. At the same time, in order to enable high throughput, it is often impossible to include a large number of replicates, and to randomize their order throughout the screens. Distinguishing true changes in the phenotype from stochastic variation in such experimental designs is extremely challenging, and requires adequate statistical methodology.
Results: We propose a statistical modeling framework that is based on experimental designs with at least two controls profiled throughout the experiment, and a normalization and variance estimation procedure with linear mixed-effects models. We evaluate the framework using three comprehensive screens of Saccharomyces cerevisiae, which involve 4940 single-gene knock-out haploid mutants, 1127 single-gene knock-out diploid mutants and 5798 single-gene overexpression haploid strains. We show that the proposed approach (i) can be used in conjunction with practical experimental designs; (ii) allows extensions to alternative experimental workflows; (iii) enables a sensitive discovery of biologically meaningful changes; and (iv) strongly outperforms the existing noise reduction procedures.
Availability: All experimental datasets are publicly available at www.ionomicshub.org. The R package HTSmix is available at http://www.stat.purdue.edu/~ovitek/HTSmix.html.
Supplementary information: Supplementary data are available at Bioinformatics online.
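The control-based normalization idea can be illustrated with a simplified, fixed-effect sketch: batch shifts are estimated from the controls profiled throughout the experiment and removed from all samples in the batch. The actual procedure described above uses linear mixed-effects models with proper variance estimation; this sketch is only a stand-in to show where the controls enter.

```python
import numpy as np

def normalize_by_controls(values, batch, is_control):
    """Remove batch-to-batch shifts estimated from control samples
    profiled throughout the experiment (a simplified, fixed-effect
    stand-in for a linear mixed-effects model)."""
    values = np.asarray(values, dtype=float)
    batch = np.asarray(batch)
    is_control = np.asarray(is_control, dtype=bool)
    out = values.copy()
    grand = values[is_control].mean()          # overall control level
    for b in np.unique(batch):
        sel = batch == b
        # Shift of this batch's controls relative to the grand mean.
        shift = values[sel & is_control].mean() - grand
        out[sel] -= shift
    return out
```

After this correction, differences between mutant and control phenotypes can be assessed on a common scale across batches.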
High-throughput genome-wide RNA interference (RNAi) screening is emerging as an essential tool to assist biologists in understanding complex cellular processes. The large number of images produced in each study makes manual analysis intractable; hence, automatic cellular image analysis becomes an urgent need, where segmentation is the first and one of the most important steps. In this paper, a fully automatic method for segmentation of cells from genome-wide RNAi screening images is proposed. Nuclei are first extracted from the DNA channel by using a modified watershed algorithm. Cells are then extracted by modeling the interaction between them as well as combining both gradient and region information in the Actin and Rac channels. A new energy functional is formulated based on a novel interaction model for segmenting tightly clustered cells with significant intensity variance and specific phenotypes. The energy functional is minimized by using a multiphase level set method, which leads to a highly effective cell segmentation method. Promising experimental results demonstrate that automatic segmentation of high-throughput genome-wide multichannel screening can be achieved by using the proposed method, which may also be extended to other multichannel image segmentation problems.
Fluorescent microscopy; high throughput; image segmentation; interaction model; level set; multichannel
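The nucleus-seeding step that precedes such segmentation can be sketched with standard tools: threshold the DNA channel, take a Euclidean distance transform of the foreground, and keep its local maxima as one seed per nucleus. This is a simplified stand-in for the modified watershed described above; the threshold and window size are assumptions.

```python
import numpy as np
from scipy import ndimage

def nuclei_seeds(dna, threshold):
    """Seed points for nucleus segmentation: threshold the DNA
    channel, compute the distance transform of the foreground, and
    label its local maxima -- roughly one seed per nucleus. Returns
    (label image, number of seeds)."""
    fg = dna > threshold
    dist = ndimage.distance_transform_edt(fg)
    # A pixel is a seed if it is a local maximum of the distance map
    # within a 5x5 window and lies inside the foreground.
    local_max = (dist == ndimage.maximum_filter(dist, size=5)) & fg
    labels, n = ndimage.label(local_max)
    return labels, n
```

These seeds would then serve as markers for the watershed (or as initializations for the level set functions) that delineate full cell boundaries in the Actin and Rac channels.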
RNA interference (RNAi) has become a powerful technique for reverse genetics and drug discovery and, in both of these areas, large-scale high-throughput RNAi screens are commonly performed. The statistical techniques used to analyze these screens are frequently borrowed directly from small-molecule screening; however small-molecule and RNAi data characteristics differ in meaningful ways. We examine the similarities and differences between RNAi and small-molecule screens, highlighting particular characteristics of RNAi screen data that must be addressed during analysis. Additionally, we provide guidance on selection of analysis techniques in the context of a sample workflow.
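One workhorse technique common to such analysis workflows, robust per-plate scoring, can be sketched briefly: centering by the median and scaling by the MAD keeps a handful of strong RNAi hits from distorting the normalization, which matters because RNAi screens tend to have more and stronger actives than small-molecule screens. The scaling constant and hit cutoff conventions vary between groups.

```python
import numpy as np

def robust_z(values):
    """Robust z-score: center by the median and scale by the MAD
    (x 1.4826 for consistency with the normal SD), so a few strong
    hits do not distort the per-plate normalization."""
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = 1.4826 * np.median(np.abs(values - med))
    return (values - med) / mad
```

Wells whose robust z-score exceeds a chosen cutoff (commonly 2-3 in magnitude) would be flagged as candidate hits for follow-up.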
This case study examines the automation and process change options available to emerging discovery/development stage pharmaceutical companies when considering implementing sophisticated high-throughput screens. Financial and personnel constraints that are far less significant in large pharmaceutical companies generally have to be addressed when implementing state-of-the-art screening technology in smaller companies. When NPS Pharmaceuticals considered installing a Molecular Devices FLIPR™ for high-throughput cell-based screening, it became clear that, to make the best decision, the whole screening process at NPS Pharmaceuticals, from screen development and validation, tissue culture, compound distribution and data handling to screening itself, had to be re-examined to see what automation options were possible and which, if any, made sense to implement. Large-scale automated systems were not considered due to their cost and the lack of in-house engineering infrastructure to support such systems. The current trend towards workstation-based laboratory automation suggested that a minimalist approach to laboratory automation, coupled with improved understanding of the physical process of screening, would yield the best approach. Better understanding of the workflow within the Biomolecular Screening team enabled the group to optimize the process and decide what support equipment was needed. To install the FLIPR™, train users, set up the tissue culture protocols for cell supply, establish high-throughput screening database protocols, integrate compound distribution and re-supply, and validate the pharmacology on four cell-based screens took the team three months. The integration of the screening team at the primary, secondary and tertiary screening stages of the target discovery project teams at NPS has enabled us to incorporate minimal automation into the Biomolecular Screening Group whilst retaining an enriching work environment.
This is reflected in our current consistent throughput of 64 96-well microplates per day on the FLIPR™, a figure comparable with that achieved within most major pharmaceutical companies. This case study suggests that process optimization coupled with modern stand-alone automated workstations can achieve significant throughput in a resource-constrained environment. Significantly greater throughput could be achieved by coupling the process-improvement techniques described above with 384-well microplate technology.
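The throughput gain from moving to 384-well plates can be sketched with simple arithmetic, assuming (hypothetically) that the daily plate count of 64 is held constant across formats:

```python
# Illustrative throughput arithmetic; the 64 plates/day figure is from the
# study, but the assumption that plate count is format-independent is ours.
plates_per_day = 64
wells_96 = plates_per_day * 96    # wells assayed per day in 96-well format
wells_384 = plates_per_day * 384  # wells per day in 384-well format (4x)
print(wells_96, wells_384)
```

Under this assumption, the 384-well format quadruples the number of wells assayed per day.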
Phenotypes are an important subject of biomedical research for which many repositories have already been created. Most of these databases are dedicated either to a single species or to a single disease of interest. With the advent of technologies to generate phenotypes in a high-throughput manner, not only is the volume of phenotype data growing fast, but so is the need to organize these data in more useful ways. We have created PhenomicDB, a freely available multi-species genotype/phenotype database, which shows phenotypes associated with their corresponding genes and grouped by gene orthologies across a variety of species. We have recently enhanced PhenomicDB by additionally incorporating quantitative and descriptive RNA interference (RNAi) screening data, by enabling the usage of phenotype ontology terms, and by providing information on assays and cell lines. We envision that the integration of classical phenotypes with high-throughput data will bring new momentum and insights to our understanding. Modern analysis tools under development may help exploit this wealth of information to transform it into knowledge and, eventually, into novel therapeutic approaches.
Comparative biological studies have led to remarkable biomedical discoveries. While genomic science and technologies are advancing rapidly, our ability to precisely specify a phenotype and compare it to related phenotypes of other organisms remains challenging. This study examined the systematic use of terminology and knowledge-based technologies to enable high-throughput comparative phenomics. More specifically, we measured the accuracy of a multistrategy automated classification method to bridge the phenotype gap between a phenotypic terminology (MGD: Phenoslim) and a broad-coverage clinical terminology (SNOMED CT). Furthermore, we qualitatively evaluated the additional emerging properties of the combined terminological network for comparative biology and discovery science. According to the gold standard (n=100), the accuracies (precision | recall) of the composite automated methods were 67% | 97% (mapping for identical concepts) and 85% | 98% (classification). Quantitatively, only 2% of the phenotypic concepts were missing from the clinical terminology; qualitatively, however, the gap was larger: conceptual scope, granularity, and subtle yet significant homonymy problems were observed. These results suggest that, as observed in other domains, additional strategies are required for combining terminologies.
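The precision and recall figures reported above follow their standard definitions over true positives (TP), false positives (FP), and false negatives (FN). A minimal sketch, using hypothetical counts chosen only to be consistent with the reported 67% | 97% mapping result (the study's actual counts are not given here):

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Standard definitions: precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical counts for illustration; not taken from the study.
p, r = precision_recall(tp=67, fp=33, fn=2)
```

With these counts, precision is 67/100 = 0.67 and recall is 67/69 ≈ 0.97, matching the shape of the reported figures.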
High-content, high-throughput RNA interference (RNAi) offers unprecedented possibilities to elucidate gene function and involvement in biological processes. Microscopy-based screening allows phenotypic observations at the level of individual cells. It was recently shown that a cell's population context significantly influences results. However, standard analysis methods for cellular screens do not currently take individual cell data into account unless this is important for the phenotype of interest, e.g. when studying cell morphology.
We present a method that normalizes and statistically scores microscopy-based RNAi screens, exploiting individual cell information from hundreds of cells per knockdown. Each cell's individual population context is employed in normalization. We present results on two infection screens, for hepatitis C and dengue virus, both showing considerable effects on observed phenotypes due to population context. In addition, we show on a non-virus screen that these effects can also be found in RNAi data in the absence of any virus. Using our approach to normalize against these effects, we achieve improved performance in comparison to an analysis without this normalization and hit-scoring strategy. Furthermore, our approach results in the identification of considerably more significantly enriched pathways in hepatitis C virus replication than a standard analysis approach.
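The abstract does not spell out the normalization itself, but the general idea of correcting a per-cell readout for its population context can be sketched as a simple covariate regression: fit the readout against a context variable (here, local cell density, an assumed covariate) and score the residuals, so each cell is compared against what its context predicts. This is an illustrative simplification; the published method may differ.

```python
import numpy as np

def normalize_by_context(readout: np.ndarray, density: np.ndarray) -> np.ndarray:
    """Remove the linear trend of a per-cell readout on local cell density.

    Fits readout ~ a*density + b by least squares and returns the residuals,
    i.e. each cell's deviation from what its population context predicts.
    (Illustrative sketch only, not the study's exact procedure.)
    """
    X = np.column_stack([density, np.ones_like(density)])
    coef, *_ = np.linalg.lstsq(X, readout, rcond=None)
    return readout - X @ coef

# Synthetic example: a readout driven largely by local cell density.
rng = np.random.default_rng(0)
density = rng.uniform(0.1, 1.0, size=500)
readout = 2.0 * density + rng.normal(scale=0.1, size=500)
resid = normalize_by_context(readout, density)
```

After normalization, the residuals carry no linear dependence on density, so any remaining signal can be attributed to the knockdown rather than to population context.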
Using a cell-based analysis and normalization for population context, we achieve improved sensitivity and specificity not only at the level of individual proteins, but especially at the pathway level. This leads to the identification of new host dependency factors of the hepatitis C and dengue viruses and to higher reproducibility of results.