RNA interference (RNAi) is being used in large-scale genomic studies as a rapid way to obtain in vivo functional information associated with specific genes. Archiving and mining the complex data derived from these studies poses a series of challenges associated with both the methods used to elicit the RNAi response and the functional data gathered. RNAiDB (RNAi Database; http://www.rnai.org) has been created for the archival, distribution and analysis of phenotypic data from large-scale RNAi analyses in Caenorhabditis elegans. The database contains a compendium of publicly available data and provides information on experimental methods and phenotypic results, including raw data in the form of images and streaming time-lapse movies. Phenotypic summaries, together with graphical displays of RNAi-to-gene mappings, allow quick, intuitive comparison of results from different RNAi assays and visualization of the gene product(s) potentially inhibited by each RNAi experiment, based on multiple sequence analysis methods. RNAiDB can be searched using combinatorial queries and using the novel tool PhenoBlast, which ranks genes according to their overall phenotypic similarity. RNAiDB could serve as a model database for distributing and navigating in vivo functional information from large-scale systematic phenotypic analyses in different organisms.
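The abstract does not say how PhenoBlast computes phenotypic similarity. As a minimal sketch, assuming each gene is summarized by the set of phenotype terms observed across its RNAi experiments (a hypothetical encoding), other genes can be ranked against a query gene by Jaccard similarity. The function names and phenotype terms are illustrative, not RNAiDB's.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two phenotype sets (defined as 1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def rank_by_phenotype(query: str, profiles: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Rank all other genes by phenotypic similarity to `query`, most similar first."""
    q = profiles[query]
    scores = [(gene, jaccard(q, phen)) for gene, phen in profiles.items() if gene != query]
    # Descending score; ties broken alphabetically so the ranking is reproducible.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical genes and phenotype terms, for illustration only.
profiles = {
    "geneA": {"embryonic lethal", "spindle defect", "cytokinesis defect"},
    "geneB": {"embryonic lethal", "spindle defect"},
    "geneC": {"slow growth"},
    "geneD": {"embryonic lethal", "spindle defect", "cytokinesis defect"},
}
for gene, score in rank_by_phenotype("geneA", profiles):
    print(f"{gene}\t{score:.2f}")
```

Set-based similarity ignores phenotype strength; a scored profile with cosine or correlation similarity would be a natural alternative if quantitative data were available.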
RNA interference (RNAi) leads to sequence-specific knockdown of gene function. The approach can be used in large-scale screens to interrogate function in various model organisms and an increasing number of other species. Genome-scale RNAi screens are routinely performed in cultured or primary cells or in vivo in organisms such as C. elegans. High-throughput RNAi screening is benefiting from the development of sophisticated new instrumentation and software tools for collecting and analyzing data, including high-content image data. The results of large-scale RNAi screens have already proved useful, leading to new understandings of gene function relevant to topics such as infection, cancer, obesity and aging. Nevertheless, important caveats apply and should be taken into consideration when developing or interpreting RNAi screens. Some level of false discovery is inherent to high-throughput approaches, and, specific to RNAi screens, false discovery due to off-target effects (OTEs) of RNAi reagents remains a problem. The need to improve our ability to use RNAi to elucidate gene function at large scale and in additional systems continues to be addressed through improved RNAi library design, development of innovative computational and analysis tools and other approaches.
RNAi; high-throughput screens; high-content imaging; cell-based assays
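Computational off-target predictions are commonly based on exact sequence matches between an RNAi reagent and unintended transcripts. The sketch below assumes a match length of 19 nucleotides for long dsRNAs (a widely used heuristic, not a rule stated in this abstract), checks the sense strand only, and uses hypothetical function names. The example uses toy sequences with k=5 so the output stays readable.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct length-k substrings of `seq` (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_counts(reagent: str, transcripts: dict[str, str], k: int = 19) -> dict[str, int]:
    """Number of distinct k-mers each transcript shares with the reagent (sense strand only).

    Transcripts sharing at least one k-mer are reported; any hit other than the
    intended target is a candidate off-target.
    """
    probe = kmers(reagent, k)
    counts = {}
    for name, seq in transcripts.items():
        shared = len(probe & kmers(seq, k))
        if shared:
            counts[name] = shared
    return counts


# Toy sequences; real analyses would scan full transcriptome sequences with k=19.
transcripts = {"target": "TTACGTACGTACTT", "unrelated": "GGGGGGGGGG"}
print(shared_kmer_counts("ACGTACGTAC", transcripts, k=5))
```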
RNA interference (RNAi) represents a powerful method to systematically study loss-of-function phenotypes on a large scale with a wide variety of biological assays, constituting a rich source for the assignment of gene function. The GenomeRNAi database (http://www.genomernai.org) makes available RNAi phenotype data extracted from the literature for human and Drosophila. It also provides RNAi reagent information, along with an assessment of their efficiency and specificity. This manuscript describes an update of the database previously featured in the NAR Database Issue. The new version has undergone a complete re-design of the user interface, providing an intuitive, flexible framework for additional functionalities. Screen information and gene-reagent-phenotype associations are now available for download. The integration with other resources has been improved by allowing in-links via GenomeRNAi screen IDs, or external gene or reagent identifiers. A distributed annotation system (DAS) server enables the visualization of the phenotypes and reagents in the context of a genome browser. We have added a page listing ‘frequent hitters’, i.e. genes that show a phenotype in many screens, which might guide ongoing RNAi studies. Structured annotation guidelines have been established to facilitate consistent curation, and a submission template for direct submission by data producers is available for download.
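A 'frequent hitter' list of this kind can be derived by counting, for each gene, the number of screens in which it scored as a hit. A minimal sketch, assuming per-screen hit lists are available as sets and that the cut-off `min_screens` is chosen by the user (GenomeRNAi's actual criterion is not given here):

```python
from collections import Counter


def frequent_hitters(screen_hits: dict[str, set[str]], min_screens: int) -> list[tuple[str, int]]:
    """Genes scored as hits in at least `min_screens` screens, most frequent first."""
    counts = Counter(gene for hits in screen_hits.values() for gene in hits)
    frequent = [(gene, n) for gene, n in counts.items() if n >= min_screens]
    return sorted(frequent, key=lambda item: (-item[1], item[0]))


# Hypothetical screen IDs and gene names.
screens = {"screen1": {"g1", "g2"}, "screen2": {"g1", "g3"}, "screen3": {"g1", "g2"}}
print(frequent_hitters(screens, min_screens=2))
```

In practice the raw count might be normalized by how many screens actually tested each gene, since library coverage differs between screens.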
The recent emergence of high-throughput automated image acquisition technologies has forever changed how cell biologists collect and analyze data. Historically, the interpretation of cellular phenotypes in different experimental conditions has depended on the expert opinions of well-trained biologists. Such qualitative analysis is particularly effective in detecting subtle but important deviations in phenotypes. However, while the rapid and continuing development of automated microscope-based technologies now facilitates the acquisition of images of trillions of cells in thousands of diverse experimental conditions, such as in the context of RNA interference (RNAi) or small-molecule screens, the massive size of these datasets precludes human analysis. Thus, developing automated methods that identify novel, biologically relevant phenotypes online is one of the major challenges in high-throughput image-based screening. Ideally, phenotype discovery methods should use prior information and tackle three challenging tasks: recovering pre-defined, biologically meaningful phenotypes; differentiating novel phenotypes from known ones; and distinguishing novel phenotypes from one another. Arbitrarily extracting a subset of the existing information biases the analysis, while combining the complete existing dataset with each new image is intractable in high-throughput screens.
Here we present the design and implementation of a novel and robust online phenotype discovery method with broad applicability that can be used in diverse experimental contexts, especially high-throughput RNAi screens. The method features phenotype modelling and iterative cluster merging using improved gap statistics. A Gaussian Mixture Model (GMM) is employed to estimate the distribution of each existing phenotype and is then used as the reference distribution in the gap statistics. The method is broadly applicable to many types of image-based datasets derived from a wide spectrum of experimental conditions and is suited to adaptively processing new images that are continuously added to existing datasets. Validations were carried out on different datasets, including a published RNAi screen of Drosophila embryos [Additional files 1, 2], a dataset for cell cycle phase identification in HeLa cells [Additional files 1, 3, 4] and a synthetic dataset of polygons; on these, our method tackled the three aforementioned tasks effectively, with accuracies of 85%–90%. When implemented in a Drosophila genome-scale, image-based RNAi screen of cultured cells aimed at identifying the contribution of individual genes to the regulation of cell shape, our method efficiently discovers meaningful new phenotypes and provides novel biological insight. We also propose a two-step procedure to modify the novelty detection method based on the one-class SVM so that it can be used for online phenotype discovery. Across different conditions and datasets, our method consistently outperformed the SVM-based method in at least two of the three tasks, by 2% to 5%. These results demonstrate that our method can better identify novel phenotypes in image-based datasets from a wide range of conditions and organisms.
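The original gap statistic compares the within-cluster dispersion W_k of the data with that of a uniform reference distribution: Gap(k) = mean over references of log W_k(reference) minus log W_k(data). The sketch below is our own minimal illustration of swapping in a reference drawn from a Gaussian mixture fitted to existing phenotypes, as the abstract describes. Assumptions: one full-covariance Gaussian per known phenotype stands in for the fitted GMM, clustering is plain k-means, there are at least two features per cell, and all function names are hypothetical. How the authors turn gap values into merge decisions is not reproduced.

```python
import numpy as np


def within_dispersion(X: np.ndarray, labels: np.ndarray) -> float:
    """W_k: sum of squared distances of points to their own cluster mean."""
    return float(sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
                     for c in np.unique(labels)))


def kmeans_labels(X, k, rng, n_init=5, n_iter=50):
    """Plain Lloyd's k-means with random restarts; keeps the labelling with the lowest W_k."""
    best_labels, best_w = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
        for _ in range(n_iter):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = X[labels == j].mean(axis=0)
        w = within_dispersion(X, labels)
        if w < best_w:
            best_labels, best_w = labels, w
    return best_labels


def fit_phenotype_gmm(X_known, y_known):
    """One full-covariance Gaussian per existing phenotype: (weight, mean, covariance)."""
    comps = []
    for c in np.unique(y_known):
        Xc = X_known[y_known == c]
        comps.append((len(Xc) / len(X_known), Xc.mean(axis=0), np.cov(Xc, rowvar=False)))
    return comps


def sample_gmm(comps, n, rng):
    """Draw n reference points from the mixture."""
    weights = np.array([w for w, _, _ in comps])
    which = rng.choice(len(comps), size=n, p=weights)
    out = np.empty((n, len(comps[0][1])))
    for i, (_, mean, cov) in enumerate(comps):
        idx = np.flatnonzero(which == i)
        if idx.size:
            out[idx] = rng.multivariate_normal(mean, cov, size=idx.size)
    return out


def gap_statistic(X, comps, k_max, n_ref=10, seed=0):
    """Gap(k) = mean_b log W_k(reference_b) - log W_k(X), for k = 1..k_max."""
    rng = np.random.default_rng(seed)
    gaps = []
    for k in range(1, k_max + 1):
        log_w = np.log(within_dispersion(X, kmeans_labels(X, k, rng)))
        ref_log_w = [np.log(within_dispersion(R, kmeans_labels(R, k, rng)))
                     for R in (sample_gmm(comps, len(X), rng) for _ in range(n_ref))]
        gaps.append(float(np.mean(ref_log_w) - log_w))
    return gaps


# Synthetic "new" cells: two well-separated groups in a 2-D feature space.
rng = np.random.default_rng(1)
X_new = np.vstack([rng.normal([-5, 0], 1, (200, 2)), rng.normal([5, 0], 1, (200, 2))])
# For illustration, the reference is a single "known phenotype" fitted to the same cells.
comps = fit_phenotype_gmm(X_new, np.zeros(len(X_new), dtype=int))
print(gap_statistic(X_new, comps, k_max=3))
```

A large gap at k means the new cells form k groups that are more compact than expected under the known-phenotype model, which is the kind of evidence an online method can use to decide whether a cluster is novel.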
We demonstrate that our method can detect various novel phenotypes effectively in complex datasets. Experimental results also show that our method performs consistently under different orders of image input, under variations in starting conditions (including the number and composition of existing phenotypes), and on datasets from different screens. Our findings indicate that the proposed method is suitable for online phenotype discovery in diverse high-throughput image-based genetic and chemical screens.
High-throughput genome-wide RNA interference (RNAi) screening is emerging as an essential tool to assist biologists in understanding complex cellular processes. The large number of images produced in each study makes manual analysis intractable; hence, automatic cellular image analysis has become an urgent need, and segmentation is the first and one of the most important steps. In this paper, a fully automatic method for segmentation of cells from genome-wide RNAi screening images is proposed. Nuclei are first extracted from the DNA channel by using a modified watershed algorithm. Cells are then extracted by modeling the interactions between them and by combining gradient and region information in the Actin and Rac channels. A new energy functional is formulated based on a novel interaction model for segmenting tightly clustered cells with significant intensity variance and specific phenotypes. The energy functional is minimized by using a multiphase level set method, which leads to a highly effective cell segmentation method. Promising experimental results demonstrate that automatic segmentation of high-throughput genome-wide multichannel screening can be achieved by using the proposed method, which may also be extended to other multichannel image segmentation problems.
Fluorescent microscopy; high throughput; image segmentation; interaction model; level set; multichannel
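The modifications to the watershed used for nuclei are not described here, so the sketch below shows only a plain marker-controlled watershed, implemented as priority flooding: starting from labelled seeds, pixels are claimed in order of increasing intensity, so neighbouring regions meet along intensity ridges. It assumes the seeds are already given and that the flooded image is, for example, a gradient magnitude or an inverted distance transform of the thresholded DNA channel. The function name is hypothetical, and no explicit watershed lines are drawn.

```python
import heapq

import numpy as np


def marker_watershed(image: np.ndarray, markers: np.ndarray) -> np.ndarray:
    """Flood `image` from labelled seeds (markers > 0) in order of increasing intensity.

    Each unlabelled pixel takes the label of the first labelled 4-neighbour that
    reaches it, so regions grow through low values and meet at ridges.
    """
    labels = markers.copy()
    rows, cols = image.shape
    heap = []
    counter = 0  # unique tie-breaker so heap entries never compare beyond the counter
    for r, c in zip(*np.nonzero(markers)):
        heapq.heappush(heap, (image[r, c], counter, r, c))
        counter += 1
    while heap:
        _, _, r, c = heapq.heappop(heap)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr, nc] == 0:
                labels[nr, nc] = labels[r, c]
                heapq.heappush(heap, (image[nr, nc], counter, nr, nc))
                counter += 1
    return labels


image = np.array([[0, 1, 2, 5, 2, 1, 0]])  # two basins separated by a ridge
seeds = np.array([[1, 0, 0, 0, 0, 0, 2]])  # one seed per nucleus
print(marker_watershed(image, seeds))
```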
A large set of high-content RNAi screens investigating mammalian virus infection and multiple cellular activities is analysed to reveal the impact of population context on phenotypic variability and to identify indirect RNAi effects.
Cell population context determines phenotypes in RNAi screens of multiple cellular activities (including virus infection, cell size regulation, endocytosis, and lipid homeostasis), which can be accounted for by a combination of novel image analysis and multivariate statistical methods. Accounting for cell population context-mediated effects strongly changes the reproducibility and consistency of RNAi screens across cell lines as well as of siRNAs targeting the same gene. Such analyses can identify the perturbed regulation of population context dependent cell-to-cell variability, a novel perturbation phenotype. Overall, these methods advance the use of large-scale RNAi screening for a systems-level understanding of cellular processes.
Isogenic cells in culture show strong variability, which arises from dynamic adaptations to the microenvironment of individual cells. Here we study the influence of the cell population context, which determines a single cell's microenvironment, in image-based RNAi screens. We developed a comprehensive computational approach that employs Bayesian and multivariate methods at the single-cell level. We applied these methods to 45 RNA interference screens of various sizes, including 7 druggable genome and 2 genome-wide screens, analysing 17 different mammalian virus infections and four related cell physiological processes. Analysing cell-based screens at this depth reveals widespread RNAi-induced changes in the population context of individual cells leading to indirect RNAi effects, as well as perturbations of cell-to-cell variability regulators. We find that accounting for indirect effects improves the consistency between siRNAs targeted against the same gene, and between replicate RNAi screens performed in different cell lines, in different labs, and with different siRNA libraries. In an era where large-scale RNAi screens are increasingly performed to reach a systems-level understanding of cellular processes, we show that this is often improved by analyses that account for and incorporate the single-cell microenvironment.
cell-to-cell variability; image analysis; population context; RNAi; virus infection
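The study itself uses Bayesian and multivariate single-cell models. A much simpler linear stand-in illustrates the idea of accounting for population context: estimate how a readout depends on local cell density in control wells, then score a perturbed well by the part of its readout that density does not explain. Local density is only one context parameter, and the radius, the linear model and all function names are assumptions of this sketch.

```python
import numpy as np


def local_density(xy: np.ndarray, radius: float) -> np.ndarray:
    """Number of other cells within `radius` of each cell (O(n^2); fine for one well)."""
    d2 = ((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=2)
    return (d2 <= radius ** 2).sum(axis=1) - 1  # subtract the cell itself


def fit_context_model(density: np.ndarray, readout: np.ndarray) -> np.ndarray:
    """Least-squares fit of readout = b0 + b1 * density on control cells."""
    A = np.column_stack([np.ones_like(density, dtype=float), density])
    coef, *_ = np.linalg.lstsq(A, readout, rcond=None)
    return coef


def context_corrected_score(density: np.ndarray, readout: np.ndarray, coef: np.ndarray) -> float:
    """Mean residual after subtracting the context-predicted readout."""
    predicted = coef[0] + coef[1] * density
    return float(np.mean(readout - predicted))


# Hypothetical control cells: the infection index rises linearly with local density.
ctrl_density = np.array([0, 2, 4, 6, 8])
ctrl_infection = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
coef = fit_context_model(ctrl_density, ctrl_infection)

# A perturbed well that is simply denser: its raw readout is higher,
# but the context-corrected score is close to zero (an indirect effect).
dense = np.array([10, 12])
print(np.mean(0.1 + 0.05 * dense), context_corrected_score(dense, 0.1 + 0.05 * dense, coef))
```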
FLIGHT (http://flight.icr.ac.uk/) is an online resource compiling data from high-throughput Drosophila in vivo and in vitro RNAi screens. FLIGHT includes details of RNAi reagents and their predicted off-target effects, alongside RNAi screen hits, scores and phenotypes, including images from high-content screens. The latest release of FLIGHT is designed to enable users to upload, analyze, integrate and share their own RNAi screens. Users can perform multiple normalizations, view quality control plots, detect and assign screen hits and compare hits from multiple screens using a variety of methods including hierarchical clustering. FLIGHT integrates RNAi screen data with microarray gene expression as well as genomic annotations and genetic/physical interaction datasets to provide a single interface for RNAi screen analysis and datamining in Drosophila.
RNAi; database; integration; bioinformatics; phenotype
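FLIGHT's specific normalization options are not listed here. As one generic example of per-plate normalization followed by hit detection, the sketch computes a robust z-score from the plate median and the median absolute deviation (MAD), scaled by 1.4826 so it matches the standard deviation for normally distributed data, and flags wells above a user-chosen threshold. It assumes each plate has a non-zero MAD; the function names and the threshold of 3 are hypothetical.

```python
import numpy as np


def robust_z(values: np.ndarray) -> np.ndarray:
    """Robust z-score for one plate: (x - median) / (1.4826 * MAD)."""
    med = np.median(values)
    mad = np.median(np.abs(values - med))
    return (values - med) / (1.4826 * mad)


def call_hits(plates: dict[str, np.ndarray], threshold: float = 3.0) -> dict[str, np.ndarray]:
    """Per plate, indices of wells whose |robust z| exceeds `threshold`."""
    return {pid: np.flatnonzero(np.abs(robust_z(v)) > threshold) for pid, v in plates.items()}


# One hypothetical plate of six wells; the last well is an outlier.
plates = {"plate1": np.array([10.0, 11.0, 9.0, 10.0, 10.0, 30.0])}
print(call_hits(plates))
```

Median and MAD are used instead of mean and standard deviation because a few strong hits on a plate would otherwise inflate the spread and mask themselves.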
In many eukaryotic cells, double-stranded RNA (dsRNA) triggers RNA interference (RNAi), the specific degradation of RNA of homologous sequence. RNAi is now a major tool for reverse-genetics projects, including large-scale high-throughput screens. Recent reports have questioned the specificity of RNAi, raising problems in interpretation of RNAi-based experiments.
Using the protozoan Trypanosoma brucei as a model, we designed a functional complementation assay to ascertain that phenotypic effect(s) observed upon RNAi were due to specific silencing of the targeted gene. This was applied to a cytoskeletal gene encoding the paraflagellar rod protein 2 (TbPFR2), whose product is essential for flagellar motility. We demonstrate the complementation of TbPFR2, silenced via dsRNA targeting its UTRs, through the expression of a tagged RNAi-resistant TbPFR2 encoding a protein that could be immunolocalized in the flagellum. Next, we performed a functional complementation of TbPFR2, silenced via dsRNA targeting its coding sequence, through heterologous expression of the TbPFR2 orthologue gene from Trypanosoma cruzi: the flagellum regained its motility.
This work shows that functional complementation experiments can be readily performed in order to ascertain that phenotypic effects observed upon RNAi experiments are indeed due to the specific silencing of the targeted gene. Further, the results described here are of particular interest when reverse genetics studies cannot be easily achieved in organisms not amenable to RNAi. In addition, our strategy should constitute a firm basis to elaborate functional-dissection studies of genes from other organisms.
RNA interference (RNAi) has become a powerful technique for reverse genetics and drug discovery and, in both of these areas, large-scale high-throughput RNAi screens are commonly performed. The statistical techniques used to analyze these screens are frequently borrowed directly from small-molecule screening; however, small-molecule and RNAi data characteristics differ in meaningful ways. We examine the similarities and differences between RNAi and small-molecule screens, highlighting particular characteristics of RNAi screen data that must be addressed during analysis. Additionally, we provide guidance on selection of analysis techniques in the context of a sample workflow.
Image-based, high-throughput genome-wide RNA interference (RNAi) experiments are increasingly carried out to facilitate the understanding of gene functions in intricate biological processes. Automated screening in such experiments generates a large number of images of highly variable quality, which makes manual analysis unreasonably time-consuming. Effective techniques for automatic image analysis are therefore urgently needed, and segmentation is one of the most important steps. This paper proposes a fully automatic method for cell segmentation in genome-wide RNAi screening images. The method consists of two steps: nuclei segmentation and cytoplasm segmentation. Nuclei are extracted and labelled to initialize cytoplasm segmentation. Because RNAi screening images are often of poor quality, a novel scale-adaptive steerable filter is designed to enhance the image so that the long, thin protrusions of spiky cells can be extracted. The constraint-factor GCBAC method and morphological algorithms are then combined into an integrated method to segment tightly clustered cells. Evaluated against ground truth (manual labelling of the RNAi screening data by experts), our method achieves higher accuracy than seeded watershed. Compared with active contour methods, our method requires much less computation time. These results indicate that the proposed method can be applied to automatic analysis of multi-channel image screening data.
active contour; automatic image segmentation; constraint factor; fluorescent microscopy; genome-wide screening; graph cut; morphological algorithm; RNAi
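The design of the scale-adaptive steerable filter is not given here. As a minimal sketch of the general technique, the code relies on the steerability of Gaussian second derivatives: the second derivative along any direction θ is a fixed combination of three basis images (Ixx, Ixy, Iyy). It picks, per pixel, the angle and scale with the strongest σ²-normalized ridge response, so bright thin protrusions respond strongly. Scales, the number of angles and the function names are assumptions; images must be at least as large as the largest smoothing kernel.

```python
import numpy as np


def gaussian_smooth(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian smoothing with zero padding."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    rows = np.apply_along_axis(np.convolve, 1, img, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="same")


def steerable_ridge(img: np.ndarray, sigmas=(1.0, 2.0, 3.0), n_angles=8):
    """Per-pixel max over scales and angles of the sigma^2-normalized negative second
    directional derivative. Returns (response, angle across the ridge in radians)."""
    img = img.astype(float)
    best = np.full(img.shape, -np.inf)
    best_angle = np.zeros(img.shape)
    for sigma in sigmas:
        sm = gaussian_smooth(img, sigma)
        gy, gx = np.gradient(sm)    # derivatives along rows (y) and columns (x)
        gyy, gyx = np.gradient(gy)
        gxy, gxx = np.gradient(gx)
        gxy = 0.5 * (gxy + gyx)     # symmetrize the numerical mixed derivative
        for theta in np.arange(n_angles) * np.pi / n_angles:
            c, sn = np.cos(theta), np.sin(theta)
            # Steering: directional 2nd derivative from the three basis responses.
            d2 = gxx * c**2 + 2 * gxy * c * sn + gyy * sn**2
            resp = -(sigma**2) * d2  # bright thin lines give large positive values
            better = resp > best
            best[better] = resp[better]
            best_angle[better] = theta
    return best, best_angle


vertical = np.zeros((21, 21))
vertical[:, 10] = 1.0  # a one-pixel-wide vertical "protrusion"
response, angle = steerable_ridge(vertical)
print(response[10, 10], angle[10, 10])
```

The returned angle is the direction across the structure; the protrusion itself runs perpendicular to it.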
Cell-based high-throughput RNAi screening has become a powerful research tool in addressing a variety of biological questions. In RNAi screening, one of the most commonly applied assay systems measures cell fitness, usually quantified using fluorescence-, luminescence- or absorption-based readouts. However, these methods, typically implemented and scaled to a large-scale screening format, often yield only limited information on the cell fitness phenotype because they evaluate a single, indirect physiological indicator. To address this problem, we have established a cell fitness multiplexing assay that combines a biochemical approach and two fluorescence-based assaying methods. We applied this assay in a large-scale RNAi screening experiment with siRNA pools targeting the human kinome in different modified HEK293 cell lines. Subsequent analysis of ranked fitness phenotypes assessed by the different assaying methods revealed average phenotype intersections of 50.7±2.3%–58.7±14.4% when two indicators were combined and 40–48% when a third indicator was taken into account. From these observations we conclude that combining multiple fitness measures may decrease false-positive rates and increase confidence in hit selection. Our robust experimental and analytical method improves on the classical approach in terms of time, data comprehensiveness and cost.
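The phenotype intersections quoted above compare gene rankings between assaying methods. A minimal sketch of one way to compute such an overlap, assuming each indicator yields a per-gene score where lower means reduced fitness and that hits are the top-n ranked genes (the paper's exact ranking and cut-off are not given here; names and data are hypothetical):

```python
def top_n(scores: dict[str, float], n: int) -> set[str]:
    """The n genes with the lowest fitness scores (ties broken by name)."""
    return set(sorted(scores, key=lambda g: (scores[g], g))[:n])


def intersection_pct(*indicator_scores: dict[str, float], n: int) -> float:
    """Percentage of the top-n genes shared by all indicators."""
    sets = [top_n(s, n) for s in indicator_scores]
    return 100.0 * len(set.intersection(*sets)) / n


# Hypothetical per-gene scores from three fitness indicators.
a = {"g1": 0.1, "g2": 0.2, "g3": 0.9, "g4": 0.5}
b = {"g1": 0.2, "g2": 0.8, "g3": 0.1, "g4": 0.6}
c = {"g1": 0.3, "g2": 0.1, "g3": 0.9, "g4": 0.2}
print(intersection_pct(a, b, n=2), intersection_pct(a, b, c, n=2))
```

Requiring agreement between more indicators can only shrink the shared set, which is consistent with the lower overlap reported when a third indicator is added.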
With recent advances in fluorescence microscopy imaging techniques and methods of gene knockdown by RNA interference (RNAi), genome-scale high-content screening (HCS) has emerged as a powerful approach to systematically identify all parts of complex biological processes. However, a critical barrier to realizing this potential is the lack of efficient and robust methods for automating RNAi image analysis and quantitatively evaluating gene knockdown effects in huge volumes of HCS data. Facing these opportunities and challenges, we have begun investigating automatic methods towards the development of a fully automatic RNAi-HCS system. Particularly important are reliable approaches to cellular phenotype classification and image-based gene function estimation.
We have developed an HCS analysis platform that consists of two main components: fluorescence image analysis and image scoring. For image analysis, we used a two-step enhanced watershed method to extract cellular boundaries from HCS images. Segmented cells were classified into several predefined phenotypes based on morphological and appearance features. Using statistical characteristics of the identified phenotypes as a quantitative description of the image, a score is generated that reflects gene function. Our scoring model integrates fuzzy gene class estimation and single regression models. The final functional score of an image was derived from a weighted combination of the inferences of several support vector regression models. We validated our phenotype classification method and scoring system on our cellular phenotype and gene database with expert ground truth labeling.
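The scoring model is described only at a high level, so the sketch below shows its general shape under explicit assumptions: the image descriptor is the fraction of cells in each predefined phenotype, each per-class regression model is a hypothetical linear function standing in for a support vector regressor, and the fuzzy gene-class memberships that weight the models are supplied as input rather than estimated.

```python
import numpy as np


def phenotype_fractions(cell_labels: list[int], n_classes: int) -> np.ndarray:
    """Image descriptor: fraction of segmented cells assigned to each phenotype."""
    counts = np.bincount(np.asarray(cell_labels), minlength=n_classes)
    return counts / counts.sum()


def combined_score(descriptor: np.ndarray, models, memberships) -> float:
    """Membership-weighted combination of per-class regression outputs.

    models: list of (weights, bias) pairs, one hypothetical linear regressor per gene class.
    memberships: fuzzy membership of the image in each gene class (normalized here).
    """
    m = np.asarray(memberships, dtype=float)
    m = m / m.sum()
    preds = np.array([w @ descriptor + b for w, b in models])
    return float(m @ preds)


# Hypothetical example: 4 segmented cells, 3 phenotype classes, 2 gene classes.
desc = phenotype_fractions([0, 0, 1, 2], n_classes=3)
models = [(np.array([1.0, 0.0, 0.0]), 0.0), (np.array([0.0, 0.0, 2.0]), 1.0)]
print(desc, combined_score(desc, models, memberships=[1, 3]))
```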
We built a database of high-content, 3-channel fluorescence microscopy images of Drosophila Kc167 cultured cells that were treated with RNAi to perturb gene function. The proposed informatics system for microscopy image analysis is tested on this database. Both main components, automated phenotype classification and the image scoring system, were evaluated. The robustness and efficiency of our system were validated in quantitatively predicting the biological relevance of genes.
High-content screening; Image score inference
The completion of genome sequencing for several organisms has created a great demand for genomic tools that can systematically analyze the growing wealth of data. In contrast to the classical reverse genetics approach of creating specific knockout cell lines or animals, which is time-consuming and expensive, RNA-mediated interference (RNAi) has emerged as a fast, simple, and cost-effective technique for large-scale gene knockdown. Since its discovery as a gene-silencing response to double-stranded RNA (dsRNA) with homology to endogenous genes in Caenorhabditis elegans (C. elegans), RNAi technology has been adapted to various high-throughput screens (HTS) for genome-wide loss-of-function (LOF) analysis. Biochemical insights into the endogenous mechanism of RNAi have led to advances in RNAi methodology, including RNAi molecule synthesis, delivery, and sequence design. In this article, we briefly review the various RNAi library designs and discuss the benefits and drawbacks of each library strategy.
FLIGHT is a new database designed to help researchers browse and cross-correlate data from large-scale RNAi studies. To date, the majority of these functional genomic screens have been carried out using Drosophila cell lines. These RNAi screens follow 100 years of classical Drosophila genetics, but have already revealed their potential by ascribing an impressive number of functions to known and novel genes. This has in turn given rise to a pressing need for tools to simplify the analysis of the large amount of phenotypic information generated. FLIGHT aims to do this by providing users with a gene-centric view of screen results and by making it possible to cluster phenotypic data to identify genes with related functions. Additionally, FLIGHT provides microarray expression data for many of the Drosophila cell lines commonly used in RNAi screens. This, together with information about cell lines, protocols and dsRNA primer sequences, is intended to help researchers design their own cell-based screens. Finally, although the current focus of FLIGHT is Drosophila, the database has been designed to facilitate the comparison of functional data across species and to help researchers working with other systems navigate their way through the fly genome.
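FLIGHT's actual clustering options are not detailed here. As a hedged illustration, genes can be grouped by the similarity of their phenotype profiles (for example, one score per screen) using average-linkage hierarchical clustering cut at a distance threshold. The profile encoding, the Euclidean distance and the cut-off are assumptions, and the function name is hypothetical. This naive loop is only suitable for small gene sets; SciPy's `scipy.cluster.hierarchy` module provides efficient implementations.

```python
import numpy as np


def average_linkage_clusters(profiles: dict[str, np.ndarray], max_dist: float) -> list[set[str]]:
    """Agglomerative clustering (average linkage, Euclidean) stopped at `max_dist`."""
    names = list(profiles)
    X = np.array([profiles[n] for n in names], dtype=float)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = [[i] for i in range(len(names))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean pairwise distance between the two clusters.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > max_dist:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [{names[i] for i in c} for c in clusters]


# Hypothetical phenotype scores for four genes across three screens.
profiles = {
    "g1": np.array([1.0, 0.0, 1.0]),
    "g2": np.array([1.0, 0.0, 0.9]),
    "g3": np.array([0.0, 1.0, 0.0]),
    "g4": np.array([0.0, 1.0, 0.1]),
}
print(average_linkage_clusters(profiles, max_dist=0.5))
```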
The GenomeRNAi database (http://www.genomernai.org/) contains phenotypes from published cell-based RNA interference (RNAi) screens in Drosophila and Homo sapiens. The database connects observed phenotypes with annotations of targeted genes and information about the RNAi reagent used for the perturbation experiment. The availability of phenotypes from Drosophila and human screens also allows for phenotype searches across species. Besides reporting quantitative data from genome-scale screens, the new release of GenomeRNAi also enables reporting of data from microscopy experiments and curated phenotypes from published screens. In addition, the database provides an updated resource of RNAi reagents and their predicted quality that are available for the Drosophila and the human genome. The new version also facilitates the integration with other genomic data sets and contains expression profiling (RNA-Seq) data for several cell lines commonly used in RNAi experiments.
Phenotypes are an important subject of biomedical research for which many repositories have already been created. Most of these databases are dedicated either to a single species or to a single disease of interest. With the advent of technologies to generate phenotypes in a high-throughput manner, not only is the volume of phenotype data growing fast but so is the need to organize these data in more useful ways. We have created PhenomicDB, a freely available multi-species genotype/phenotype database that shows phenotypes associated with their corresponding genes, grouped by gene orthology across a variety of species. We have recently enhanced PhenomicDB by incorporating quantitative and descriptive RNA interference (RNAi) screening data, by enabling the use of phenotype ontology terms and by providing information on assays and cell lines. We envision that integration of classical phenotypes with high-throughput data will bring new momentum and insights to our understanding. Modern analysis tools under development may help exploit this wealth of information and transform it into knowledge and, eventually, into novel therapeutic approaches.
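A minimal sketch of grouping phenotype records by gene orthology, which is what makes cross-species comparison possible. It assumes a precomputed mapping from gene identifiers to orthology-group identifiers; all identifiers and phenotype labels below are hypothetical placeholders, and the function name is illustrative.

```python
from collections import defaultdict


def group_by_orthology(gene_to_group: dict[str, str],
                       records: list[tuple[str, str, str]]) -> dict[str, list[tuple[str, str, str]]]:
    """Collect (species, gene, phenotype) records under each gene's orthology group."""
    grouped = defaultdict(list)
    for species, gene, phenotype in records:
        group = gene_to_group.get(gene)
        if group is not None:  # genes without an orthology assignment are left out
            grouped[group].append((species, gene, phenotype))
    return dict(grouped)


gene_to_group = {"flyGene1": "OG1", "humanGene1": "OG1", "flyGene2": "OG2"}
records = [
    ("D. melanogaster", "flyGene1", "phenotype X"),
    ("H. sapiens", "humanGene1", "phenotype Y"),
    ("D. melanogaster", "flyGene2", "phenotype Z"),
    ("H. sapiens", "humanGene9", "phenotype X"),
]
print(group_by_orthology(gene_to_group, records))
```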
The analysis of high-throughput screening data sets is an expanding field in bioinformatics. High-throughput screens by RNAi generate large primary data sets which need to be analyzed and annotated to identify relevant phenotypic hits. Large-scale RNAi screens are frequently used to identify novel factors that influence a broad range of cellular processes, including signaling pathway activity, cell proliferation, and host cell infection. Here, we present a web-based application for the end-to-end analysis of large cell-based screening experiments using cellHTS2.
The software guides the user through the configuration steps that are required for the analysis of single or multi-channel experiments. The web-application provides options for various standardization and normalization methods, annotation of data sets and a comprehensive HTML report of the screening data analysis, including a ranked hit list. Sessions can be saved and restored for later re-analysis. The web frontend for the cellHTS2 R/Bioconductor package interacts with it through an R-server implementation that enables highly parallel analysis of screening data sets. web cellHTS2 further provides a file import and configuration module for common file formats.
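cellHTS2 itself is an R/Bioconductor package with several normalization methods. The Python sketch below is not the package's code; it illustrates two generic per-plate schemes of the kind such tools offer: dividing every well by the plate median, and scaling linearly so that the negative-control mean maps to 0 and the positive-control mean maps to 1. Function names are hypothetical, and the controls are assumed to differ.

```python
import numpy as np


def plate_median_normalize(plate: np.ndarray) -> np.ndarray:
    """Divide every well by the plate median, removing plate-to-plate intensity shifts."""
    return plate / np.median(plate)


def scale_between_controls(plate: np.ndarray, neg: np.ndarray, pos: np.ndarray) -> np.ndarray:
    """Map the negative-control mean to 0 and the positive-control mean to 1."""
    mu_neg, mu_pos = np.mean(neg), np.mean(pos)
    return (mu_neg - plate) / (mu_neg - mu_pos)


# Hypothetical raw intensities and control wells from one plate.
raw = np.array([100.0, 50.0, 0.0])
print(plate_median_normalize(raw))
print(scale_between_controls(raw, neg=np.array([100.0, 100.0]), pos=np.array([0.0, 0.0])))
```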
The implemented web-application facilitates the analysis of high-throughput data sets and provides a user-friendly interface. web cellHTS2 is accessible online at http://web-cellHTS2.dkfz.de. A standalone version as a virtual appliance and source code for platforms supporting Java 1.5.0 can be downloaded from the web cellHTS2 page. web cellHTS2 is freely distributed under GPL.
RNA interference (RNAi) is a well-conserved mechanism that uses small noncoding RNAs to silence gene expression posttranscriptionally. Gene regulation by RNAi is now recognized as one of the major regulatory pathways in eukaryotic cells. Although the main components of the RNAi/miRNA pathway have been identified, the molecular mechanisms regulating the activity of the RNAi/miRNA pathway have only begun to emerge within the last couple of years. Recently, high-throughput reporter assays to monitor the activity of the RNAi/miRNA pathway have been developed and used for proof-of-concept pilot screens. Both inhibitors and activators of the RNAi/miRNA pathway have been found. Although still in its infancy, a chemical biology approach using high-throughput chemical screens should open up a new avenue for dissecting the RNAi/miRNA pathway, as well as developing novel RNAi- or miRNA-based therapeutic interventions.
Fluorescence microscopy is one of the most powerful tools to investigate complex cellular processes such as cell division, cell motility, or intracellular trafficking. The availability of RNA interference (RNAi) technology and automated microscopy has opened the possibility to perform cellular imaging in functional genomics and other large-scale applications. Although imaging often dramatically increases the content of a screening assay, it poses new challenges to achieve accurate quantitative annotation and therefore needs to be carefully adjusted to the specific needs of individual screening applications. In this review, we discuss principles of assay design, large-scale RNAi, microscope automation, and computational data analysis. We highlight strategies for imaging-based RNAi screening adapted to different library and assay designs.
The ability of embryonic stem (ES) cells to generate any of the roughly 220 cell types of the adult body has fascinated scientists ever since their discovery. The capacity to re-program fully differentiated cells into induced pluripotent stem (iPS) cells has further stimulated the interest in ES cell research. Fueled by this interest, intense research has provided new insights into the biology of ES cells in the recent past. The development of large-scale and high throughput RNAi technologies has made it possible to sample the role of every gene in maintaining ES cell identity. Here, we review the RNAi screens performed in ES cells to date and discuss the challenges associated with these large-scale experiments. Furthermore, we provide a perspective on how to streamline the molecular characterization following the initial phenotypic description utilizing bacterial artificial chromosome (BAC) transgenesis.
RNA interference; siRNA; shRNA; esiRNA; Genome-wide screen; Bacterial artificial chromosome; TransgeneOmics
Systems biology aims to describe the complex interplays between cellular building blocks which, in their concurrence, give rise to the emergent properties observed in cellular behaviors and responses. This approach tries to determine the molecular players and the architectural principles of their interactions within the genetic networks that control certain biological processes. Large-scale loss-of-function screens, applicable in various different model systems, have begun to systematically interrogate entire genomes to identify the genes that contribute to a certain cellular response. In particular, RNA interference (RNAi)-based high-throughput screens have been instrumental in determining the composition of regulatory systems and, paired with integrative data analyses, have begun to delineate the genetic networks that control cell biological and developmental processes. Through the creation of tools for both in vitro and in vivo genome-wide RNAi screens, Drosophila melanogaster has emerged as one of the key model organisms in systems biology research and in recent years has contributed substantially to, and hence shaped, this discipline.
Large-scale RNAi-based screens are playing a critical role in defining sets of genes that regulate specific cellular processes. Numerous screens have been completed and in some cases more than one screen has examined the same cellular process, enabling a direct comparison of the genes identified in separate screens. Surprisingly, the overlap observed between the results of similar screens is low, suggesting that RNAi screens have relatively high levels of false positives, false negatives, or both.
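Whether an overlap is low can be quantified by comparing it with the overlap expected if one screen's hits were drawn at random from all tested genes, using the hypergeometric distribution. A minimal sketch, assuming both screens tested the same gene universe; the function names and screen sizes below are hypothetical.

```python
from math import comb


def expected_overlap(n_genes: int, hits_a: int, hits_b: int) -> float:
    """Mean overlap if screen B's hits were a random subset of all genes."""
    return hits_a * hits_b / n_genes


def overlap_pvalue(n_genes: int, hits_a: int, hits_b: int, overlap: int) -> float:
    """Hypergeometric P(overlap >= observed) under random selection of B's hits."""
    total = comb(n_genes, hits_b)
    upper = min(hits_a, hits_b)
    return sum(comb(hits_a, k) * comb(n_genes - hits_a, hits_b - k)
               for k in range(overlap, upper + 1)) / total


# Hypothetical: two screens of the same 14,000 genes with 300 and 250 hits, 20 shared.
print(expected_overlap(14000, 300, 250), overlap_pvalue(14000, 300, 250, 20))
```

An overlap can be far above random expectation yet still be a small fraction of either hit list, which is the pattern of false positives and false negatives the study goes on to investigate.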
We re-examined genes that were identified in two previous RNAi-based cell cycle screens to identify potential false positives and false negatives. We were able to confirm many of the originally observed phenotypes and to reveal many likely false positives. To identify potential false negatives from the previous screens, we used protein interaction networks to select genes for re-screening. We demonstrate cell cycle phenotypes for a significant number of these genes and show that the protein interaction network is an efficient predictor of new cell cycle regulators. Combining our results with the results of the previous screens identified a group of validated, high-confidence cell cycle/cell survival regulators. Examination of the subset of genes from this group that regulate the G1/S cell cycle transition revealed the presence of multiple members of three structurally related protein complexes: the eukaryotic translation initiation factor 3 (eIF3) complex, the COP9 signalosome, and the proteasome lid. Using a combinatorial RNAi approach, we show that while all three of these complexes are required for Cdk2/Cyclin E activity, the eIF3 complex is specifically required for some other step that limits the G1/S cell cycle transition.
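The exact rule used to pick genes for re-screening is not spelled out in this abstract. One simple, hypothetical way to prioritize potential false negatives is to rank genes that were not hits by the number of confirmed hits they interact with. The sketch assumes an undirected interaction list without duplicate edges; the threshold, function name and identifiers are illustrative.

```python
from collections import Counter


def network_candidates(interactions: list[tuple[str, str]], hits: set[str],
                       min_hit_neighbors: int = 2) -> list[tuple[str, int]]:
    """Rank non-hit genes by how many of their interaction partners are screen hits."""
    counts = Counter()
    for a, b in interactions:
        if a in hits and b not in hits:
            counts[b] += 1
        elif b in hits and a not in hits:
            counts[a] += 1
    candidates = [(g, n) for g, n in counts.items() if n >= min_hit_neighbors]
    return sorted(candidates, key=lambda item: (-item[1], item[0]))


# Hypothetical interaction network and hit list.
interactions = [("h1", "x"), ("h2", "x"), ("h1", "y"), ("x", "z")]
print(network_candidates(interactions, hits={"h1", "h2"}))
```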
Our results show that false positives and false negatives each play a significant role in the lack of overlap that is observed between similar large-scale RNAi-based screens. Our results also show that protein network data can be used to minimize false negatives and false positives and to more efficiently identify comprehensive sets of regulators for a process. Finally, our data provide a high-confidence set of genes that are likely to play key roles in regulating the cell cycle or cell survival.
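The abstract states that protein interaction networks were used to select genes for re-screening but does not give the selection rule. The following is a minimal guilt-by-association sketch, assuming an undirected list of interacting gene pairs and a set of confirmed hits. It ranks every non-hit gene by how many of its interaction partners are hits, then by the fraction of its partners that are hits. The scoring rule, function name, threshold and toy network are hypothetical illustrations, not the study's procedure.

```python
from collections import defaultdict


def rank_candidates(interactions, hits, min_hit_partners=1):
    """Rank non-hit genes by their interaction partners that are screen hits (hypothetical rule).

    interactions: iterable of (gene_u, gene_v) pairs, treated as undirected.
    Returns a list of (gene, n_hit_partners, fraction_of_partners_that_are_hits).
    """
    neighbours = defaultdict(set)
    for u, v in interactions:
        if u != v:  # ignore self-interactions
            neighbours[u].add(v)
            neighbours[v].add(u)

    hits = set(hits)
    scored = []
    for gene, partners in neighbours.items():
        if gene in hits:
            continue  # already a hit; only candidates for re-screening are ranked
        n_hit = len(partners & hits)
        if n_hit >= min_hit_partners:
            scored.append((gene, n_hit, n_hit / len(partners)))

    # Most hit partners first, then highest hit fraction, then gene name for a stable order
    scored.sort(key=lambda t: (-t[1], -t[2], t[0]))
    return scored


toy_network = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
               ("D", "E"), ("B", "X"), ("X", "A")]
result = rank_candidates(toy_network, hits={"A", "B"})
```

In this toy network, `X` and `C` each interact with both hits. `X` ranks first because all of its partners are hits, whereas `C` also interacts with the non-hit `D`. Genes with no hit partners (`D`, `E`) are not proposed for re-screening.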
RNA interference (RNAi) has become a powerful tool for genetic screening in Drosophila. At the Drosophila RNAi Screening Center (DRSC), we are using a library of over 21 000 double-stranded RNAs targeting known and predicted genes in Drosophila. This library is available for the use of visiting scientists wishing to perform full-genome RNAi screens. The data generated from these screens are collected in the DRSC database () in a flexible format for the convenience of the scientist and for archiving data. The long-term goal of this database is to provide annotations for as many of the uncharacterized genes in Drosophila as possible. Data from published screens are available to the public through a highly configurable interface that allows detailed examination of the data and provides access to a number of other databases and bioinformatics tools.
Automated, image-based high-content screening is a fundamental tool for discovery in biological science. Modern robotic fluorescence microscopes can capture thousands of images from massively parallel experiments such as RNA interference (RNAi) or small-molecule screens. Efficient computational methods are therefore required for automatic cellular phenotype identification that can handle large image data sets. In this paper, we investigate an efficient method for extracting quantitative features from images by combining second-order statistics, or Haralick features, with the curvelet transform. A random subspace classifier ensemble with a multilayer perceptron (MLP) as the base classifier is then used for classification. Haralick features estimate image properties related to second-order statistics based on the grey level co-occurrence matrix (GLCM), which has been used extensively in various image processing applications. The curvelet transform gives a sparser representation of the image than the wavelet transform. It therefore offers a description with higher time-frequency resolution and a high degree of directionality and anisotropy, which is particularly appropriate for images rich in edges and curves. Combining Haralick and curvelet features can further increase classification accuracy by exploiting their complementary information. We then investigate the applicability of the random subspace (RS) ensemble method for phenotype classification based on microscopy images. Each base classifier is trained on a randomly sampled subset of the original feature set, and the ensemble assigns a class label by majority voting.
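The Haralick half of the feature description can be illustrated directly. The curvelet half needs a dedicated transform library and is omitted. The following is a minimal pure-Python sketch, assuming an image already quantized to integer grey levels `0..levels-1`, a single pixel offset, and a symmetric GLCM. From the normalized GLCM it computes four of Haralick's statistics: contrast, energy (angular second moment), homogeneity (inverse difference moment) and correlation. Function names are hypothetical. Practical pipelines usually average over several offsets and directions and use more of the 14 original statistics.

```python
def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Normalized grey level co-occurrence matrix for one pixel offset (dx, dy)."""
    m = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:  # skip neighbours outside the image
                i, j = image[r][c], image[r2][c2]
                m[i][j] += 1
                if symmetric:  # count each pair in both directions
                    m[j][i] += 1
    total = sum(map(sum, m))
    return [[v / total for v in row] for row in m] if total else m


def haralick_subset(p):
    """Four Haralick texture features computed from a normalized GLCM p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    energy = sum(v * v for _, _, v in cells)                         # angular second moment
    homogeneity = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)  # inverse difference moment
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = sum((i - mu_i) ** 2 * v for i, _, v in cells) ** 0.5
    sd_j = sum((j - mu_j) ** 2 * v for _, j, v in cells) ** 0.5
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    # Correlation is undefined for a constant image; treat it as 1.0 by convention
    correlation = cov / (sd_i * sd_j) if sd_i and sd_j else 1.0
    return {"contrast": contrast, "energy": energy,
            "homogeneity": homogeneity, "correlation": correlation}
```

A uniform image yields zero contrast and maximal energy and homogeneity. Vertical stripes such as `[[0, 1], [0, 1]]`, read with a horizontal offset, yield a contrast of 1 and a correlation of -1, because every horizontal neighbour pair differs.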
Experimental results on phenotype recognition from three benchmark image sets (HeLa, CHO and RNAi) show the effectiveness of the proposed approach. The combined feature yields higher classification accuracy than either individual feature. The ensemble model achieves better classification performance than its individually trained component neural networks. For the HeLa, CHO and RNAi image sets, the random subspace ensembles achieve classification rates of 91.20%, 98.86% and 91.03%, respectively. These compare favorably with the published results of 84%, 93% and 82% from WND-CHARM, a multi-purpose image classifier that applies wavelet transforms and other feature extraction methods. We also investigated how to estimate the ensemble parameters and found that a satisfactory performance improvement can be obtained with feature subsets of relatively medium dimensionality and a small ensemble size.
The multiscale and multidirectional characteristics of the curvelet transform suit the description of microscopy images very well. The results empirically demonstrate that curvelet-based features are clearly preferable to wavelet-based features for describing bioimages. The random subspace ensemble of MLPs performs much better than a number of commonly applied multi-class classifiers in the investigated application of phenotype recognition.
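The random subspace method trains each base classifier on a randomly chosen subset of feature indices and combines the members by majority vote. Its two main parameters are the ensemble size and the subspace dimensionality, the two settings the parameter study above varies. The following is a minimal sketch that deliberately swaps in a nearest-centroid base classifier for the paper's MLP to stay dependency-free. Class names, default values and the toy data are hypothetical.

```python
import random
from collections import Counter


class NearestCentroid:
    """Stand-in base classifier (the paper uses an MLP): predicts the closest class mean."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for k, v in enumerate(x):
                acc[k] += v
        self.centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
        return self

    def predict_one(self, x):
        # Squared Euclidean distance to each class centroid
        return min(self.centroids,
                   key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, self.centroids[lab])))


class RandomSubspaceEnsemble:
    """Majority-vote ensemble; each member sees a random subset of the features."""

    def __init__(self, n_members=10, subspace_dim=2, seed=0):
        self.n_members = n_members        # ensemble size
        self.subspace_dim = subspace_dim  # features per member
        self.rng = random.Random(seed)

    def fit(self, X, y):
        n_features = len(X[0])
        self.members = []
        for _ in range(self.n_members):
            # Sample feature indices without replacement for this member
            idx = sorted(self.rng.sample(range(n_features), self.subspace_dim))
            clf = NearestCentroid().fit([[x[i] for i in idx] for x in X], y)
            self.members.append((idx, clf))
        return self

    def predict_one(self, x):
        votes = Counter(clf.predict_one([x[i] for i in idx]) for idx, clf in self.members)
        # Ties go to the label that first received a vote
        return votes.most_common(1)[0][0]


X = [[0, 0, 0, 0], [0.1, 0.2, 0, 0.1], [0.2, 0.1, 0.1, 0],
     [5, 5, 5, 5], [5.1, 4.9, 5.2, 5], [4.8, 5.1, 5, 4.9]]
y = ["a", "a", "a", "b", "b", "b"]
ens = RandomSubspaceEnsemble(n_members=7, subspace_dim=2, seed=1).fit(X, y)
```

Using an odd ensemble size with two classes avoids tied votes. In this toy data every feature separates the classes, so any subspace works. On real image features, the subspace dimensionality trades member diversity against member accuracy.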
FlyRNAi (http://www.flyrnai.org), the database and website of the Drosophila RNAi Screening Center (DRSC) at Harvard Medical School, serves a dual role, tracking both the production of reagents for RNA interference (RNAi) screening in Drosophila cells and RNAi screen results. The database and website serve as a platform for community availability of protocols, tools and other resources useful to researchers planning, conducting, analyzing or interpreting the results of Drosophila RNAi screens. Based on our own experience and user feedback, we have made several changes. Specifically, we have restructured the database to accommodate new types of reagents; added information about new RNAi libraries and other reagents; updated the user interface and website; and added new tools of use to the Drosophila community and others. Overall, the result is a more useful, flexible and comprehensive website and database.