High-content screening (HCS) is a powerful high-throughput technology for generating cellular images that are rich in phenotypic information, and it has been widely used in functional proteomics [1] and drug profiling [2]. In HCS, multiple images are collected per microplate well and processed by image analysis algorithms to extract multiparametric cellular measurements, such as cell growth rate, cell size, and organelle localization. Nonetheless, rich phenotypic information about complex cellular structure and morphology (e.g., the dynamics and organization of filamentous actin and microtubules, or the connection topology between neurons) is usually left unexplored. These cellular phenotypes result from the global activities of the underlying biological networks. To better understand the function of these networks and their components, additional information is required to relate them to observed phenotypes.
RNA interference (RNAi) is a revolutionary method for silencing gene expression and provides causal links between genes and functions through RNAi-induced loss-of-function phenotypes. For example, RNAi-based full-genome morphological profiling has offered deep insights into the genes functioning in the first two rounds of cell division in C. elegans [3]. RNAi high-throughput screening (RNAi-HTS) is increasingly used to identify and understand the molecular components and pathways responsible for key cellular processes [4]. These screens have been highly successful: they identify not only most of the known components but also a great number of genes previously unknown to be involved in a given molecular pathway or cellular process. It is widely expected that genome-wide RNAi-HCS will be increasingly used to profile the cellular phenotypic characteristics of genes, which can then be combined with other types of biological data (e.g., transcriptional profiles and protein-protein interaction data) to elucidate both gene functions and the structures of biological networks.
The genome-wide RNAi-HCS study that generated the data for this work proceeded as follows. The screen was carried out in 384-well plates. RNAi was achieved using double-stranded RNAs (dsRNAs) that specifically target the mRNA products of genes. The GFP-labeled Drosophila primary neural cells in each well were treated with one dsRNA targeting a particular gene (i.e., each dsRNA knocks down one gene). The experiment used about 21,300 dsRNAs that together target all genes in the Drosophila genome. The morphologies of the GFP-labeled cells in each well were imaged on a robotic microscope, with six images taken per well. In addition, each experiment had multiple replicates, so each RNAi treatment is associated with a set of images taken from multiple replicates. Each plate also has several control wells, where cells were either untreated or subjected to baseline treatments; images from the control wells are defined as wildtype images. Some RNAi-treated cells demonstrate phenotypes; for example, actin dsRNA-treated cell cultures show smaller cell clusters with weaker connectivity. This kind of experiment is very useful for identifying genes involved in neuron development or maintenance. Since it is also a new kind of biological experiment, most of the observed phenotypes are novel (i.e., uncategorized) phenotypes. Formally defining the phenotypes is extremely time-consuming because the number of images is huge and the potential number of phenotypes is large and unknown. No existing tool is capable of automatically and thoroughly analyzing these kinds of images.
Design of genome-wide RNAi-HCS screen in GFP-labeled Drosophila primary neural cells
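The scale of such a screen is easy to underestimate. The following back-of-the-envelope sketch, using the numbers given above (the replicate count is an assumption, since the text says only "multiple replicates", and control wells are ignored), shows why the image count quickly exceeds 100,000:

```python
import math

# Back-of-the-envelope scale of the RNAi-HCS screen described above.
# Numbers from the text; N_REPLICATES is an illustrative assumption.
N_DSRNAS = 21300        # dsRNAs covering the Drosophila genome
WELLS_PER_PLATE = 384   # control wells ignored for simplicity
IMAGES_PER_WELL = 6     # fields imaged per well
N_REPLICATES = 3        # "multiple replicates" -- assumed value

plates_needed = math.ceil(N_DSRNAS / WELLS_PER_PLATE)
images_per_replicate = N_DSRNAS * IMAGES_PER_WELL
total_images = images_per_replicate * N_REPLICATES

print(plates_needed)         # 56 plates per replicate (ignoring controls)
print(images_per_replicate)  # 127800 images in a single replicate
print(total_images)
```

Even a single replicate already exceeds the ~100,000-image figure cited below, before any controls or repeat experiments are counted.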
As shown in the above example, a typical genome-wide RNAi-HCS screen can easily generate about 100,000 high-resolution cellular images. However, little is known about the biological factors underlying the observed cellular phenotypes. Existing image analysis tools, such as AcuityXpress™ from BD Biosciences (www.atto.com), tools from Cellomics (www.cellomics.com) and BioImagene (www.bioimagene.com), CytoShop from Beckman Coulter, the iCyte® cytometric analysis software (www.compucyte.com), IN Cell Analyzer 1000 (www.amershambiosciences.com), QED Image (www.mediacy.com), NIH Image (rsb.info.nih.gov/nih-image), and UTHSCSA ImageTool (ddsdx.uthscsa.edu/dig/itdesc.html), perform reasonably well in low-level image analysis: extracting fluorescence readouts, segmenting images into "biological objects", measuring the sizes, shapes, intensities, and textures of image segments, and so on. Nonetheless, they have very limited capacity for modeling, detecting, and recognizing complicated morphological phenotypes, which is a bottleneck in morphological studies using HCS technology.
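To make the kind of low-level analysis these tools perform concrete, here is a minimal, self-contained sketch (not any vendor's actual pipeline) that thresholds a tiny synthetic image, labels 4-connected foreground components, and extracts per-object size, mean intensity, and bounding-box features. Real HCS pipelines use far more sophisticated segmentation and feature sets, but the structure is the same: pixels in, per-object feature vectors out.

```python
from collections import deque

def segment(image, threshold):
    """Label 4-connected foreground components; return per-object features."""
    h, w = len(image), len(image[0])
    labels = [[0] * w for _ in range(h)]
    objects = []
    for r in range(h):
        for c in range(w):
            if image[r][c] > threshold and labels[r][c] == 0:
                # flood-fill a newly found object
                lab = len(objects) + 1
                labels[r][c] = lab
                pixels, queue = [], deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and image[ny][nx] > threshold
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = lab
                            queue.append((ny, nx))
                ys = [p[0] for p in pixels]
                xs = [p[1] for p in pixels]
                objects.append({
                    "area": len(pixels),
                    "mean_intensity": sum(image[y][x] for y, x in pixels)
                                      / len(pixels),
                    "bbox": (min(ys), min(xs), max(ys), max(xs)),
                })
    return objects

# Synthetic 8x8 image: dim background with two bright "cells".
img = [[10] * 8 for _ in range(8)]
for r in (1, 2):
    for c in (1, 2):
        img[r][c] = 200          # object 1: 2x2 blob
for r in (5, 6):
    for c in (4, 5, 6):
        img[r][c] = 150          # object 2: 2x3 blob

feats = segment(img, threshold=100)
print(feats)  # two objects with area, mean_intensity, bbox
```

Texture descriptors, per-channel fluorescence readouts, and shape moments would be computed from the same labeled pixel sets.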
The Murphy Lab has pioneered the use of supervised approaches to train subcellular pattern recognizers on high-content fluorescence microscopy images [8]. Nevertheless, classifiers obtained by supervised approaches require annotated images and recognize only the phenotypes they were trained to identify. Hence, a completely supervised approach is impractical when facing a huge collection of unannotated HCS images and a large set of novel phenotypes whose size is unknown.
To bridge the huge gap between high-level biological concepts (i.e., cellular phenotypes) and low-level phenotypic profiles, we propose to use visual data exploration (VDE) to deal with this flood of information. The basic idea of VDE is to present the data in visual forms that allow users to gain insight and generate hypotheses by interacting directly with the data. The advantage of VDE is that users are directly involved in the data mining process, combining the flexibility, creativity, and general knowledge of humans with the enormous storage capacity and computational power of computers. This process is especially useful when little is known about the data and the exploration goals are vague, as in analyzing a huge number of RNAi-HCS images.
However, without effective means to adequately explore large-scale HCS image databases, visual data exploration can be a daunting task. Existing work on visualizing HCS data, for example, the Cellomics vHCS™ Discovery ToolBox (www.cellomics.com) and the method developed in [11], focuses on visualizing simple quantitative readouts of markers rather than the images themselves, and in particular ignores the relationships among images, which convey profound information closely related to the effects of chemical compounds, gene functions, and biological processes. Previous work on image database visualization [12] targeted personal photo albums, which are much smaller than HCS image databases, and did not consider the computational needs specific to HCS image analysis. The Open Microscopy Environment (OME) project [14] provides an excellent open-source browser for navigating an HCS image database [15], which is organized as a quasi-hierarchical structure representing the relationships between projects and datasets. However, this navigation scheme was not designed to facilitate phenotype discovery and categorization, which is one of our main focuses.
To this end, we have developed an application, imCellPhen (interactive mining of cellular phenotypes), which provides intelligent interfaces for visualizing large-scale RNAi-HCS image databases and interactively mining cellular phenotypes. imCellPhen has been used to facilitate the analysis of images generated in the genome-wide RNAi-based morphological screen of Drosophila primary neural cells. In Section 2, we briefly describe the extraction of low-level image features and the derivation of the metadata representation for RNAi treatments. The details of imCellPhen are explained in Section 3. The techniques developed in this study can be applied to other types of cell images generated using RNAi-HCS technology.