Phenotypic assays of intact organisms allow the study of biological pathways and diseases that cannot be reduced to biochemical or cell-based assays. C. elegans is a useful model system for studying biological processes shared with humans [1]; high-throughput instrumentation and reagent libraries exist for sample preparation and imaging [2]; and deviations from wild type are often readily apparent because worms are visually transparent and follow a stereotypic developmental pattern [3]. Large-scale chemical and RNAi screens using C. elegans can probe complex processes such as metabolism, infection, and behavior, but so far the analysis of such experiments has largely been manual, subjective, and onerous.
Much progress has been made in automating the analysis of particular types of C. elegans experiments, such as those involving low-throughput, high-resolution, 3-D, or time-lapse images, or images of embryos [7–11]. However, there is still a strong need to automate the analysis of high-throughput, static images of adult worms in liquid culture, a common screening output. For most assays, the density of worms per microplate well causes the worms to touch or cluster, so automated analysis has been limited to population-averaged measurements [12–13], which hide population heterogeneity and preclude measurements on individual animals.
An alternative to microscopy is flow systems adapted for worms (e.g., COPAS, Union Biometrica), which measure length, optical density, and fluorescence emission at transverse slices along the length of individual worms. However, image-based screens have several benefits. They allow detection of more complex phenotypes by two-dimensional analysis of shape and signal patterns, and they do not require re-suspension of worms in additional liquid prior to analysis, allowing smaller sample volumes and closed culture conditions, an important factor when screening large libraries of small molecules and RNAi clones and when using pathogenic microbes. Also, image-based screening allows visual confirmation of results; the images form a permanent record that can be re-screened for additional phenotypes; and low-throughput experiments require no more equipment than a microscope and a digital camera.
To improve C. elegans phenotype scoring from images of adult worms in liquid, we developed an image-analysis toolbox that can detect individual worms regardless of crossing or clustering. It can measure hundreds of phenotypes related to shape, biomarker intensity, and staining pattern in relation to the anatomy of the animals.
A typical workflow starts with bright field images (Fig. 1a). We pre-process each image to compensate for illumination variations, detect well edges, and convert the image to binary (Fig. 1b). The next step, and the major challenge, is “untangling,” i.e., detecting individual worms among clustered worms and debris. To address this, we first construct a model of the variability in worm size and shape from a representative set of training worms (Fig. 1c). The model is then used to untangle and identify individual worms (Fig. 1d). A large number of measurements such as size, shape, intensity, texture, and spot counts can thereafter be made on a per-worm basis using all available image channels, as is common for cell-based assays [14]. Many phenotypes, such as spot area per animal, can be scored directly by such measurements; more complex phenotypes, such as subtle or complex changes in protein expression patterns, can be scored using a combination of measurements and machine learning [15]. If reporter signal location is of interest, we map each worm to a low-resolution atlas, allowing quantification correlated to the worm’s anatomy.
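The pre-processing step described above can be illustrated with a minimal Python sketch. It is not the WormToolbox implementation: it assumes that the background illumination can be estimated with a local median filter and that worms are darker than their surroundings in bright field. The function names, the window radius, and the offset are hypothetical choices made for illustration only.

```python
from statistics import median


def local_median(image, radius):
    """Estimate background illumination with a median filter.

    The window is clipped at the image borders. `image` is a list of rows
    of grey values. (Assumption: worms cover less than half of each window.)
    """
    height, width = len(image), len(image[0])
    background = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(median(window))
        background.append(row)
    return background


def binarize(image, radius=2, offset=30):
    """Mark pixels darker than the local background by more than `offset`."""
    background = local_median(image, radius)
    return [
        [1 if background[y][x] - image[y][x] > offset else 0
         for x in range(len(image[0]))]
        for y in range(len(image))
    ]
```

Comparing each pixel with its local background, rather than with one global threshold, is what lets a sketch like this tolerate a brightness gradient across the well.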
Figure 1. Workflow and performance of the WormToolbox. (a) Starting from a bright field image, we (b) remove variations in illumination and separate objects from background. (c) We create a worm model from non-touching training worms and (d) untangle individual ...
We evaluated the untangling performance using images from our prior work [8], where 15 worms were placed in each well of a 384-well plate. Approximately 1,500 worms from 100 wells were manually delineated, revealing that 46% of the worms were clustered or touching other worms (Supplementary Fig. 1). Compared to manual delineation, 51% of the worms were correctly detected with automated foreground-background segmentation followed by connected component labeling. When the untangling algorithms of the WormToolbox were applied, the performance increased to 81%, which proved sufficient for the assays presented here. The major source of error was poor image contrast close to well edges; performance improved to 94% when the foreground-background segmentation was manually corrected, decoupling errors caused by untangling from errors in the initial segmentation. We also tested the performance of the untangling in relation to the size of the training set and found that performance plateaus with a worm model constructed from 50 randomly selected training worms. This means that training can be done on a relatively small number of samples representing the phenotypic variation of a given experiment (Supplementary Fig. 2).
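The baseline in this comparison, foreground-background segmentation followed by connected component labeling, can be sketched as follows. This is an illustrative breadth-first labeling with 8-connectivity, not the code used in the study, and `label_components` is a hypothetical name. It shows why touching or crossing worms end up merged into a single object, which is the case that untangling is meant to resolve.

```python
from collections import deque


def label_components(mask):
    """Label 8-connected foreground regions of a binary mask.

    Returns (labels, count), where `labels` has the same shape as `mask`,
    background pixels are 0, and objects are numbered 1..count.
    """
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or labels[y][x]:
                continue
            # Start a new object and flood-fill it breadth-first.
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count
```

Two well-separated worms yield two labels, whereas two crossing worms yield one, so a per-object count underestimates the number of animals whenever worms touch.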
We first evaluated the toolbox on data from a different laboratory and imaging system [13]. The challenge was to detect individual adult worms that were partly clustered and mixed with eggs and progeny. We trained the worm model on L4 and adult worms only and observed that untangling improved the accuracy of finding individual adult worms as compared to thresholding and size-sorting alone (Supplementary Fig. 3). The model efficiently excluded smaller larvae (L1, L2, and L3) and eggs, and performance was relatively robust in the presence of up to 6-fold more progeny than adults (Supplementary Fig. 4). We also evaluated the performance of worm untangling as the number of worms per well increased. Wells contained either L1, L3, or adult worms at increasing concentrations, and we created a separate worm model for each developmental stage. As expected, the performance was higher for the slightly smaller L3 worms, because more space between worms leads to less clustering, but untangling became unstable when the worms were so small (L1) that the image resolution allowed only a few pixels per worm (Supplementary Figs. 5 and 6).
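Thresholding and size-sorting alone, the baseline mentioned above, can be sketched as a simple area filter. The sketch assumes that the accepted area range is the mean plus or minus `k` sample standard deviations of the areas of training worms. The function names and the choice of `k` are hypothetical and are not taken from the WormToolbox.

```python
from statistics import mean, stdev


def area_limits(training_areas, k=3.0):
    """Return (low, high) area limits from training worms as mean +/- k * SD."""
    centre = mean(training_areas)
    spread = stdev(training_areas)  # sample standard deviation
    return centre - k * spread, centre + k * spread


def size_sort(object_areas, limits):
    """Flag each segmented object as worm-sized (True) or not (False)."""
    low, high = limits
    return [low <= area <= high for area in object_areas]
```

A filter of this kind rejects small objects such as eggs and young larvae, but it also rejects clusters of touching adults because their combined area is too large, which is one way a size-only approach can miss worms that a shape model could still recover.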
In the second assay, we evaluated the toolbox for scoring viability, which can be read out as a morphological phenotype in bright field images alone, without the need for a viability stain. Worms in liquid culture tend to be curved and evenly opaque when alive but become rod-shaped and textured when they die. We untangled high-throughput images of worms infected with Enterococcus faecalis and either mock-treated with DMSO or treated with ampicillin [12]. After making shape, intensity, and texture measurements of each untangled worm, we manually selected 150 live and dead training examples from one 384-well plate. We thereafter used the gentle-boosting classifier of CellProfiler Analyst [15] (Supplementary Fig. 7) to identify a combination of measurements that discriminates live and dead worms. Finally, we applied the classifier to 1,500 worms from a different 384-well plate and verified that it distinguished live and dead worms as well as humans can. To evaluate the performance of the viability scoring on more heterogeneous data from a real high-throughput experiment, we selected 1,766 random images and 200 hits from a 37,200-compound screen [12] and compared the automated scoring with visual scoring based on bright field images (Supplementary Fig. 8). We achieved an accuracy of 97% and a precision of 83%, indicating that morphology-based viability screening could be a feasible alternative to the viability stain (SYTOX) used in the original screen.
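For reference, accuracy and precision as used here can be computed from paired automated and visual labels along the following lines. This is a generic sketch using the standard definitions; the function name is hypothetical, and which class counts as "positive" is left as a parameter because the text above does not specify it.

```python
def accuracy_and_precision(predicted, reference, positive):
    """Compare automated labels with reference (visual) labels.

    accuracy  = fraction of items where both labels agree
    precision = fraction of items called `positive` by the automated
                scoring that are also `positive` in the reference
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(predicted, reference))
    correct = sum(p == r for p, r in pairs)
    called_positive = [r for p, r in pairs if p == positive]
    true_positive = sum(r == positive for r in called_positive)
    accuracy = correct / len(pairs)
    # Precision is undefined when nothing was called positive.
    precision = (true_positive / len(called_positive)
                 if called_positive else float("nan"))
    return accuracy, precision
```

Reporting both numbers is useful in a screen where one class is rare: accuracy can be high overall while precision reveals how many of the automated calls in the rare class would survive visual inspection.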
In the third assay, we evaluated how well the toolbox could differentiate between a positive and a negative control from an RNAi screen for regulators of fat accumulation [16]. The positive control down-regulates daf-2, and the negative control was an empty vector. We compared two different approaches for pattern quantification: per-well measurements (using the basic functionality of CellProfiler), where no effort was made to assign fatty regions to individual worms, yielded a false discovery rate (FDR) of 22.2% (Supplementary Fig. 9), whereas per-worm measurements (using the untangling functionality of the WormToolbox) yielded an FDR of 4.5% (Supplementary Fig. 10). The per-worm measurements were superior because they captured the heterogeneity of the population, which was lost in the population averages from per-well measurements.
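The loss of heterogeneity in population averages can be illustrated with a toy example. The numbers, the threshold, and the function names below are hypothetical and are not data from the screen; the sketch only shows how two wells with identical per-well means can differ in a per-worm summary.

```python
from statistics import mean


def per_well_score(worm_values):
    """Population average: one number per well, blind to individual worms."""
    return mean(worm_values)


def per_worm_score(worm_values, threshold):
    """Fraction of individual worms whose measurement exceeds `threshold`."""
    return sum(value > threshold for value in worm_values) / len(worm_values)


# Hypothetical stain measurements for four worms in each of two wells.
uniform_well = [10, 10, 10, 10]
mixed_well = [2, 2, 2, 34]
```

Here `per_well_score` returns the same value for both wells, while `per_worm_score` with a threshold of 20 separates them, because only the mixed well contains a strongly stained animal.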
Finally, we evaluated the toolbox’s ability to detect worms with a change in the location of GFP expression (Fig. 2a). We used a C. elegans strain where GFP expression in the intestine is under the control of a promoter that responds to Staphylococcus aureus. A pharyngeal stain (mCherry) served as an internal control. The assay could not be scored using simple approaches, such as measuring the total intensity of GFP expression per well or per worm, or counting the number of GFP spots (Supplementary Fig. 11). However, using worm straightening and our atlas mapping, we were able to quantitatively detect elevated expression of clec-60::GFP in the anterior intestine and separate positive and negative controls with a Z’-factor of 0.21. Here we focused on the location of signal along the length of the worm, but asymmetric signal distribution across the width of the worm (e.g., fluorescence in the full worm as compared to only the eggs or only the gut) could also be discerned, using the outline of the worm as a spatial reference for the atlas.
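The two quantities discussed above, a signal profile along the worm and the Z’-factor, can be sketched in a few lines of Python. The profile function is a simplified stand-in for atlas mapping: it assumes that the worm has already been straightened and that the intensities are sampled from head to tail. The Z’-factor follows its standard definition, Z’ = 1 − 3(σ_p + σ_n)/|μ_p − μ_n|; the use of the sample standard deviation is an assumption, and all names are hypothetical.

```python
from statistics import mean, stdev


def longitudinal_profile(intensities, n_bins):
    """Average head-to-tail intensities of a straightened worm into n_bins segments."""
    n = len(intensities)
    if n_bins < 1 or n < n_bins:
        raise ValueError("need at least one sample per bin")
    profile = []
    for b in range(n_bins):
        # Integer bin edges; every bin holds at least one sample because n >= n_bins.
        start, stop = b * n // n_bins, (b + 1) * n // n_bins
        profile.append(mean(intensities[start:stop]))
    return profile


def z_prime(positive, negative):
    """Z'-factor for two groups of control readouts (at least two values each)."""
    separation = abs(mean(positive) - mean(negative))
    if separation == 0:
        raise ValueError("control means are identical")
    return 1.0 - 3.0 * (stdev(positive) + stdev(negative)) / separation
```

A per-worm readout such as the mean of the anterior bins of `longitudinal_profile` could then be passed to `z_prime` for the positive and negative controls. By this definition, values approach 1 for well-separated controls and drop to zero or below when the three-standard-deviation bands of the two controls overlap.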
Figure 2. Scoring fluorescence signal distribution with the WormToolbox. (a) The images show clec-60::GFP expression (green) in wild-type worms (top) and in pmk-1(km25) mutants (bottom). The pharynx is marked with myo-2::mCherry (red). (b) Automated worm detection ...
The WormToolbox is the first system to automatically, quantitatively, and objectively score a variety of phenotypes in individual C. elegans in static, high-throughput images. The toolbox is implemented as modules for the open-source CellProfiler [14,18] software, emphasizing ease of use; it is compatible with cluster computing to speed analysis and is flexible to new assays developed by the scientific community. Training the worm model takes less than an hour, and once an image analysis pipeline is set up for an assay, a typical analysis takes 10–30 s per image, or much less if a computing cluster is available.
The performance of the WormToolbox depends on the contrast between worms and the surrounding background, making it sensitive to large variations in background illumination and to the worm-like tracks sometimes formed when growing worms on agar medium. The WormToolbox can handle images of worms on agar in large plates, but further optimization is needed for worms on solid medium in 384-well plates. In liquid culture, the untangling can handle up to 20 adult worms per well in 384-well format and is designed to detect worms within the size and shape range of the training worms used to create the worm model. Worms with unexpected phenotypes are likely to be discarded as debris, but wells with a low fraction of correctly detected worms may be flagged for visual examination. In future work we will extend the WormToolbox by adding further worm-specific measurements based on the worms’ unique anatomy and by better handling mixed populations of worms at various stages of development.