Time-lapse imaging of optically sectioned nuclear images has provided an unprecedented opportunity to observe biological processes as they unfold in space and time. Using fluorescent proteins such as GFP to label nuclei, one can image the embryogenesis of diverse organisms such as C. elegans
], zebrafish [3
] and mouse [5
] with single cell resolution over an extended period of time. Given sufficient temporal resolution, individual nuclei can be followed over time, providing a virtually contiguous record of proliferation, differentiation and morphogenesis [2
]. However, exploiting this source of information is not trivial: such data sets can record thousands of cells over hundreds of frames, adding up to terabytes of files. The data becomes manageable only when tracks of the behavior of individual cells are generated from the raw images.
In response to this, a variety of computational techniques and software packages have been developed to aid in quantitative analysis of images [10
]. The largest class of image analysis methods focuses on segmenting contiguous regions of pixels, implicitly detecting nuclear locations in the process. The simplest technique, thresholding image intensity, has been supplemented with image processing techniques like smoothing, adaptive thresholds [16
], morphological operators, mode finding[18
], watershed [20
]and level set methods[22
] to increase robustness in the face of noise, uneven contrast and touching nuclei. A second class of methods uses matched filters to detect the centers of nuclei. A predefined template of object appearance is compared to every location in an image and local maxima of similarity are assumed to be object centers [1
The success of these methods is mixed. When nuclei are widely spaced, many methods perform well, with error rates of one percent or less [16
]. However, as nuclear density increases and nuclei start to touch, which is typical of late embryonic development and adult tissues, error rates rise to from two up to more than ten percent [1
]. Errors are caused by image characteristics that violate the rules or models underlying a detection method. These have their origin in both biology and the imaging process. Variation in nuclear fluorescence and shape is the primary biological complication. Uneven signal within a nucleus (Additional file 1
, Figure S1a) can result in over segmentation, separate redundant detections of a nucleus on each mode of intensity. Elongated and irregular nuclear shapes can also contribute to over segmentation (Additional file 1
, Figure S1c). Even if expression is uniform, each end or bump may independently match some computational definitions of a nucleus. Biology also contributes to under segmentation or missed nuclei. When nuclei are crowded, nuclei with weaker expression may be masked by brighter neighbors and remain undetected (see Additional file 1
, Figure S1b for an illustration of a weaker nucleus).
Imaging distortions and resolution limitations further complicate nuclear detection. An optically sectioned sample yields a series of 2D cross sections, so that nuclei are recorded as slices through bright globular shapes against a dark background. Coordinates within the 2D planes are referred to as x and y while z refers to the direction orthogonal to the image planes. Optical and physical constraints make resolution along the z axis lower than that within the x,y plane. Typically, phototoxicity and image acquisition time also limit the number of optical sections, further reducing the z resolution. In turn, this limits the information available to resolve closely packed nuclei along the z axis (Additional file 1
, Figure S1d illustrates two nuclei whose images merge along the z axis). Systematic distortions such as fading of signal with depth and stretching of fluorescent signal along the light path (Additional file 1
, Figure S1e) further distort the picture, contributing to error.
All errors are problematic because the quality of information extracted during image analysis limits the kinds of biological questions that can be answered. A low percentage of error, say three or five percent, allows reliable assessment of statistical trends in the behaviors of a homogeneous group of cells, such as drug response in cell culture, or the shape and overall migration of a tissue. On the other hand, this level of accuracy is not always sufficient for detailed analysis of embryogenesis, where tracing the behavior of single cells is often necessary. Investigation of many critical developmental processes such as neural crest cell dispersion[27
] or convergent extension [28
] can be most effectively investigated with this kind of single cell record of development. Though desired information may be theoretically present in images, it is large-scale annotation of cell behavior that makes systematic investigation possible. Tracing the paths of cells as they move and divide demands virtually 100% accuracy, as a handful of misidentified cells per time point can lead to a thoroughly fragmented and scrambled lineage [1
]. To make use of error filled data, biologists must spend a significant amount of time manually editing the automatically constructed result. Even for simple organisms with a few hundred cells, such as C. elegans
, this has previously taken up to days [29
]. In an organism like zebrafish this amount of editing would be an impossible task, with correction of even a sublineage of interest being weeks, or months, of effort. Errors quickly make human curation a bottleneck, undermining the premise of high throughput imaging. Often, when combined with time constraints, this inability to accurately find cells forces one to detour around a key question, missing an opportunity for a clean experimental design.
To address this challenge we present a detection and segmentation method that is accurate enough to allow high fidelity analysis over a variety of images while remaining fast enough to run in real time, making it practical for use with large data sets.