Over the last decade, advances in microscope automation, fluorescent labeling, sample preparation, image processing, pattern recognition, and statistical learning have ushered in a new era of cell biology experiments based on unbiased genome-wide cell-based assays. In eukaryotic cell biology, these advances have been exploited to investigate the distribution and patterning of labeled subcellular structures and the “location proteome” defined by the distribution and intensity of localized protein reporters (
Chen et al. 2006). In the prokaryotic world, the challenges surrounding fluorescence microscopy at the diffraction limit have meant a somewhat slower adoption of highly automated methods and analysis designed for large-scale genetic screens or assays. Nevertheless, large-scale approaches are now being used more routinely during investigations of bacterial systems, from the identification of entire location proteomes (
Werner et al. 2009) to the detection of modular transcription and signaling pathways (
Christen and Fero 2009). Because of the size and complexity of the eukaryotic cell, generalized pattern recognition and classification approaches are often crucial. In these approaches, conditioned, parameterized, preclassified image data are used by statistical learning algorithms to build models that capture the differences between classes of images. A properly constructed model can then be used to automatically analyze a previously unclassified experimental data set (
Boland et al. 1998;
Boland and Murphy 1999;
Boland and Murphy 2001;
Huang and Murphy 2004;
Chen and Murphy 2005;
Jones et al. 2009). In prokaryotic cell biology a similar situation exists, with one major difference: the simplicity of the bacterial cell allows us to entertain the idea of a cell parameterization based not only on generalized image properties, but also on specific cell measurements of high biological relevance. For example, instead of measuring hard-to-interpret properties based on generalized image parameters and a global image transform, such as a Zernike or Fourier transform, one can conceive of measuring a more relevant set of parameters related directly to gene function or to the biology of interest, such as the position and quantity of both localized and delocalized fluorescent signals, the length and width of the cell, the degree of pinch at the division plane or sporulation septum, the precise location of the poles or other points of inflection, as well as the location of membrane structures such as pili or stalks and the membrane outline itself. In this way, albeit with a certain degree of additional work, the image can be reduced to a set of high-information-content parameters. This type of data can be used more efficiently in any type of subsequent analysis, whether direct or via a model-building statistical learning exercise. In addition, reducing the cell to a small set of highly informative parameters allows the option of combining those single-cell measurements into quantitative ensemble-based phenotypes where the entire distribution of measured values is recorded rather than just means and standard deviations. With the addition of genotype information, these data can then be used directly in unsupervised statistical learning approaches such as hierarchical clustering to do quantitative phenotypic profiling (
Ohya et al. 2005). In this article we outline the process of performing such an automated analysis on a bacterial system. To keep our discussion firmly rooted in a practical example, we will use a case-study approach based on work performed in the α-proteobacterium
Caulobacter crescentus. However, most of what we discuss can be applied to other bacteria and single-cell eukaryotes. It should also be noted that the efficacy of a quantitative analysis depends on the constructs used to express fluorescently tagged reporter proteins. Even modest overproduction of some proteins, particularly those involved in signal transduction and cell division, can have deleterious effects on cell viability and on cell shape (
Gregory et al. 2008). We will assume that for the purposes of quantitative analyses, tagged protein reporters are fully functional proteins produced from genes in their native chromosomal context.
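The idea of an ensemble-based phenotype followed by hierarchical clustering can be illustrated with a minimal sketch. It is not the pipeline used in the work cited above: the strain names, the cell-length values, the bin settings, and the choice of Euclidean distance with average linkage are all assumptions made purely for illustration. Each strain is represented by a normalized histogram of single-cell lengths, so the whole distribution is retained rather than only its mean and standard deviation, and strains are then merged from the most similar pair upward.

```python
from itertools import combinations
from math import dist


def histogram_profile(values, lo, hi, n_bins):
    """Normalized histogram of single-cell measurements.

    Keeps the shape of the whole distribution instead of only mean/SD.
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Values outside [lo, hi) are clamped into the first or last bin.
        idx = min(max(int((v - lo) / width), 0), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def average_linkage(profiles):
    """Agglomerative clustering of named profiles.

    Returns the list of merges, most similar pair first. Each merge is a
    pair of tuples holding the member names of the two clusters joined.
    """
    clusters = [(name,) for name in profiles]

    def cluster_distance(a, b):
        # Average pairwise Euclidean distance between members (assumed metric).
        total = sum(dist(profiles[x], profiles[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b))
    return merges


# Hypothetical cell lengths in micrometers; invented values, not real data.
cell_lengths = {
    "wild_type": [2.1, 2.4, 2.6, 2.9, 3.1, 2.3, 2.7, 2.5],
    "mutant_a": [2.2, 2.5, 2.8, 3.0, 2.4, 2.6, 2.9, 2.3],
    "mutant_b": [5.5, 6.2, 7.1, 4.8, 6.6, 7.4, 5.9, 6.8],  # filamentous
}

# One ensemble phenotype per genotype: 8 bins of 1 micrometer from 0 to 8.
profiles = {
    strain: histogram_profile(lengths, lo=0.0, hi=8.0, n_bins=8)
    for strain, lengths in cell_lengths.items()
}

merges = average_linkage(profiles)
for left, right in merges:
    print("merge:", left, "+", right)
```

With these invented values the two strains of similar length distribution are joined first, and the filamentous strain joins last. In practice several measured parameters per cell would be concatenated into one profile, and a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would normally replace the hand-written loop.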