|Home | About | Journals | Submit | Contact Us | Français|
Mapping quantitative cell traits (QCT) to underlying molecular defects is a central challenge in cancer research because heterogeneity at all biological scales, from genes to cells to populations, is recognized as the main driver of cancer progression and treatment resistance. A major roadblock to a multiscale framework linking cell to signaling to genetic cancer heterogeneity is the dearth of large-scale, single-cell data on QCT-such as proliferation, death sensitivity, motility, metabolism, and other hallmarks of cancer. High-volume single-cell data can be used to represent cell-to-cell genetic and nongenetic QCT variability in cancer cell populations as averages, distributions, and statistical subpopulations. By matching the abundance of available data on cancer genetic and molecular variability, QCT data should enable quantitative mapping of phenotype to genotype in cancer. This challenge is being met by high-content automated microscopy (HCAM), based on the convergence of several technologies including computerized microscopy, image processing, computation, and heterogeneity science. In this chapter, we describe an HCAM workflow that can be set up in a medium size interdisciplinary laboratory, and its application to produce high-throughput QCT data for cancer cell motility and proliferation. This type of data is ideally suited to populate cell-scale computational and mathematical models of cancer progression for quantitatively and predictively evaluating cancer drug discovery and treatment.
Cancer cells both across and within individual patients are heterogeneous with respect to genetic (and epigenetic) makeup (Heng et al., 2009). Furthermore, it is increasingly appreciated that even within genetically identical clonal cell populations, individual cells may differ from each other in phenotypic traits (Brock et al., 2009; Stockholm et al., 2007). Genetic and nongenetic heterogeneity remain a formidable challenge for cancer treatment, especially in the case of molecular targeted drugs. Large-scale genetic and epigenetic analyses of cancer variability have begun to extract common patterns at the molecular scale within this morass of heterogeneity.
More recently, powerful cell phenotype analytical methods are coming on line, mostly due to the convergence of image processing, computer-driven automation and high-throughput microscopes (Dove, 2003; Evans and Matsudaira, 2007; Perlman et al., 2004; Starkuviene and Pepperkok, 2007). These methods, termed for convenience high-content automated microscopy (HCAM), enable large-scale analyses of cancer cell phenotype variability that will eventually match the scope of genetic variability analyses.
In this chapter, we describe our implementation of HCAM methods in order to quantify cell traits such as proliferation and motility, and their variability within a cell population such as a cancer cell line. To be clear, we refer to a quantitative cell trait (QCT) as a cell-scale functional property (e.g., proliferation, motility, metabolism, death sensitivity) that displays cell-to-cell variability in a cell population, with respect to some quantitative metric. It is highly desirable that a QCT be defined in numeric terms for machine compatibility, since it is virtually impossible to intuitively deal with, follow in time, or predict consequences of QCT combinations, for example, in cancer progression or drug response. In these early days, these metrics are not firmly established and undoubtedly at some point will have to be agreed upon, particularly for comparing data from different sources in an automated fashion.
In this chapter, our primary goal was to describe methods that define QCT heterogeneity quantitatively, regardless of the source of heterogeneity (e.g., genetic, epigenetic, nongenetic). However, we recognize that, once metrics are established, investigating the source of QCT variability becomes a tantalizing priority. Such investigations may span from identification of molecular mechanisms responsible for generating or dampening QCT variability, to mathematical or statistical modeling for best guess of the type of heterogeneity source (e.g., genetic or nongenetic). Consequences are very practical, because in the case of genetic sources one would expect permanent inheritance of the heterogeneity source, whereas a nongenetic source would produce temporary inheritance of heterogeneity.
Heterogeneity is a central feature of cancer that occurs at all biological scales, from genes to cells to populations. For decades, it has been suspected to be largely responsible for cancer progression and resistance to treatment, spawning intense studies especially at the genetic and molecular level. For example, panels of cancer cell lines have been subjected to genomic, gene expression or protein array analyses, evidencing a large number of genetic mutations, and signaling network alterations associated with malignant transformation. While these studies have been enormously informative and have taken our understanding of cancer to incredible depth, they have suffered from at least two limitations: (i) high-throughput genetic and biochemical studies are generally impractical at the single-cell level and are mostly based on average measurements of a test cell population; and (ii) genotype to phenotype mapping in a single cell, that is, linking genetic or molecular changes with phenotypic trait output (like motility or proliferation) remains challenging. These limitations are especially frustrating in the context of cancer progression, which is paced by the appearance of cell clones abnormal with respect to “hallmark” traits, such that cancer may be referred to as a disease of outliers. Methods to map QCT to underlying molecular defects would effectively produce a multiscale bridge from cancer genetics to cancer cell biology. In this multiscale framework, treatment and drug discovery could be approached with predictive methods.
Analysis at the single-cell level of QCT variability, regardless of source, is commanding increasing attention due to convergent advances in several disciplines. As a whole, the science of heterogeneity has reached maturity in many fields, such as face recognition, machine learning, and signal processing, producing theory that is applicable to cancer cell biology, as well as a wealth of mathematical and computational tools. Computer-driven microscopes are rapidly being refined and promise to deliver for adherent cells the spectacular advances flow cytometry has produced for cells in suspension. Image processing software and automation have the potential to create automated workflows to capture and analyze the behavior of thousands of single cells under tens of conditions in relatively short experimental times.
This ensemble of technology is commonly referred to as HCAM. Its application to cancer cell biology promises to revolutionize our understanding of cell-to-cell variability with respect to a phenotypic trait, referred to above as QCT. It is perhaps worth noting that QCT studies are gaining traction in fields other than cancer. An emerging view is that phenotypic trait variability is inherent to living systems, in large part as an inevitable consequence of biological “noise” at several steps of intracellular molecular operation (e.g., gene transcription, mRNA translation, protein folding). Furthermore, local microenvironmental conditions may extinguish or amplify this noise and, to an extent, even nongenetic variability is inherited from mother to daughter cells on a temporary basis. Normal cells apply considerable resources to constrain or dampen both genetic and nongenetic variability of their traits, particularly as they become functional components of differentiated tissues. In this sense, variability is a negative factor with respect to homeostasis. However, trait variability may provide options to cells for responding to microenvironmental changes, for example, by pushing operation of that trait to the extremes of a range in order to survive under extreme microenvironmental stress, and perhaps for evolving new strategies. In summary, variability of a phenotypic cell trait can be considered a measure of cell adaptability, as well as of evolvability of underlying biochemical circuitry. From this broader perspective, an intriguing view is that during cancer progression the adaptability of cancer cells to microenvironmental changes is ever-increasing. QCT analysis is a key step to breaking down adaptability into numerical parameters that can be evaluated spatially and temporally by higher scale computational modeling.
Advances in experimental and microscopic technologies have made it possible to gather high-quality high-content images of cells and cellular components at an ever-increasing rate. Development of such state-of-the-art equipment and tools allows investigators to gather spatial (i.e., in x-, y-, and z-axes, covering many fields of view, using multiple wavelengths) and time-resolved (i.e., at rapid intervals, over several days) quantifiers that describe various cell traits (i.e., motility, proliferation) in vitro. Such methodology can also be used to explore heterogeneity of traits of individual cells within cell populations.
HCAM is rapidly becoming the most efficient methodology to measure phenotypic traits of cancer cell populations at the single-cell level. In this section, we describe a streamlined workflow for acquiring and processing HCAM data that we have established for our own group (Fig. 2.1). This process can be divided into the key steps highlighted in Fig. 2.1. For some of these steps, recent comprehensive reviews have appeared and the reader is referred to those after a brief discussion. Other steps are dealt with in detail. In brief, the workflow is enabled by development of an informatics pipeline for images and image-associated metadata. Image features are derived using a suite of existing and newly developed image analysis and computer software tools. Ultimately, the goal of this workflow is to establish an efficient pipeline to store, disseminate, and analyze single-cell data for streamlining the use of data categories, for example, for input into mathematical/computational models (Fig. 2.1).
For simplification, the workflow has been broken down into the following steps (all of which are expanded upon in the following sections):
Measurement of spatially- and time-resolved phenotypic traits involves large-scale data acquisition. In order to examine traits efficiently in many cell lines, in many relevant conditions (e.g., hypoxia, drug treatment), and over time, inclusion of a high-throughput methodology such as HCAM is vital. We have primarily utilized a temperature- and CO2-controlled, automated, spinning-disk confocal microscope, the BD Pathway 855 (BD Biosciences, Rockville, MD) for single-cell phenotypic studies, although many other systems exist that provide similar functions. The Bioimager is capable of imaging an entire 96-well microplate in a single channel and focal plane in 10 min, but also has the flexibility to accommodate many other sample formats. Imaging can also be performed repeatedly with multiple images per well, multiple focal planes (z-sections), and using multiple fluorescent channels (using two light sources and various filters for a variety of fluorophores), making the setup ideal for performance of high-content time-lapse studies. In addition, the machine has a capacity for automated liquid handling allowing for precise control of the duration and volume of compound treatments. Lastly, this machine is versatile with adaptable hardware that is directly integrated into our data management system (described below in detail). Ultimately, increasing the efficiency of data acquisition via implementation of such methodology makes it possible to maximize the amount (and potentially the value) of quantitative data extracted from image-based single-cell studies.
There are several trade-offs that require careful consideration during single-cell studies. First, the speed at which images can be acquired limits the number of wells, surface area covered, number of channels, or z-sections, etc. that can be imaged prior to returning to the starting position for the next sequential round of image acquisition. For instance, the frequency of image acquisition is of critical importance when examining motile cells. In order to automate the identification and tracking of individual cells over time (repeated images), the distance the cell has moved between frames must be kept below a minimal threshold that is dependent on the density of cells imaged. It is computationally more challenging to identify a cell between two sequential images as the number of cells and the distance a cell moves increase. A list of imaging trade-offs is shown in Table 2.1.
Another important consideration is photobleaching and phototoxicity. This is generally not a problem for phase-contrast imaging, but can be a substantial limitation for fluorescence imaging. Nipkow spinning disk confocal imaging is particularly well suited for reducing phototoxicity and photobleaching and has become the method of choice for live-cell imaging (Gräf et al., 2005). However, a limitation of imaging through spinning disks is that z-axis resolution is reduced compared to that of laser scanning confocal imaging. Regardless of whether spinning disks are used, the potential effects of imaging on cellular phenotypes must be considered.
Individual HCAM experiments generate large datasets, commonly exceeding 50 GB in size. Therefore, data management—including storage, retrieval, backup, and processing—is facilitated by incorporation of data into the open microscopy environment (OME; http://www.openmicroscopy.org; Swedlow et al., 2003). This open-source software has been designed specifically to address the challenges of HCAM data and provides a standardized management platform by developing software and data format standards for the storage and manipulation of biological microscopy data (Goldberg et al., 2005; Swedlow et al., 2009). OME has previously been used in a number of biological studies to examine many aspects of cellular behavior (Dikovskaya et al., 2007; Porter et al., 2007). An OME remote objects (OMERO) server is established, which provides access to image data (the binary pixel data) and metadata (i.e., associated information about instrument settings, configurations, annotations). Access to data is enabled through client applications that simply run on a user’s computer. These include light-weight web-based interfaces, which can be accessed from any computer with a standard web browser; Java-based client applications, which provide more functionality than the web interface, but must be installed separately on each client computer; and a full cross-platform API, which provides data accessibility from third-party applications like ImageJ and VisBio. In addition, incorporation of data into OME provides MATLAB bindings to facilitate sophisticated image processing and analysis directly through the OMERO server.
Formatted images classified into datasets are then processed using various image analysis tools/algorithms (e.g., cell tracking, segmentation); we use a combination of some existing and freely available from open sources such as MetaMorph™ (Molecular Devices, Sunnyvale, CA), ImageJ (http://rsbweb.nih.gov/ij/; Rasband, 1997–2006), CellProfiler (http://www.cellprofiler.org; Carpenter et al., 2006), OpenLab (Improvision, Waltham, MA), and others custom-developed in-house, to utilize the Vanderbilt Advanced Computing Center for Research & Education (ACCRE) Cluster for rapid processing of individual wells in parallel. MATLAB and Unix Shell scripts, which are designed to run in a high-throughput mode, facilitate this effort. We will present three specific processing modules that were custom-designed for processing of cell motility (Section 4.1) and proliferation (Section 4.2).
The output from the image processing pipeline is a set of cell parameters and images for visual inspection. Information from each tracked cell can be extracted from raw or processed images or from aggregate data for further data analysis. Typical parameters obtained from each image include cell perimeter, mean pixel intensity, and measures of shape such as eccentricity and solidity. Other measurements first require the identification of individual cells across multiple frames. These QCT include parameters of cell motility (e.g., speed, direction), intermitotic times (IMT), and progeny trees. Once all data are extracted from images, it is saved as a set of CSV files for statistical analysis and a set of images for visual inspection (i.e., quality control).
A variety of analytical and statistical tools are applied to further analyze single-cell data, using a few statistical/mathematical packages including, R (http://www.r-project.org/; free software environment), SPSS (SPSS, Inc., Chicago, IL), and Mathematica (Wolfram Research, Inc., Champaign, IL).
Averages and distributions of data are tested using a combination of parametric and nonparametric statistical tests, as needed. First, normality of data is tested using various statistical tests (e.g., D’ Agostino’s K-squared, Shapiro Wilks W), dependent upon sample size, prior to all further analyses. Given a dataset that classically fits normality, parametric statistics (e.g., Student’s t-test, ANOVA) can be applied to detect significant relationships, and presentation of averages, standard deviation (SD), and standard error (SE) is sufficient to describe the population. However, given nonnormality, slightly more extraneous nonparametric tests must be employed to accurately capture population dynamics (e.g., Wilcoxon signed-rank, Kolmogorov–Smirnov tests). Of note, failure to verify assumptions about data (particularly in the study of population heterogeneity) can lead to unfortunate misinterpretation and wrongful conclusions.
Statistical subpopulation analysis will also employ a number of other classic and adapted methods, which are described at length in Section 3.6.1.
Raw or processed data can be presented in various ways, of which we discuss three: (i) averages, (ii) distributions, or (iii) “statistical subpopulations” (i.e., variability distribution discretized by statistical techniques such as clustering). Each of these categories can then be incorporated into corresponding mathematical models.
We have previously incorporated average data, for various phenotypic traits for a panel of genetically related breast cancer cells into the hybrid-discrete continuum (HDC) mathematical model for parameterization (Anderson et al., 2009). However, with the realization of heterogeneity of cell populations, presentation of a single value (average) is often inadequate for an accurate description. Although SD or SE can sometimes be used to effectively describe the variability of normal (Gaussian) populations, skewed nonnormal datasets rich with outliers and possible subpopulations should not rely on these means. Instead, analysis of population probability distributions and subpopulations via various approaches is preferable in these instances, as described below.
Obtaining single-cell measures using HCAM, in combination with rigorous statistical treatment, allows examination and analysis of large populations of cells (N > 1000) in a fairly efficient manner. Using this type of data acquisition for phenotypic traits, in lieu of population-based metrics, allows for presentation of probability distribution, which describes both the range of possible values that a random variable can attain and the probability that the value is within any subset of that range. This category of measurement is particularly useful for representing the spread or variability (i.e., heterogeneity) of a cell population by depicting the nuances of its data (e.g., non-normality, skewness, kurtosis, outliers), which are lost using simple presentation of averages. By providing such data for parameterization of mathematical and computational models where appropriate, one can model heterogeneity of populations more realistically (in line with experimentation), which may ultimately lead to important insights otherwise overlooked. A specific example of applying these techniques to single-cell motility data is detailed below in Section 4.1.
Using other statistical approaches, raw data for various phenotypic cell traits can also be processed to reveal “subpopulations” present within the greater population being examined. In order to quantify intracell line variability, we discretize the continuous distribution measurements described in the above section into “functional subpopulations,” as previously described (Loo et al., 2007; Perlman et al., 2004; Slack et al., 2008). The advantage of identifying discrete subpopulations is that they can be compared across cell lines and identify common trends in response to perturbations of interest. Specifically, methods can be employed to estimate trait subpopulations using model-fit criteria, such as Bayesian information criterion (BIC) or gap statistics (Fraley and Raftery, 2002). BIC is an approximation of integrated likelihood, according to Eq. (2.1):
vk is the number of independent parameters to be estimated in model Mk. θk is the parameter being estimated, and n is the number of points in the dataset D. This approximation has been shown to be a consistent estimator of density, even when dealing with nonparametric (Roeder and Wasserman, 1995) or noisy data. Expectation maximization (EM) is also used in statistics for finding maximum likelihood estimates (MLE) of parameters with a known number of clusters (Eliason, 1993). Using this method, model-based hierarchical agglomerative clustering is used to compute an approximate maximum for the classification likelihood, following Eq. (2.2), as previously described (Fraley and Raftery, 2002):
n labels a unique classification of each observation, and θg is the parameter estimate for each cluster. By combining the hierarchical agglomerative clustering with both EM and BIC, a robust strategy is developed. The brief outline of this algorithm is as follows: (1) choose maximum number of clusters; (2) perform hierarchical agglomerative clustering to estimate a classification for the data under each model, up to a selected maximum number; (3) compute the EM to determine parameters under each model; and (4) utilize BIC to select which is the most likely model of the data.
Additional statistical techniques (i.e., principal components analysis (PCA) or Gaussian mixed models (GMM)) can then be applied as needed to reduce the dimension of a dataset and to find clusters of cells or sub-populations. Ultimately, these subpopulations are represented as probabilistic mixtures of stereotypes (i.e., phenotypes). As presented previously (Loo et al., 2007; Perlman et al., 2004; Slack et al., 2008), we can summarize the percentages of states within a cancer population as a “subpopulation trait profile”—a simple probability vector whose entries sum to one. This analysis allows us to approximate subpopulations (i.e., heterogeneity) that exist within a cell line population with respect to a specific cell trait. Further, we can also use this approach to examine whether specific microenvironmental perturbations (e.g., hypoxia, drug treatment) influence/induce apparent patterns of heterogeneity of cancer cell populations. This particular approach can be invaluable for explaining shifts of cell populations, or more interestingly cell subpopulations, which is quickly becoming a major field of study in cancer research.
These analyses have all been previously used in different combinations for teasing apart cell subtypes based on various parameters (Loo et al., 2007; Perlman et al., 2004; Slack et al., 2008). In summary, this is just one approach for numerically describing the heterogeneity of cell populations, particularly highlighting outliers, based on any number of relevant traits of interest. Much like flow cytometry separates a cell population from suspension into various subpopulations based on certain assignments (e.g., fluorescent marker, cell size), the coupling of HCAM and rigorous statistical tests can provide a means for separating or grouping of live cells dynamically over time from image-based studies. A specific example of applying these techniques to single-cell proliferation data is detailed below in Section 4.2.
As described above, various experimental and computational tools are being developed to investigate a number of applications relevant to the study of QCT in cancer progression. In this chapter, we have focused on measurement and analysis of two specific phenotypic traits (QCT) of single cells, motility and proliferation, both of which are hallmarks of cancer (Hanahan and Weinberg, 2000). It is well established that both traits are aberrantly regulated during disease progression, and probable that intervention strategies targeting these processes may be useful in clinical treatment. The following sections briefly expand upon the clinical importance of each trait, our chosen methodology for various analyses of each, and the implications of conducting such studies.
Cell motility plays an essential role in many biological systems, but precise quantitative knowledge of the biophysical processes involved in cell migration is limited. It is well established that migration of both epithelial and transformed cancer cells is a complex and dynamic process, which involves changes in cell size, shape, and overall movement (Friedl and Wolf, 2003). Therefore, one can characterize cell motility by quantifying several metrics. This provides opportunities to improve the predictive accuracy of computational and mathematical models by incorporating more numerical parameters. Herein, we present a method for assessing single-cell motility, combining experimental, statistical and computational tools, and apply it to the analysis of the dynamics of “unbiased” single-cell migration in vitro (i.e., undirected, or without addition of chemoattractant). This pipeline for analysis was designed with the intention of examining large numbers of heterogeneous cancer cell populations (i.e., cell lines in vitro). This method improves upon classic methods for studying migration (e.g., Boyden chamber) because it captures single-cell dynamics underlying the heterogeneity of cancer cell populations.
We established protocols for both manual (Harris et al., 2008) and automated (custom-written algorithms) cell tracking of single-cell motility. Manual cell tracking is standard practice (Harris et al., 2008) and not covered in this chapter. Although facilitated by several software packages and image analysis tools (Section 3.3), it is laborious and time-consuming (Harris et al., 2008) and limits the throughput. In the context of HCAM and high-throughput studies, it still has a critical function to validate automated analyses.
Automated high-throughput cell tracking (thousands of cells) presents significant challenges, discussed in the following. Due to low signal-to-noise ratio (low contrast between cell and background), automated cell tracking of digital bright-field or phase-contrast microscopic images is often impractical and error prone. Fluorescent-based imaging has far superior signal-to-noise ratios and the resultant images significantly simplify the process of automated tracking. Therefore, for high-throughput studies in our laboratory (using the BD 855 Pathway), cells are labeled with a nuclear protein (histone H2B) conjugated to monomeric red fluorescent protein (H2BmRFP; Addgene Plasmid 18982) to enable identification of nuclei of individual cells. This protein has been used by many groups for imaging purposes, and to date no significant alterations in cellular function due to its expression have been described. The most efficient method for obtaining pooled populations of cells with stable expression of a transgene is using retroviral-mediated transduction, although any method that produces similar results may be employed alternatively. Cells should be flow-sorted to minimize the number of nonexpressing cells within the populations. Numerous protocols exist for these procedures and will not be covered here further. Once a stable cell line is established in which the fluorescent protein is expressed, various parameters must be compared to the parental strain to ensure no obvious clonal selection has occurred. HCAM assays can then be carried out as follows: (1) Cells are seeded into 96-well microplates (~2000 cells per well), allowed to adhere for ~1 h in the temperature-controlled (37 °C), CO2-controlled chamber of the BD Pathway 855 machine, and washed to remove nonadherent cells from wells prior to tracking. (2) Fluorescent images are then automatically obtained at predetermined intervals for a given period of time (5 min intervals, for ~4 h), controlled by BD Attovision software. Based on the information presented in Table 2.1, imaging parameters are set to enable the automatic tracking of as many cell types and conditions as possible. For the BD Pathway 855, the optimized settings for automated tracking of H2B-labeled MCF10A cells are listed in Table 2.2. Using these image settings, ~240 TIFF images (~1.3 MB each) per well per experiment are generated—approximately 35,000 images comprising ~50 GB of storage space experiment. These images are exported and stored using the various data management strategies previously described in Section 3.2 (OME, ACCRE).
We are developing custom-written algorithms for automated assessment of single-cell motility. These tools are designed to integrate with a number of programs/applications, including MATLAB, CellProfiler, and ImageJ. They also interface with OME and cluster computing (e.g., ACCRE at Vanderbilt). A motility software module we designed (named WG1) imports raw images, thresholds them to obtain binary images, segments the binary blobs into objects (i.e., cell, nuclei), calculates centroid values, and assembles them into a matrix that is sent to an external tracking algorithm (for bright-field images, an external tensor voting algorithm can also be used to infer missing edges prior to segmentation). Tracks obtained from the external algorithm are then saved and can be used for processing by other modules (described in other sections). Optionally, WG1 can also be used to overlay the detected single-cell outlines and tracks over the original cell images and save the new images to disk. The resulting image stacks can be visually inspected for quality control.
Once individual cells (nuclei) have been identified and tracked by either a manual or computer-driven method, a number of both classic and novel motility-related parameters for each cell and/or population can be extracted (Table 2.3). Some metrics can be performed at the population-level (P), some at the single-cell level (S), and others at both levels. Some of these parameters are described in the following sections.
Cell speed is thought to correlate with cancer invasion (Wells, 2006). There are several previous investigations of single-cell speed (undirected) or velocity (directed) for cancer cell lines, in various microenvironments (Anderson et al., 2009; Hofmann-Wellenhof et al., 1995; Jiao et al., 2008). Single-cell speed obtained from time-lapse image stacks is automatically calculated using the x, y (and z in three-dimensional studies) coordinates obtained from tracking centroids (calculated from cell nuclei outlines) using MATLAB algorithms. We have previously examined cells for time periods ranging from just a few minutes to ~24 h, at various time resolution (30 s to 10 min intervals). It should be noted that experiments should be optimized (e.g., cell type, matrix, surface), as this can contribute to metric accuracy. We have examined single-cell speeds using frequency histograms and scatter-plots overlaid with box-and-whisker plots containing statistics (Fig. 2.2A), particularly to highlight heterogeneity of a dataset and other trends in data (e.g., skewness, kurtosis).
Persistence time (min) is one of the most common measures of cell motility (Dunn and Brown, 1987). This measure assumes cell motion is a persistent random walk (PRW), and combines persistence in direction and speed in calculation. The PRW model can be derived from the Langevin equation (Eq. (2.3)):
This is a stochastic differential equation describing Brownian motion in a potential, resulting in the Ornstein–Uhlenbeck process (Uhlenbeck and Ornstein, 1930) where m is the mass of the particle, v is the velocity vector, x is the position, t is time, is the coefficient of friction and η represents noise of mean zero. An expectation of the model is described by the Furth equation (Eq. (2.4) (Furth, 1920):
This equation describes the expected mean-squared displacement over time, d represents displacement, nd is number of dimensions, S is speed, P is persistence time, and t is time. Motion is initially ballistic (directed), transitioning in time to super-diffusive, and finally to diffusive. The persistence time is the descriptive parameter of the break point in this transition (Codling et al., 2008). Thus, to accurately calculate persistence time, one must observe cells for a long enough time interval for them to transition to the diffusive regime (roughly 3 h for a 10 min persistence time). We have previously calculated persistence times by both the traditional Dunn method (Dunn and Brown, 1987), and the updated Kipper method (Kipper et al., 2007), which reduces standard error of data to fit by approximately 50%, which is shown in Eq. (2.5) where ξ is an estimate of normalized mean-squared displacement.
Kipper also provides a full treatment of systematic errors in measurement of persistence time. An example of graphs containing mean-squared displacement versus time are shown for three cell lines in Fig. 2.2B. We are yet to determine a steadfast trend for persistence time in cell lines we have examined (data not published), as no obvious correlations have emerged (possibly due to the heterogeneity of populations), however we have determined that this metric can shift dramatically upon changing the cells’ microenvironment, which is consistent with previous literature (Kim et al., 2008).
The motile cell fraction is the percentage of motile cells within a given population, as previously described (Kim et al., 2008). In a number of previous studies, we have determined that for many cell line populations, the majority of cells are nonmotile throughout an entire assay. Interestingly, it seems that, as a trend, a small subpopulation of cells is highly motile, and up to order of magnitude greater in measurement (Fig. 2.2C).
This metric has classically been applied to analysis of bacterial motility (Berg and Brown, 1972; Duffy and Ford, 1997). Recently, we analyzed turn-angle distributions of epithelial and cancer cell lines (Potdar et al., 2009). Individual cell trajectories are tracked and turn-angle values taken from each. This method is subject to systematic measurement error, unless appropriate sampling intervals and high-resolution images are selected.
Consider a model system where speed is chosen from an exponential distribution and turn-angle is chosen from a Von Mises (circular-normal) distribution (Eq. (2.6)),1 where r and θ are polar coordinates, λ and κ are shape parameters, I0 is the modified Bessel function of the first kind for the distributions:
Figure 2.3A shows the resulting distribution (λ, κ = 1), where the peak is the location of a cell, the positive x-axis represents the turn angle (equal to 0), and the grid represents the observable pixels. λ is a factor of mean cell speed, observation interval and the pixel width, and the principal factor in experimental configuration. The aliasing occurs in the measurement, because each x, y pair on the grid represents the observable angles (see Fig. 2.3B, for an example of measurement error for Brownian motion). Increasing pixel resolution reduces this error. The best time sampling interval is a tradeoff between being too short whereby a cell does not move as far along the grid (increasing aliasing), versus being too long and greater than the persistence time, whereby the cell’s observable motion is diffusive. Figure 2.3C shows the resulting error from the Von Mises/exponential model with a λ = 0.5, and 37 bins. Computation of this is done by integrating the density in each pixel and sum-binning the density of the measurable angle of the coordinate. This is a correctable error, and the observed bins can be corrected by this ratio. Total measurement error (TME) is quantified by the following Eq. (2.7), where θm is measured angle and θa is the true angle and n is the number of bins:
Figure 2.3D is a graph showing the effect of TME by λ on the x-axis and bin size by the three curves. Note this does not include the potential loss for a sample interval around or above the persistence time. Further, it is important to note that quantifying these metrics based on cell centroids, as opposed to by pixel, improves the accuracy of the data significantly.
Surface area is commonly used in image processing (Alexopoulos et al., 2002; Carpenter et al., 2006), often as an indicator of differentiation, apoptosis, and other biological processes (Mukherjee et al., 2004; Ray and Acton, 2005). This metric simply quantifies overall cell size (in pixels). In previous studies (Harris et al., 2008), we have designed custom-written MATLAB algorithms to obtain single-cell surface area measurements of cancer cells. As with single-cell speed, this metric can be represented at both the individual and population (average) levels. Overall cell size can be assessed from bright-field or fluorescence images, and subcellular compartments (e.g., nuclei, mitochondria) can also be measured given appropriate use of markers.
One of the main assumptions of the PRW model is that cells are always in motion. However, we have determined that cells do not necessarily meet these criteria, and instead typically pause frequently as they migrate. In order to refine the model to incorporate this idea, we have developed a few novel metrics, each described below in detail, that quantitate this phenomenon in various ways.
Individual cells do not typically maintain constant speed during the course of a time-lapse movie. Instead, their activity is often composed of frames of fast movement, slower movement, and no movement. We have implemented a metric to capture this behavior, termed speed fluctuation (Fig. 2.4A). For non-Gaussian datasets, this metric is calculated using bootstrapping to obtain the range of 95% confidence intervals (CI) of the SD of cell speed for each individual cell in a population. In summary, a number of our previous studies have determined that single-cell speed over time is largely variable and that cells within a population exhibit large amounts of heterogeneity in terms of fluctuation (unpublished data). Further, we have also found that distinct cell lines exhibit contrasting trends in fluctuation—with some remaining fairly constant and others fluctuating dramatically—and that introduction of various microenvironmental conditions can cause dampening or increases in fluctuation for cells (unpublished data). For normally distributed data, presentation of SD or the interquartile range can convey a similar metric.
To accurately add cell pausing into migration models, it is necessary to experimentally determine the distance a cell travels between consecutive pauses. Step-length, flight length, and flight time are three metrics that are used in ecology to study foraging behavior of birds, bees, and mammals (Gautestad and Mysterud, 2005; Viswanathan et al., 1999). The term step-length has also been used to describe the movement of molecular motors on polymers (Wallin et al., 2007). All three terms are used to quantitate distance or time between pauses in motion, but to our knowledge, this metric has not been used previously to quantify the motion of epithelial cells. To obtain step-length, we measured the overall distance traveled between cell pauses in a time-lapse movie using x, y coordinates obtained from cell tracking (defined by two consecutive frames at the same coordinate) and discarded all step-lengths below our tracking error threshold (lengths < 1 μm). Sample step-lengths are shown in Fig. 2.4B. Interestingly, we observe that, just as single-cell speed fluctuates across time and within a population, cell step-length is also highly variable both within and across cell lines.
Persistence and diffusion coefficients are often used to describe cellular motion. However, both of these representations make a number of assumptions about cellular behavior. In particular, they assume all cells are in motion at all times. The IMF was developed to test this assumption, and to provide an additional metric to monitor differences in migration characteristics between cell lines and in various conditions. It measures the percentage of motile cells (must move more than 1 pixel, our measurement error threshold) within a given population at any given time (frame) of a time-lapse movie. In contrast to the motile cell fraction metric, which shows percentage of cells that are “successful” in their migration, this metric represents the ratio of cells “attempting” to move. Figure 2.4C shows an example of applying this metric to MCF10A, AT1, and CA1d cell lines in normal tissue culture conditions; quite clearly these cell lines exhibit heterogeneous expression of motility at any given moment (at 5 min intervals).
Kymography is one method used to gain insight into the specific mechanisms of cell movement by studying morphological changes in shape and size (Bryce et al., 2005; Cai et al., 2007). However, kymography is used for relatively small sample sizes (due to highly magnified images required), during relatively short periods of time (Bear et al., 2002; Cai et al., 2007). We have developed a novel metric, termed DECCA, which represents the overall change in cell area and motion over time (Harris et al., 2008). We previously developed this novel metric to quantify the difference between a completely nonmotile cell (velocity = 0) and a nonmotile cell (also with a velocity = 0, and of the exact same size) that ruffles its lamellipodia, a classic behavior of cancer cells during migration. Figure 2.4D includes a sample of how this metric captures dynamic behavior, adapted from our previous work (Harris et al., 2008).
Time-lapse microscopy images of cell motility can be used to extract all or some of the metrics described above, which can subsequently be used to generate computational simulations (Windrose plots) that combine the various parameters into a single visual depiction of motility. Samples simulations for each of the cell lines presented above in normal tissue culture conditions can be viewed at http://vicbc.vanderbilt.edu/itumor/cell.
Each motility metric demonstrates heterogeneity in a cell population and can be used to investigate relevant differences between normal and cancer cells. However, the reason for using many motility metrics is that each metric by itself is insufficient for defining statistical subpopulations. Defining statistical sub-populations facilitates examining relationships between distinct QCT (e.g., defining how proliferation subpopulations relate to motility subpopulations within a cell population). In the case of motility, cluster analysis methods of BIC and EM (as described in Section 3.5, and applied to proliferation QCT, Section 4.2.4) are applicable, as long as multiple parameters are combined.
Typical studies of proliferation in cultured cell lines involve counting cells (either directly or indirectly) in a population over time. These results are usually presented as a population doubling time (DT) calculated from the number of cells identified at various intervals or as a percentage of the population in each phase of the cell cycle (G1, S, or G2/M) at a given point in time (usually using flow cytometry). These population-level assays are generally limited by the fact that, as endpoint assays, they require large numbers of samples to provide accurate information. This limitation is alleviated by continual monitoring/sampling of cells within a population over time. Nonadherent cells can be sampled with relative ease without disrupting their normal culture. However, for adherent epithelial cell lines, this requires microscopic visualization. The use of time-lapse video of transmitted light microscopy for continual visualization of cells over many days has been used for decades. However, due to the low signal-to-noise ratio (low contrast) between cells background, previously described in the motility application above (Section 4.1.1), automated cell counting of digital light microscopic images remains a challenge. Therefore, we have moved to fluorescent-based imaging to facilitate automated tracking.
As for motility studies, we utilize flow-sorted cells with stable expression of histone H2BmRFP for proliferation studies. Prior to examination of cells at the single cell level, it is important to ensure no obvious clonal selection has occurred during the generation of the modified cells. To do this the resultant population must be compared to the parental cell line. This procedure is easily accomplished using HCAM and comparing to other population-level assays—manual counting being the gold standard. By imaging the cells every 1–4 h and using automatic segmentation algorithms to quantify cell numbers, population doubling times can be calculated by simple linear regression of the natural log of the number of cells in each image. An example of the verification of the similarity of H2B-labeled cells with parental cells is shown in Fig. 2.5A.
Once the population-level proliferation rates have been validated for a particular fluorescent protein-labeled cell line, further investigation of proliferation metrics at the single-cell level can proceed. Based on the information presented in Table 2.1, imaging parameters are set to enable the automatic tracking of as many cell types and conditions as possible. The optimized setting for automatic tracking of H2B-labeled MCF10A cells with the BD Pathway 855 imager are listed in Table 2.2. Using these imaging settings, approximately 240 TIFF images (~1.3 MB each) per well per day are generated––approximately 35,000 images comprising ~50 GB of storage space per 96 h experiment.
The automated analysis of HCAM-generated images can be used to determine IMT (time between mitotic events) of individual cells within a cell population if image acquisition is sufficiently frequent to allow for automatic tracking of cells over time (~6–12 frames/h). In addition, the tracking algorithm described for motility has been modified to include the ability to detect mitotic events and associate resultant progeny with their parental cell. The first software module is the same as used for motility (WG1). The output of this module is a set of MATLAB label matrices and a list of cell centroids at each time step, which can be used for processing by two other modules for obtaining additional proliferation metrics.
The second module (WG2) uses the track ID and shape parameters from the label matrices to extract parameters. To determine cell division events, this algorithm identifies tracked cell IDs that were not present in a previous frame of a time-lapse movie. In order to separate true mitotic events from cells entering the frame, cells that have been moving too fast and were lost by the tracking toolbox and cells whose fluorescence intensity is fluctuating above and below the foreground intensity threshold, which disappear from certain frames and suddenly appear in other frames. We use the collapse in size of the cell nuclei and proximity of the nuclei in anaphase as markers of a true mitotic event. Filters in the algorithm reject new cells that are too far from other cells or have too great an area as possible mitotic events. An additional filter checks the size of possible parents and compares it with the size of the presumptive daughter cells. If the size ratio of parent area to areas of possible daughter cells is too small the event is rejected. Finally, if the size of the two possible daughter nuclei is too dissimilar, the event is also rejected. After the mitotic events are detected, new IDs are assigned to the daughter cells and each cell receives a parent ID. Cells that have entered the frame and cells that were present at the beginning of the movie receive a parent ID equal to zero.
In the last module (WG3), proliferation information, as well as centroid position and shape parameters (e.g., area, eccentricity), are saved to a set of comma-separated text files. In addition, images are generated with the detected nuclei boundaries (or cytoplasm in bright-field images) color-coded based on generation number and cell ID overlaid onto the original image and saved as JPEG files to facilitate manual validation of the automatic segmentation and tracking.
Single cell IMT define the duration of each individual cell lifetime or cell cycle. The generation rate (GR) is calculated as LN(2)/IMT and is used instead of IMT, since its distribution has been shown to be normal (Gaussian) in several noncancerous cell lines. However, the distribution of GR of all cells in a population is overrepresented by the faster dividing cells, which generates platykurtotic (tall and narrow) distributions (Sisken and Morasca, 1965). To reduce this bias, only a single generation is analyzed. An example of the distribution of IMT and GR from multiple generations or a single generation is demonstrated in Fig. 2.5B.
It is important to compare the single-cell GR with population-level metrics (i.e., population DT) since population-level data is comprised the single-cell metrics. For example, under conditions where the population proliferation rate is nonlinear, calculation of a population DT is inappropriate as it is changing over time (Fig. 2.6A, Condition 2), whereas the population DT is calculated as 16.91 h under normal culture conditions (Fig. 2.6A, Condition 1), corresponding to a GR of 0.041––the slope of the line. The population-level proliferation curve in Condition 2 suggests a decreased IMT of the cells over time. Linear regression of data plotted with single-cell IMT on the x-axis and cell birth time on the y-axis provides a tool to examine whether the IMT is time dependent. The horizontal line in Fig. 2.6C, Condition 1, indicates no correlation between birth time and IMT, whereas there is a clear positive correlation of IMT with birth time in Condition 2, indicating that cell cycle times are increasing over the course of the experiment. This type of analysis is not limited to birth time and, therefore, provides a useful general approach for detecting parameter interdependencies.
The image processing algorithms described above in Section 4.2.3 provide a method to link individual cell data to its parent and progeny to generate a family (progeny) tree of dependent data. Each progeny tree represents a clonal population with unknown dependence to other progeny trees, such that progeny trees may be related to varying degrees or unrelated. One metric that can be obtained using data pulled from entire progeny trees is and maximum likelihood estimate of GR for each using the following Eq. (2.8):
Bt and Dt are the number of mitotic events and the number of deaths, respectively and St is the total lifetime of the population. St is obtained by summing the lifespan of each cell within a progeny tree (Keiding and Lauritzen, 1978). In the absence of detectable death, the equation is reduced to simply (Eq. (2.9)):
Since the estimate of GR for the progeny tree is based on the population lifetime (St) and the number of mitotic events (Bt) occurring within a progeny tree, these values can be calculated even for progeny trees containing a single mitotic event (one parent and two offspring). In addition, St is calculated using all cells in each tree, regardless of whether it leaves the frame or persists to the end of the experiment. Thus, deriving GR from progeny trees provides a system with which to compare the proliferation rates of clonal subpopulations within the context of a potentially heterogeneous population, without requiring individual clones to be isolated. Thus, this analysis introduces potential for high-throughput comparison of multiple genetically stable clonal populations and should be able to detect preexisting or frequently occurring stable genetic alterations that alter the proliferative capacity of the cells within the population as a whole. A representative plot of progeny tree GR and the relationship to the other metrics are shown in Fig. 2.6.
Another proliferation metric that can be used to detect variability within each cell line is the similarity of IMT or GR between sibling pairs (or other members within a progeny tree). Since each sibling pair is presumably genetically identical, differences between them can be considered nongenetic. Metrics of this similarity or differences between siblings are obtained either by determining the correlation between sibling GR (Fig. 2.7A and B) or by plotting the difference between the IMT of sibling pairs (Fig. 2.7C).
Although not yet applied to our datasets, a very promising approach to quantify the variance of proliferation metrics within cell lines is the bifurcating autoregression model (Staude et al., 1997). The model accounts for cells progressing through a standard cell cycle and can be used to quantify heterogeneity in the population using bifurcating data structures such as progeny trees. The model provides quantitative values of mean and variance in the population and can quantify the variance of metrics between related members of a progeny tree (e.g., mothers and daughters or sibling pairs).
Other standard assays of DNA synthesis (e.g., bromodeoxyuridine (BrdU) incorporation) and DNA content (e.g., incorporation of fluorescent DNA-binding dyes such as 4′,6-diamidino-2-phenylindole (DAPI) or Hoescht 33342) can easily be incorporated into the HCAM experiments. These assays can be performed in situ to produce results similar to those obtainable using flow cytometry. However, a live-cell, fluorescent, ubiquitination-based cell cycle indicator, “Fucci” system (Sakaue-Sawano et al., 2008) now makes it possible to track the cell cycle of individual cells over time. The Fucci system uses two fluorescent protein-conjugated protein fragments that are rapidly degraded upon ubiquitylation with different fluorescent properties for each phase (G1/S and G2/M) of the cell cycle (Sakaue-Sawano et al., 2008). Data generated by these approaches can easily be integrated with the other proliferation metrics to provide a more complete picture of the cell cycle times of individual cells in the population over time. A list of proliferation, such as IMT, is shown in Table 2.4.
For verification of automated tracking results, random wells (fields of view) are selected for manual verification. The manually derived results of these fields are subjected to the same analysis, and the results are compared for accuracy with the automated results to determine the error rate of the automated process (e.g., histograms of the mitotic times are compared with the two-sample Kolmogorov–Smirnov test for significant differences.)
From this chapter, it is hopefully evident that QCT studies by HCAM can address fundamental questions in cancer, including: (i) defining the relation between progression of cancer cell aggressiveness and QCT variability in a tumor; (ii) determining whether QCT variability range tracks with tumor response to drugs and drug combinations; and (iii) relating QCT variability to the rise of cancer resistance to treatment. It is also expected that these quantitative analyses will have a profound impact on computational and mathematical modeling of cancer progression and treatment, by complementing the plethora of molecular data with an abundance of much needed cellular data.
We thank Dr. Jerome Jourquin for incorporating motility movies into http://vicbc.vanderbilt.edu/itumor/cell. Support for this work was provided by NCI grant U54CA113007.
1The extra λ normalization term is due to the polar form of the Jacobian.