A central challenge of biology is to understand how individual cells process information and respond to perturbations. Much of our current knowledge is based on ensemble measurements. However, cell-to-cell differences are always present to some degree in any population of cells, and the ensemble behaviors of a population may not represent the behaviors of any individual cell. Here, we discuss examples of when heterogeneity cannot be ignored, and describe practical strategies for analyzing and interpreting cellular heterogeneity; while ensemble measurements may be too simplistic, capturing all variation among cells may be unnecessary. To develop accurate models of individual cell behavior–be they in the form of cartoons, words, or mathematics–it is essential to identify which cell-cell differences are important and which can be ignored.
After decades of probing, measuring, and analyzing the behaviors of single cells, it has become clear that the challenge is no longer to demonstrate that populations of “seemingly identical” cells are heterogeneous. Indeed, phenotypic differences among cells are always present at a fine-enough resolution of inspection. Rather, the daunting challenge is to determine which, if any, components of observed cellular heterogeneity serve a biological function or contain meaningful information.
Population-averaged assays are powerful tools in biology, enabling the identification of components and interactions within complex metabolic, signaling, and transcriptional networks. Such measurements1 can succinctly capture population state and readily report how this state changes in response to perturbations. An assumption of such assays is that ensemble averages reflect the dominant biological mechanism operating within the individual cells of a population.
What about cells away from the mean (Figure 1A(i))? The behavior of such cells may be similar to the average behavior of the population, in which case the observed variation can be summarized by a mean (and perhaps a variance) with no loss of meaningful biological information. Biochemical noise (Elowitz et al., 2002; Newman et al., 2006; Ozbudak et al., 2002; Raser and O'Shea, 2004) may give rise to cellular differences that have no functional significance. So-called “housekeeping” genes are often chosen as references in assays under the assumption that their variation in expression is small and/or biologically unimportant (though such assumptions are increasingly challenged (Bahar et al., 2006)). As a more specific example, subpopulations of R1-R6 photoreceptor cells in the Drosophila compound eye are considered functionally equivalent with respect to their response and adaptation to light signals (Yau and Hardie, 2009). However, biochemical noise can also lead to important functional differences, as seen for competence decisions in B. subtilis and selection of color vision photoreceptors in D. melanogaster (Losick and Desplan, 2008). Recently, it was demonstrated that subpopulations of clonally derived hematopoietic progenitor cells with low or high expression of the stem cell marker Sca-1 can be in dramatically different transcriptional states and give rise to different lineages (Chang et al., 2008). Therefore, models derived from ensemble averages may not represent individual cell function even for a simple bell-shaped distribution of single-cell measurements.
Population distributions can also mask the presence of rare or small subpopulations of cells (Figure 1A(ii)). In such a case, a population mean may represent the vast majority of cells yet miss important biology. Recent studies have investigated the presence and dynamics of small subpopulations within isogenic populations of bacteria. These include the identification of pre-existing subpopulations of slow-growing “persister cells” that have a negligible effect on fitness under normal conditions but enable survival under drug treatment (Balaban et al., 2004). In other studies, variability in the duration of a transiently differentiated state may increase fitness in fluctuating environments (Cagatay et al., 2009). Similarly, small reservoirs of dormant stem cells have been identified within larger hematopoietic stem cell populations; the rapid reactivation of these subpopulations during times of injury plays a crucial role in re-establishing homeostasis (Wilson et al., 2008). Finally, cancer is a highly heterogeneous disease (Heppner, 1984; Rubin, 1990). The origins of subpopulations that contribute unequally to disease progression or to response to therapeutic intervention are the subject of debate. Regardless of their origins, heterogeneity poses practical challenges for building accurate clinical models, particularly ones based on population-averaged measurements, to guide diagnosis and treatment of the disease (Campbell and Polyak, 2007). Even within clonal populations in carefully controlled laboratory conditions, the dynamics and responses of single cancer cells to drugs can vary widely (Cohen et al., 2008; Gascoigne and Taylor, 2008).
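The arithmetic behind this point can be sketched with a toy population. The numbers below (population size, persister fraction, growth rates, and a drug that spares only slow growers) are hypothetical assumptions chosen for illustration, not values from Balaban et al. (2004).

```python
# Toy model of a rare subpopulation. All numbers are hypothetical assumptions.
N_CELLS = 100_000
PERSISTER_FRACTION = 0.001   # 0.1% of cells are slow-growing "persisters"
NORMAL_GROWTH = 1.0          # growth rate of typical cells (arbitrary units)
PERSISTER_GROWTH = 0.05      # persisters barely grow

n_persisters = int(N_CELLS * PERSISTER_FRACTION)
growth_rates = ([PERSISTER_GROWTH] * n_persisters
                + [NORMAL_GROWTH] * (N_CELLS - n_persisters))

# The ensemble mean is dominated by the majority and barely registers the persisters.
mean_growth = sum(growth_rates) / N_CELLS

# Assumed drug model: it kills growing cells and spares only slow growers.
survivors = [rate for rate in growth_rates if rate < 0.1]
persister_share_after_drug = survivors.count(PERSISTER_GROWTH) / len(survivors)

print(f"mean growth rate before drug: {mean_growth:.4f}")
print(f"survivors: {len(survivors)} cells, {persister_share_after_drug:.0%} persisters")
```

Removing the persisters entirely would shift the mean growth rate by only about 0.1%, yet in this toy model they account for every surviving cell.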
An even more problematic situation arises when the ensemble-averaged measurement poorly reflects the internal states of the majority of the cells, of any subpopulation of cells, or even of any single cell (Figure 1A(iii)). This can occur, for example, when the population contains several dominant, yet phenotypically distinct, subpopulations. This situation is observed for populations of immature Xenopus oocytes (Ferrell and Machleder, 1998). Ensemble averages suggest that a graded input of the maturation-inducing hormone progesterone is converted to a graded signaling output, in this case the phosphorylation level of the downstream readout p42 MAPK. Physical models based on these measurements would predict that the “average” oocyte exhibits an intermediate level of commitment for an intermediate level of progesterone stimulation. However, commitment of individual oocytes is an all-or-nothing response. That is, the population is composed of two non-overlapping subpopulations, committed and non-committed, with no oocytes at intermediate levels of commitment. Thus, when populations are mixtures of distinct subpopulations, the biologically relevant models are those of the mechanisms operating within each subpopulation.
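A short simulation illustrates how all-or-none single-cell responses can average into a graded population response. The sketch assumes hypothetical, uniformly distributed commitment thresholds; it is not a model of the actual progesterone/p42 MAPK circuitry analyzed by Ferrell and Machleder (1998).

```python
import random

random.seed(0)  # fixed seed so the toy example is reproducible

# Assumption: each hypothetical oocyte has its own commitment threshold and
# commits fully once the dose reaches it. Threshold values are illustrative.
N_OOCYTES = 1000
thresholds = [random.uniform(0.0, 10.0) for _ in range(N_OOCYTES)]

def commitment(dose, threshold):
    """Single-cell response: 1.0 (committed) or 0.0 (not committed), nothing in between."""
    return 1.0 if dose >= threshold else 0.0

for dose in [0.0, 2.5, 5.0, 7.5, 10.0]:
    responses = [commitment(dose, t) for t in thresholds]
    ensemble_mean = sum(responses) / N_OOCYTES
    intermediate = sum(1 for r in responses if 0.0 < r < 1.0)
    print(f"dose={dose:4.1f}  ensemble mean={ensemble_mean:.2f}  "
          f"oocytes at intermediate commitment={intermediate}")
```

The printed ensemble mean rises steadily with dose, while the count of oocytes at intermediate commitment stays at zero for every dose.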
Signaling networks, even for narrowly defined biological functions, can be quite complex, and constructing accurate models may require monitoring multiple components (Sachs et al., 2005). Monitoring multiple readouts simultaneously in each cell may reveal additional layers of heterogeneity (Figure 1A(iv)). At the lowest resolution, the state of a population may be approximated by the mean of each readout, as is routinely done with western blots or microarrays, though with all of the caveats previously discussed. At an intermediate resolution, the distribution of each readout may be analyzed independently (so-called “univariate” analysis). Though such approaches simplify analysis (Perlman et al., 2004) and may uncover heterogeneity one readout at a time, important relationships among the readouts within individual cells may be missed. At the highest resolution, simultaneous analysis of all readouts from individual cells (so-called “multivariate” analysis) may reveal both heterogeneity and important coupling among the readouts. Consider the case of two readouts whose population distributions are each bimodal (low or high values). In principle, any cell could have one of four possible combinations of (low/high, low/high) values for these two markers. However, analysis of each marker independently would not distinguish, for example, between correlated and anti-correlated expression of the two readouts within individual cells. By contrast, multivariate analysis of the readouts would identify which of the four potential modes were present within any given population. For example, during adipogenesis of the classic 3T3-L1 cell system, population-level readouts such as Adiponectin expression and lipid droplet accumulation increase from low to high levels, suggesting that the “average” cell progresses towards a (high, high) internal state (for (Adiponectin, Lipid droplet)).
However, single-cell studies revealed that cells moved from an initial (low, low) state, to an intermediate (high, low) state, to a final (low, high) state (Loo et al., 2009a). The expected (high, high) cellular state, observed only rarely, was an illusion of ensemble-averaged measurements. Decomposing heterogeneity was required to develop accurate models of the relevant signaling states, namely those occurring in individual adipocytes.
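The difference between the three levels of resolution can be sketched with two hypothetical populations that share the same means and the same univariate distributions but differ in how the two readouts are coupled within cells. The populations and the low/high encoding are illustrative assumptions, not data from Loo et al. (2009a).

```python
from collections import Counter

# Two hypothetical populations of 100 cells; each cell is a (marker_1, marker_2)
# pair of "low"/"high" calls. Counts are chosen for illustration only.
correlated = [("low", "low")] * 50 + [("high", "high")] * 50
anti_correlated = [("high", "low")] * 50 + [("low", "high")] * 50

LEVEL = {"low": 0.0, "high": 1.0}  # numeric encoding used only for averaging

def ensemble_mean(cells):
    """Lowest resolution: the population mean of each readout."""
    n = len(cells)
    return (sum(LEVEL[m1] for m1, _ in cells) / n,
            sum(LEVEL[m2] for _, m2 in cells) / n)

def marginals(cells):
    """Univariate view: the distribution of each readout, analyzed separately."""
    return Counter(m1 for m1, _ in cells), Counter(m2 for _, m2 in cells)

def joint(cells):
    """Multivariate view: which (marker_1, marker_2) combinations occur within cells."""
    return Counter(cells)

print(ensemble_mean(correlated), ensemble_mean(anti_correlated))  # (0.5, 0.5) for both
print(marginals(correlated) == marginals(anti_correlated))        # True
print(joint(correlated))
print(joint(anti_correlated))
```

Neither the means nor the univariate histograms distinguish the two populations, and no cell in either one sits at the (0.5, 0.5) state the means suggest; only the joint counts reveal the difference.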
Thus, ensemble measurements can provide accurate descriptions of individual cellular behavior when heterogeneity simply reflects functionally-meaningless fluctuations around a single cellular state. However, when a population is composed of multiple cellular states, ignoring heterogeneity can lead to models that may be accurate at the level of population trends, but do not reflect the signaling states of any individual cell. Hence, the use of ensemble measurements must be justified when moving from descriptive to predictive models.
Heterogeneity is observed in essentially all dimensions of single-cell measurements at high resolution. Some well-understood dimensions are routinely used to sort populations into mixtures of cells in distinct and meaningful biological states, such as those defined by cell cycle stage or cell type. But what about the multitude of other dimensions, for which cell-to-cell variation has no obvious or known biological interpretation? In some cases, cellular states can be distinguished by detailed molecular differences. The selection by each cell of a specific molecule from a large repertoire of possibilities generates enormous cell-to-cell diversity, as seen for odorant receptors, cell adhesion molecules, or immune molecules (Coufal et al., 2009; Lomvardas et al., 2006; Morishita and Yagi, 2007; Muotri et al., 2005; Ribich et al., 2006). In other cases, states can be distinguished by large-scale changes, such as signal transduction responses to hormone stimulation (Ferrell and Machleder, 1998), and detailed molecular differences are unimportant, particularly when function is buffered against biochemical fluctuations (Shinar and Feinberg, 2010). A general challenge is to determine which details give rise to biologically meaningful distinctions within and among populations, and which can be ignored (i.e., when an ensemble average is justified).
Determining whether observed heterogeneity has functional significance requires a framework for quantifying heterogeneity and assessing its information content. An intuitive and tractable approach is to decompose heterogeneous populations into mixtures of simpler, more homogeneous subpopulations, based on the expectation that cells with similar measured states should behave similarly. The biological relevance of such a decomposition can be tested by determining whether different subpopulations, or mixtures of subpopulations, have different functional properties. Decomposition-independent approaches are also possible for testing whether heterogeneity contains information; these assume that populations with different distributions have functional differences. For example, mechanisms of drug action can be classified by comparing the distributions of cellular responses that different drugs induce (Perlman et al., 2004). Nevertheless, informative decompositions can provide useful guides for future studies, including further probing of the molecular states of the subpopulations (Loo et al., 2009a; Loo et al., 2009b). In general, the investigation of heterogeneity requires a combined approach for capturing population statistics from single-cell measurements, identifying patterns of heterogeneity, and testing whether these patterns contain functional information.
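One common way to compare distributions without first decomposing them is the two-sample Kolmogorov-Smirnov (KS) statistic. The sketch below is a generic illustration of that statistic, not a reconstruction of the specific scoring used by Perlman et al. (2004); the two readout samples are hypothetical. For real data, a library routine such as scipy.stats.ks_2samp also returns a p-value.

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between
    the empirical cumulative distribution functions (ECDFs) of the two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    # ECDFs only change at observed values, so checking those points suffices.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap

# Hypothetical single-cell readouts: identical means (0.5), different shapes.
unimodal = [0.25, 0.5, 0.5, 0.75]
bimodal = [0.0, 0.0, 1.0, 1.0]

print(sum(unimodal) / len(unimodal), sum(bimodal) / len(bimodal))
print(ks_statistic(unimodal, bimodal))
```

The two samples have the same mean, so an ensemble average cannot tell them apart, while the KS statistic of 0.5 flags that their distributions differ.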
Heterogeneity is essentially a statistical property of cellular populations. The range of cellular behaviors can be estimated from observations of a small number of cells over long times, or of a large number of cells at a small number of time points. (In theory, if each cell behaved ergodically, so that averages from the two observational approaches could be related, these distributions would be equally informative (Brock et al., 2009); however, the ansatz of ergodicity is difficult to test rigorously.) In practice, statistical properties of a cellular population are often estimated from snapshots of large numbers of cells. Single-cell phenotypes can be captured by many different measurement technologies (Figure 1B(i)) (Anselmetti, 2009), including flow cytometry, electrophysiology, microscopy, and single-cell PCR or sequencing. For some technologies, the choice of cellular features is well defined, while for others it is less clear. For example, the total intensity of each labeled biomarker is a standard readout in flow cytometry, while there is virtually no limit to the number of features that can be extracted in microscopy. Extracting all relevant information from single-cell assays may require strategies that balance two somewhat opposing goals: interpretability and comprehensiveness. Interpretability implies an intuitive connection between a familiar biological phenotype and the measured values of a feature. Comprehensiveness implies the ability to capture all important information: large collections of features are measured in the hope that all anticipated (or unanticipated but important) biological variation is captured.
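The ergodic case can be sketched with a toy model of a cell that switches randomly between "off" and "on" states (a two-state Markov chain). The switching probabilities, run lengths, and burn-in period are hypothetical. Because this toy model is ergodic by construction, following one cell for a long time and taking a snapshot of many cells should estimate the same fraction of "on" time; whether real cells behave this way is exactly what is hard to test.

```python
import random

random.seed(1)  # reproducible toy example

# Hypothetical two-state cell: at each time step an "off" cell turns on with
# probability P_ON, and an "on" cell turns off with probability P_OFF.
P_ON, P_OFF = 0.1, 0.3
EXPECTED_ON = P_ON / (P_ON + P_OFF)  # stationary fraction of time spent "on" (0.25)

def step(is_on):
    """Advance one cell by one time step."""
    if is_on:
        return random.random() >= P_OFF  # remains on with probability 1 - P_OFF
    return random.random() < P_ON        # turns on with probability P_ON

def time_average(n_steps):
    """Follow a single cell for a long time; return the fraction of steps spent on."""
    is_on, on_steps = False, 0
    for _ in range(n_steps):
        is_on = step(is_on)
        on_steps += int(is_on)
    return on_steps / n_steps

def snapshot_average(n_cells, burn_in):
    """Observe many independent cells once, after each has run for burn_in steps."""
    on_cells = 0
    for _ in range(n_cells):
        is_on = False
        for _ in range(burn_in):
            is_on = step(is_on)
        on_cells += int(is_on)
    return on_cells / n_cells

one_cell_long_time = time_average(200_000)
many_cells_one_time = snapshot_average(20_000, burn_in=50)
print(EXPECTED_ON, one_cell_long_time, many_cells_one_time)
```

Both estimates should land close to the stationary value P_ON / (P_ON + P_OFF) = 0.25. Adding a second subpopulation of cells that never switch would break the agreement, because a single long trajectory would then sample only its own subpopulation.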
In microscopy, strategies striving for interpretability may use expert-selected features, such as cell shape, marker intensity, or marker localization, while strategies striving for comprehensiveness may use features derived from general-purpose transformations, such as Haralick texture features or Zernike moments, that report large numbers of statistical properties of marker staining patterns (Boland and Murphy, 2001). Regardless of the strategy, the outcome is that each individual cell is represented by its collection of extracted features as a point in a (high-dimensional) “feature” space, with each axis representing a different measurement. Populations of cells have thereby been transformed into distributions of points in feature space.
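As a minimal sketch of this transformation, the code below turns a segmented image into one feature vector per cell using a few expert-style features (size, intensity, and a crude shape measure). The toy image, the label convention, and the choice of features are assumptions for illustration; a comprehensive strategy would append many more general-purpose features (for example, texture statistics) to the same vectors.

```python
# Hypothetical input: a tiny fluorescence image (rows of pixel intensities) and a
# label image of the same shape in which each pixel holds a cell ID (0 = background),
# as produced by a segmentation step that is not shown here.
intensity = [
    [0, 0, 0, 0, 0, 0],
    [0, 5, 6, 0, 0, 0],
    [0, 7, 8, 0, 2, 0],
    [0, 0, 0, 0, 3, 0],
    [0, 0, 0, 0, 1, 0],
]
labels = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 0, 0, 0, 2, 0],
    [0, 0, 0, 0, 2, 0],
]

def extract_features(intensity, labels):
    """Represent each labeled cell as a point in feature space:
    (area in pixels, total intensity, mean intensity, bounding-box elongation)."""
    pixels = {}
    for r, (intensity_row, label_row) in enumerate(zip(intensity, labels)):
        for c, (value, cell_id) in enumerate(zip(intensity_row, label_row)):
            if cell_id:
                pixels.setdefault(cell_id, []).append((r, c, value))
    features = {}
    for cell_id, px in pixels.items():
        area = len(px)
        total = sum(value for _, _, value in px)
        rows = [r for r, _, _ in px]
        cols = [c for _, c, _ in px]
        height = max(rows) - min(rows) + 1
        width = max(cols) - min(cols) + 1
        # Crude shape feature: 1.0 for a square bounding box, larger if elongated.
        elongation = max(height, width) / min(height, width)
        features[cell_id] = (area, total, total / area, elongation)
    return features

print(extract_features(intensity, labels))
# {1: (4, 26, 6.5, 1.0), 2: (3, 6, 2.0, 3.0)}
```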
The problem of identifying patterns of distinct cellular behaviors in feature space (Figure 1B(ii)) can be reduced to well-studied analytical and computational problems of decomposing heterogeneous distributions, for which extensive methodology exists and is used in diverse applications ranging from gene function prediction to speech and face recognition (Duda et al., 2001). In supervised approaches, expert knowledge is used to partition the feature space. Approaches range from simple, user-defined thresholds (such as those routinely used to separate subpopulations in flow cytometry) to more complex machine-learning algorithms that iteratively train computers to identify subpopulations based on examples (such as those recently developed for high-throughput microscopy screening (Boland and Murphy, 2001; Jones et al., 2009; Ramo et al., 2009; Yin et al., 2008)). In unsupervised approaches, computational algorithms partition distributions in a naive, or unbiased, manner. Intuitively, methods such as deterministic K-means clustering or probabilistic Gaussian mixture modeling seek to identify local, high-density regions of phenotype space (Figure 1B(ii)) (Pyne et al., 2009; Slack et al., 2008). There are always questions of whether cellular behaviors can be categorized by a finite number of states, and whether a distribution has been partitioned into too few or too many subpopulations. Analytical criteria (Wang et al., 2007), or even visual inspection, may provide guidelines for such questions.
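Both families of approaches can be sketched on a toy, two-dimensional feature space. The example below applies a user-defined gate (supervised) and K-means clustering via Lloyd's algorithm (unsupervised) to hypothetical simulated cells; the data, the gate position, and the use of within-cluster spread to compare numbers of subpopulations are illustrative assumptions, and the spread comparison is not necessarily the criterion of Wang et al. (2007). A Gaussian mixture model would replace the hard nearest-centroid assignments with probabilistic ones.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm for K-means: alternately assign each point to its nearest
    centroid and move each centroid to the mean of the points assigned to it."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k randomly chosen cells
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments can no longer change
            break
        centroids = new_centroids
    return centroids, clusters

def within_cluster_ss(centroids, clusters):
    """Total squared distance of cells to their cluster centroid (lower = tighter)."""
    return sum(sq_dist(p, c) for c, cluster in zip(centroids, clusters) for p in cluster)

# Hypothetical 2-D feature space with two well-separated subpopulations.
data_rng = random.Random(42)
cells = ([(data_rng.gauss(1.0, 0.3), data_rng.gauss(1.0, 0.3)) for _ in range(50)]
         + [(data_rng.gauss(5.0, 0.3), data_rng.gauss(5.0, 0.3)) for _ in range(50)])

# Supervised: a user-defined gate on feature 0, as in flow cytometry.
THRESHOLD = 3.0  # hypothetical gate position
gated_positive = [p for p in cells if p[0] > THRESHOLD]

# Unsupervised: K-means, with the within-cluster spread compared across k.
centroids, clusters = kmeans(cells, k=2)
spread_by_k = {k: within_cluster_ss(*kmeans(cells, k)) for k in range(1, 5)}

print(len(gated_positive), sorted(len(c) for c in clusters))
print(spread_by_k)
```

The within-cluster spread should drop sharply from k = 1 to k = 2 and much less after that, a simple "elbow" heuristic for choosing the number of subpopulations.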
However, the ultimate usefulness of a decomposition is the degree to which it reveals biological information for a given functional readout. One can test whether the behaviors of specific subpopulations are significantly different from each other and/or from an ensemble average (Figure 1B(iii, left)). As examples, bacterial subpopulations with different growth rates can have different resistances to antibiotics (Balaban et al., 2004), and adherent eukaryotic subpopulations with different microenvironmental properties can have different functional behaviors for viral infection and endocytosis (Snijder et al., 2009). One can also test whether an entire decomposition of heterogeneity is informative (Figure 1B(iii, right)). This case is important when individual subpopulations cannot be physically isolated for functional testing, or when they interact with one another. As examples, different patterns of heterogeneity for general signaling markers in untreated cancer populations were found to predict differences in drug sensitivities (Singh et al., 2010), and different patterns of heterogeneous signaling responses to drug treatment were used to predict mechanisms of drug action (Slack et al., 2008). Further, in tumor samples, different heterogeneous mixtures of isolated tumor-infiltrating lymphocytes (TILs) were found to predict tumor reactivity or non-reactivity (Oved et al., 2009). An intriguing possibility is that heterogeneity can serve not only as a passive readout of population state, but also as a predictor of future responses to perturbations, or even as a property that can be manipulated to produce a desired population output. Of course, not every decomposition should be expected to contain functional information. Presumably, no significant functional differences should be observed when subpopulations are chosen randomly (i.e., ignoring all feature values), or are chosen based on features that are uncorrelated with the functional readout.
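The random-partition control described above can be made concrete as a permutation test: compare the readout difference produced by a feature-based decomposition with the differences produced by many random partitions of the same sizes. The sketch below uses hypothetical simulated cells, a single feature, and a difference in means as the test statistic; these are illustrative choices, not the analyses of the studies cited above.

```python
import random

def mean_difference(readout, in_subpop):
    """Absolute difference in mean functional readout between two subpopulations."""
    inside = [r for r, flag in zip(readout, in_subpop) if flag]
    outside = [r for r, flag in zip(readout, in_subpop) if not flag]
    return abs(sum(inside) / len(inside) - sum(outside) / len(outside))

def permutation_p_value(readout, in_subpop, n_perm=2000, seed=0):
    """Compare a feature-based split against random splits of the same sizes."""
    rng = random.Random(seed)
    observed = mean_difference(readout, in_subpop)
    shuffled = list(in_subpop)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random partition that ignores all feature values
        if mean_difference(readout, shuffled) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)

# Hypothetical cells: one measured feature and functional readouts per cell.
rng = random.Random(7)
feature = [rng.gauss(0.0, 1.0) for _ in range(200)]
readout = [f + rng.gauss(0.0, 1.0) for f in feature]   # readout depends on the feature
unrelated = [rng.gauss(0.0, 1.0) for _ in feature]     # readout ignores the feature
in_subpop = [f > 0.0 for f in feature]                 # decomposition: feature above 0

p_informative = permutation_p_value(readout, in_subpop)
p_unrelated = permutation_p_value(unrelated, in_subpop)
print(p_informative, p_unrelated)
```

The feature-based split of the dependent readout should yield a very small p-value, whereas p_unrelated will usually be large; any single draw can occasionally fall below a nominal cutoff by chance.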
Taken together, the framework described above serves as a starting point for rigorously exploring the biological information contained in heterogeneity.
Heterogeneity has long been observed, and has been speculated to be a fundamental property of cellular systems (Elsasser, 1984; Rubin, 1990). While we focused our discussion on cells, heterogeneity, and the loss of information due to ensemble averaging, has been studied at many scales of biology, from single molecules (English et al., 2006) to communities of whole populations (Kimura and Weiss, 1964). Here, we sidestepped the difficult and fascinating problems of identifying the origins of cellular heterogeneity (Brock et al., 2009; Muotri et al., 2005; Raj and van Oudenaarden, 2008; Spencer et al., 2009) and of understanding how it differs between physiological and pathophysiological conditions, or between in vivo and in vitro settings. Rather, we focused on the general challenge of characterizing and interpreting the information contained within heterogeneity. While well-characterized biological systems may present ready choices for connecting heterogeneity and function, heterogeneity is often observed in experimental settings for which there are no immediate, a priori clues to its meaning or functional relevance. Additionally, interpreting heterogeneity is not solely an analytical challenge of fitting complex, multi-dimensional distributions, as even simple distributions (e.g., that observed for the marker Sca-1) can contain subpopulations enriched for biologically distinct functions. Finally, beyond the study of single subpopulations, properties of heterogeneity (e.g., the proportions of subpopulations within an overall population) may serve as informative readouts of population physiology and as predictors of responses to perturbations. The ability to transform heterogeneity into a tractable, computable, and bona fide property of cellular populations provides a rigorous starting point for determining which variation is random and which is meaningful, and for building better models of the behaviors of individual cells.
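As a minimal sketch of using subpopulation proportions as a readout, the code below assigns each cell in a sample to its nearest reference subpopulation and summarizes the sample as a vector of proportions, which can then be compared across samples. The reference centroids, the samples, the nearest-centroid rule, and the L1 distance are hypothetical choices for illustration; they are not taken from the studies cited above.

```python
from collections import Counter

# Hypothetical reference subpopulations, e.g. centroids learned from pooled data.
REFERENCE_CENTROIDS = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

def nearest(cell, centroids):
    """Index of the reference subpopulation closest to this cell's feature vector."""
    return min(range(len(centroids)),
               key=lambda j: sum((x - c) ** 2 for x, c in zip(cell, centroids[j])))

def subpopulation_profile(cells, centroids):
    """Fraction of a sample's cells assigned to each reference subpopulation."""
    counts = Counter(nearest(cell, centroids) for cell in cells)
    return [counts[j] / len(cells) for j in range(len(centroids))]

def profile_distance(p, q):
    """L1 distance between two subpopulation profiles (0 = identical mixtures)."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Two hypothetical samples containing the same kinds of cells in different proportions.
sample_a = [(0.1, 0.0)] * 6 + [(0.9, 0.1)] * 3 + [(0.0, 1.1)] * 1
sample_b = [(0.1, 0.0)] * 2 + [(0.9, 0.1)] * 3 + [(0.0, 1.1)] * 5

profile_a = subpopulation_profile(sample_a, REFERENCE_CENTROIDS)
profile_b = subpopulation_profile(sample_b, REFERENCE_CENTROIDS)
print(profile_a, profile_b, profile_distance(profile_a, profile_b))
```

Both samples contain the same three cell states, so examining individual cells reveals nothing new; only the mixture proportions ([0.6, 0.3, 0.1] versus [0.2, 0.3, 0.5], an L1 distance of about 0.8) distinguish them.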
We thank members of the Altschuler and Wu labs, Mike Brown, Nuno Lages, Tom Maniatis, David Mangelsdorf, Rama Ranganathan, and Gürol Süel for critical feedback on this manuscript. This research was supported by National Institutes of Health grants R01 GM081549 (L.F.W.) and R01 GM085442 (S.J.A.), the Endowed Scholars program at UT Southwestern Medical Center (L.F.W. and S.J.A.), the Welch Foundation I-1644 (L.F.W.) and I-1619 (S.J.A.), the Rita Allen Foundation (S.J.A.) and the Mary Kay Ash Charitable Foundation (S.J.A.).
1Here, population-averages can refer to experimental measurements derived from assays that pool analytes from large numbers of cells (e.g. immunoblots or microarrays), or mathematical averages taken over distributions of single-cell measurements (e.g. flow cytometry or microscopy).