We present a novel automated methodology that compensates for the effect of cell morphology on flow cytometry data and thereby enables quantitative analysis of high-throughput flow cytometry data. The algorithm normalizes the effect of cell size and cell granularity on the fluorescence intensity, so that fluorescence intensities (protein abundance) can be analyzed even when cells in a population differ in their morphological characteristics. In contrast to traditional gating, which discards the large majority of cells, the regression model retains all cells and therefore provides more accurate statistics, higher consistency across replicates and the ability to handle biological samples that contain far fewer cells (at least 10-fold fewer), allowing for faster and cheaper data acquisition. This is relevant when one is looking for rare cells (e.g., stem cells), or when performing high-throughput screens in which only a few hundred cells per experimental condition are assayed.
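The idea of regressing fluorescence on cell size and granularity can be illustrated with a minimal sketch. This is not the implementation used in this work: it assumes an ordinary least-squares fit of log fluorescence on linear, quadratic and interaction terms in log FSC and log SSC, and it omits the monotonicity constraints discussed below. The function names (`design_matrix`, `normalize_fluorescence`), the choice of the median as the reference morphology and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def design_matrix(log_fsc, log_ssc):
    """Intercept, linear, quadratic and interaction terms in log scatter."""
    return np.column_stack([
        np.ones_like(log_fsc),
        log_fsc, log_ssc,
        log_fsc ** 2, log_ssc ** 2,
        log_fsc * log_ssc,
    ])

def normalize_fluorescence(fsc, ssc, fluor):
    """Return log fluorescence with the fitted morphology trend removed.

    Hypothetical helper: every cell is kept, and its value is re-expressed
    at a common reference morphology (here the median log FSC and log SSC).
    """
    log_fsc, log_ssc, log_fl = np.log(fsc), np.log(ssc), np.log(fluor)
    X = design_matrix(log_fsc, log_ssc)
    coef, *_ = np.linalg.lstsq(X, log_fl, rcond=None)
    residual = log_fl - X @ coef
    ref = design_matrix(np.array([np.median(log_fsc)]),
                        np.array([np.median(log_ssc)]))
    return residual + (ref @ coef)[0]

# Synthetic sample (assumed values): fluorescence scales with cell size.
rng = np.random.default_rng(0)
n_cells = 2000
fsc = rng.lognormal(6.0, 0.3, n_cells)
ssc = fsc ** 0.8 * rng.lognormal(0.0, 0.2, n_cells)
fluor = fsc ** 1.5 * rng.lognormal(0.0, 0.3, n_cells)

corrected = normalize_fluorescence(fsc, ssc, fluor)
raw_corr = np.corrcoef(np.log(fsc), np.log(fluor))[0, 1]
corrected_corr = np.corrcoef(np.log(fsc), corrected)[0, 1]
print(f"correlation with log FSC: raw {raw_corr:.2f}, corrected {corrected_corr:.2f}")
```

In this sketch no cell is discarded: the corrected values have the same length as the input, and their linear correlation with log FSC vanishes by construction of the least-squares residuals.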
The fact that the regression model uses a much larger fraction of cells in a biological sample points to an important feature of the method, namely that it provides fluorescence information across the complete population of cells in the biological sample. A traditional gating approach, on the other hand, reports the behavior of the cells with the specific physical properties (cell size and granularity) that were used to define the gate. In particular, when biological function is correlated with morphological characteristics, for example, cell-cycle-dependent genes (Supplementary Figures S11 and S12), the choice of the gate has a profound influence on the observed fluorescence, potentially (and inadvertently) leading to subjective and biased data analysis. However, if biological function is correlated with morphological characteristics, the regression model would remove this biological effect on fluorescence. Batenchuk et al (2011) present a methodology to reduce extrinsic transcriptional noise using a large gate followed by a cell morphology binning approach, which might be promising in such a scenario. A detailed discussion of this topic is found in Supplementary Information. Related to this point is the fact that flow cytometry experiments are in general difficult to reproduce, since there is no easy and formal way to supply a description of the gate, which is often manually drawn using a flow cytometry software package. The regression model, by avoiding the gate altogether, affords a much greater degree of reproducibility. However, it should again be pointed out that in many applications, the goal of gating is not only to remove morphology-associated variation in fluorescence (which the regression model accomplishes), but also to delineate or characterize subpopulations (e.g., removing dead cells), especially for complex mixtures containing different cell types. In such cases, an initial gating procedure of some sort is still necessary, with the regression model becoming a powerful complement. Especially when subpopulations do not behave uniformly in terms of the relationship between fluorescence and morphological characteristics (as assumed by the regression model), initial delineation of subpopulations is essential.
Flow cytometry experiments are rapidly growing in size using high-throughput technologies. Researchers often desire to follow the protein expression behavior across different conditions and time points for large collections of cell types, strains or perturbed cells. These different experimental and genetic conditions can lead to samples with widely differing morphological characteristics, making it more difficult or even impossible to choose a proper gate. The regression model overcomes this by enabling the direct comparison of samples even if their cells do not share similar characteristics in terms of cell size and granularity. As we have shown, the regression model ensures that different biological samples are directly comparable to one another, even in the case where there is no overlap whatsoever between the cells in the FSC/SSC two-dimensional space in two or more biological samples. This is accomplished by extrapolation of the average fluorescence intensity across cell sizes and granularities that were not present in the actual sample. Monotonicity constraints were introduced into the regression model to guarantee stable behavior of these extrapolated values. Although in theory, this cannot guarantee the validity of the obtained extrapolated intensities, in practice, this approach worked exceptionally well in all experiments and (large) data sets examined. This makes the regression model a suitable systems biology tool to analyze large (high-throughput) flow cytometry data sets containing hundreds or thousands of biological samples in a highly automated manner.
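Why a monotonicity constraint stabilizes extrapolation can be shown with a one-dimensional sketch. The example below swaps in a different, simpler technique than the regression model described here: isotonic regression by the pool-adjacent-violators algorithm, applied to log fluorescence as a function of log FSC only. The function names, the flat extrapolation beyond the observed range and the synthetic data are assumptions for illustration.

```python
import numpy as np

def isotonic_fit(x, y):
    """Non-decreasing least-squares fit of y on x (pool-adjacent-violators)."""
    order = np.argsort(x)
    xs = np.asarray(x, dtype=float)[order]
    ys = np.asarray(y, dtype=float)[order]
    # Each block stores [sum of y, count]; adjacent blocks are merged
    # whenever the earlier block has a larger mean than the later one.
    blocks = []
    for value in ys:
        blocks.append([value, 1])
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = np.concatenate([np.full(c, t / c) for t, c in blocks])
    return xs, fitted

def predict(xs, fitted, x_new):
    """Interpolate the monotone fit; np.interp holds the end values
    constant outside the observed range, so extrapolation stays bounded."""
    return np.interp(x_new, xs, fitted)

# Synthetic sample (assumed values) observed on a narrow range of cell sizes.
rng = np.random.default_rng(1)
log_fsc = rng.normal(6.0, 0.3, 500)
log_fl = 1.5 * log_fsc + rng.normal(0.0, 0.3, 500)

xs, fitted = isotonic_fit(log_fsc, log_fl)
grid = np.linspace(4.0, 8.0, 50)   # extends well beyond the observed sizes
extrapolated = predict(xs, fitted, grid)
```

An unconstrained polynomial fitted to the same data could bend sharply outside the observed range, whereas the monotone fit cannot decrease with increasing cell size and therefore yields bounded values where no cells were measured. As noted above, this stabilizes the extrapolated values but does not by itself guarantee their validity.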
The algorithm also proved valuable for dissecting bar-coded flow cytometry data from a human cell line, demonstrating its widespread utility. Indeed, the flexibility of the regression model, due to the inclusion of non-linear terms, is apparent from the fact that two different types of staining are successfully modeled: cytoplasmic staining (GFP), where the correlation between fluorescence and forward scatter depends on cell volume, and surface staining (bar coding), where this correlation depends on surface area. Further, the universality of the method was established by applying it to data sets from different organisms and laboratories.
Finally, we have used this methodology to analyze the effect of chromatin remodeling proteins on POT1 expression and variability during a carbon shift from glucose to oleate and back to glucose. Yeast cells change in size and morphology during this carbon shift, making the analysis impossible by traditional gating. Pot1p–GFP shows a clear bimodal pattern; that is, during the carbon shift there is a bifurcation event, after which two different subpopulations can be recognized: one with high expression, the other with low expression. This indicates that only a fraction of the cells in the population can achieve activation under oleate induction. This behavior has been previously ascribed to transcriptional network architecture (Ramsey et al, 2006), but the results presented here together with previous observations (Ratushny et al, 2008) indicate that in the case of POT1, the effect is also mediated by chromatin modifiers. In particular, Htz1 appears to play an important role in controlling this bimodal behavior. Specifically, deletions of HTZ1 and some of its major effectors (in either its nuclear transport or its chromatin-binding functions) showed either no bimodality or a delayed bimodality with only a low percentage of high expressers.
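One way to put a number on "a low percentage of high expressers" is to fit a two-component mixture to the morphology-corrected log fluorescence and read off the weight of the upper component. The sketch below is an assumption made for illustration, not the procedure used in this study: it runs a plain expectation-maximization fit of two Gaussians on synthetic data, and the function name `fit_two_gaussians` and all numerical values are hypothetical. Safeguards against degenerate components are omitted for brevity.

```python
import numpy as np

def fit_two_gaussians(values, n_iter=200):
    """EM fit of a two-component 1-D Gaussian mixture.

    Returns (weights, means, standard deviations), each of length 2.
    """
    v = np.asarray(values, dtype=float)
    mu = np.percentile(v, [25, 75])        # crude starting means
    sd = np.full(2, v.std())
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each cell
        dens = (w * np.exp(-0.5 * ((v[:, None] - mu) / sd) ** 2)
                / (sd * np.sqrt(2.0 * np.pi)))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update weights, means and standard deviations
        nk = resp.sum(axis=0)
        w = nk / len(v)
        mu = (resp * v[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (v[:, None] - mu) ** 2).sum(axis=0) / nk)
    return w, mu, sd

# Synthetic population (assumed values): 70% low and 30% high expressers.
rng = np.random.default_rng(2)
log_fl = np.concatenate([rng.normal(0.0, 0.5, 700),
                         rng.normal(3.0, 0.5, 300)])

w, mu, sd = fit_two_gaussians(log_fl)
high_fraction = w[np.argmax(mu)]
print(f"estimated fraction of high expressers: {high_fraction:.2f}")
```

Applied per strain and time point, a summary of this kind would allow a wild-type profile to be compared with a deletion strain showing no or delayed bimodality. A unimodal sample would still return two components, so a separate criterion for deciding whether a second mode is present at all would be needed.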
Several nucleoporins were included in the miniarray because an increasing amount of data suggests that gene activity is linked to physical position within the nucleus, and the NPC may provide a means for genes to be recruited to the periphery and promote either gene activation or repression. The precise role(s) of the NPC in these dynamic, complex (and potentially physically distinct) activities operating at the nuclear periphery remain to be elucidated. POT1 is localized to the nuclear periphery coincident with activation (Supplementary Figure S16). This behavior is shared with other highly expressed genes and, in particular, genes within subtelomeric regions (POT1 is located near a subtelomeric region (~40 kb from the left telomere of Chr IX)) (Casolari et al, 2004; Cabal et al, 2006; Brickner et al, 2007). This movement is important for robust transcriptional activation and has been termed reverse recruitment. This activity requires the seven-member Nup84 complex. Only three members of this complex were included in the miniarray, yet all three (Nup84, Nup120 and Nup133) were found in the non-bimodal cluster 1, leading to the hypothesis that subtle differences in POT1 promoter localization can also be the cause of the divergent fate of cells exposed to oleate. In addition, Nup2 has been associated with peripheral localization of genes during their activation (Brickner et al, 2007). It is also suggested that this recruitment promotes Htz1-mediated epigenetic memory. Independent studies have found that Nup2 and Htz1 are functionally linked in chromatin function through boundary activity (separating active from repressed chromatin) (Ishii et al, 2002; Dilworth et al, 2005). Remarkably, Nup2 was found in the same nine-member cluster (cluster 4) as Htz1. Taken together, these results suggest that the analysis performed here is sufficiently precise to reveal functional relationships among proteins of different families.
We hypothesize that the transitions between chromatin activity states mediated by Nup2, Htz1 and/or the Nup84 complex may have been evolutionarily selected to create a bimodal profile that facilitates adaptation to unpredictable environments, where a commitment to oleate metabolism implies a high risk if the change in carbon source is transient. Biological advantages derived from this phenotype are related to the high investment that cells need to make in order to adapt to oleate as a carbon source. This not only requires expressing new genes, but also commits cells to major structural changes, such as creating new peroxisomes. Maintaining a heterogeneous population, especially in cases of highly committed responses, can be advantageous as it leaves a fraction of the population able to respond quickly upon a switch back to the original conditions (Acar et al, 2008). This study provides the first thorough analysis of this phenomenon in the oleate response in yeast, leading to the identification of several chromatin modifiers and NPC components required to maintain population variability during the transcriptional response.