Large variations in cell size and shape can undermine traditional gating methods for analyzing flow cytometry data. Correcting for these effects enables analysis of high-throughput data sets, including >5000 yeast samples with diverse cell morphologies.
The regression model corrects for the effects of cell morphology on fluorescence as effectively as an extremely small and restrictive gate, but without removing any cells. In contrast to traditional gating, this approach enables quantitative analysis of high-throughput flow cytometry experiments, because the regression model can compare biological samples whose cell morphologies show little or no overlap. Analysis of a high-throughput yeast flow cytometry data set of >5000 biological samples identified key proteins that affect the timing and intensity of the bifurcation event that follows the switch in carbon source from glucose to fatty acids, in which some yeast cells undergo major structural changes while others do not.
Flow cytometry is a widely used technique that enables the measurement of different optical properties of individual cells within large populations of cells in a fast and automated manner. For example, by targeting cell-specific markers with fluorescent probes, flow cytometry is used to identify (and isolate) cell types within complex mixtures of cells. In addition, fluorescence reporters can be used in conjunction with flow cytometry to measure protein, RNA or DNA concentration within single cells of a population.
One of the biggest advantages of this technique is that it provides information on how each cell behaves instead of only the population average. This can be essential when analyzing complex samples that consist of diverse cell types or when measuring cellular responses to stimuli. For example, there is an important difference between a 50% expression increase in all cells of a population after stimulation and a 100% increase in only half of the cells while the other half remains unresponsive, even though both scenarios yield the same population average. Another important advantage of flow cytometry is automation, which enables high-throughput studies with thousands of samples and conditions. However, current methods are confounded by cell populations that are non-uniform in size and granularity. Such variability affects the fluorescence emitted by each cell and adds unwanted variability to estimates of population fluorescence. It also hampers meaningful comparisons between conditions in which not only fluorescence but also cell size and granularity may change.
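The 50% versus 100% example above can be made concrete with a few lines of arithmetic. The sketch below uses purely hypothetical numbers (1000 cells at a baseline of 100 arbitrary fluorescence units) to show that the two scenarios give identical population means but very different single-cell distributions.

```python
import numpy as np

# Hypothetical baseline: 1000 cells, each at 100 arbitrary fluorescence units.
baseline = np.full(1000, 100.0)

# Scenario A: every cell increases expression by 50%.
uniform = baseline * 1.5

# Scenario B: half of the cells double (+100%); the other half stay unresponsive.
split = baseline.copy()
split[:500] *= 2.0

for name, cells in [("uniform", uniform), ("split", split)]:
    print(f"{name}: mean={cells.mean():.1f} std={cells.std():.1f}")
```

Both means come out at 150.0, whereas the standard deviations are 0.0 and 50.0: only a per-cell measurement such as flow cytometry distinguishes the two responses.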
Traditionally, this problem has been addressed by using ‘gates’ that restrict the analysis to cells with similar morphological properties (i.e. cell size and cell granularity). Because cells inside the gate are morphologically similar to one another, their responses show less variability within the population. Moreover, applying the same gate to all samples ensures that observed differences between samples are not due to differences in cell morphology.
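As an illustration of how restrictive such a gate can be, the following sketch applies a small rectangular gate on forward scatter (FSC, a proxy for size) and side scatter (SSC, a proxy for granularity). The simulated log-normal scatter values, the ±10% window around the median and the helper name `rectangular_gate` are all hypothetical choices for illustration, not the gating used in this study.

```python
import numpy as np

def rectangular_gate(fsc, ssc, fsc_range, ssc_range):
    """Hypothetical helper: True for cells whose FSC and SSC both fall inside the gate."""
    return ((fsc >= fsc_range[0]) & (fsc <= fsc_range[1]) &
            (ssc >= ssc_range[0]) & (ssc <= ssc_range[1]))

rng = np.random.default_rng(0)
n_cells = 10_000
# Simulated forward- and side-scatter values (log-normally distributed).
fsc = rng.lognormal(mean=10.0, sigma=0.3, size=n_cells)
ssc = rng.lognormal(mean=9.0, sigma=0.3, size=n_cells)

# A small gate centred on the median morphology: +/-10% in each dimension.
fsc_med, ssc_med = np.median(fsc), np.median(ssc)
mask = rectangular_gate(fsc, ssc,
                        (0.9 * fsc_med, 1.1 * fsc_med),
                        (0.9 * ssc_med, 1.1 * ssc_med))

print(f"cells kept: {mask.sum()} of {n_cells} ({mask.mean():.1%})")
```

With these parameters the expected fraction of cells inside the gate is only about 7%, so the vast majority of acquired events are discarded before any fluorescence is analyzed.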
Gating, however, comes at a cost. First, because only a subgroup of cells is selected, the number of cells analyzed can be substantially reduced. To retain sufficient statistical power, more cells must therefore be acquired, which, if possible at all, increases the time and cost of the experiment. Second, finding a good gate for all samples and conditions can be challenging if not impossible, especially when cellular morphology changes dramatically between conditions. Finally, gating is a highly user-dependent process: the researcher chooses both the size and shape of the gate, and these choices affect the outcome, introducing subjectivity that complicates reproducibility.
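The cost of the first point can be estimated with the standard error of a mean. Assuming independent cells, a per-cell spread $\sigma$ that the gate leaves unchanged, and a gate that retains a fraction $f$ of the $N$ acquired cells, the number of cells needed to reach the precision that $n_0$ analyzed cells would give is:

$$
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}, \qquad n = fN
\quad\Longrightarrow\quad
\frac{\sigma}{\sqrt{fN}} = \frac{\sigma}{\sqrt{n_0}}
\quad\Longrightarrow\quad
N = \frac{n_0}{f}
$$

For a gate that keeps $f = 0.1$ of the events, ten times as many cells must be acquired. In practice a gate also narrows $\sigma$ somewhat, which partly offsets this loss, so the relation should be read as a rough upper bound on the extra acquisition effort.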
In this paper, we present an alternative to gating that addresses the issues stated above. The method is based on a regression model with linear and non-linear terms that estimates, and corrects for, the effect of cell size and granularity on the observed fluorescence of each cell in a sample. The corrected fluorescence is thus ‘free’ of morphological effects.
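A minimal sketch of this kind of correction is shown below. It is not the model used in this study: the log transform, the specific terms (intercept, linear, quadratic and FSC×SSC interaction in log scatter), the choice of the median morphology as reference, and the helper names `design_matrix` and `correct_fluorescence` are assumptions made for illustration. Each cell's corrected value is its residual from the fitted morphology trend plus the trend's prediction at the reference morphology, so no cell is discarded.

```python
import numpy as np

def design_matrix(x_f, x_s):
    """Hypothetical feature set: intercept, linear, quadratic and interaction terms."""
    return np.column_stack([
        np.ones_like(x_f),
        x_f, x_s,
        x_f**2, x_s**2,
        x_f * x_s,
    ])

def correct_fluorescence(fsc, ssc, fl, ref_fsc, ref_ssc):
    """Remove the fitted size/granularity trend from log fluorescence.

    Log scatter values are centred on a reference morphology, so the fitted
    intercept is the predicted log fluorescence of a cell at that reference.
    Each corrected value is the cell's residual plus this intercept.
    """
    x_f = np.log(fsc) - np.log(ref_fsc)
    x_s = np.log(ssc) - np.log(ref_ssc)
    y = np.log(fl)
    X = design_matrix(x_f, x_s)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares
    residuals = y - X @ coef
    return residuals + coef[0]

# Usage on simulated (hypothetical) data in which fluorescence scales with cell size.
rng = np.random.default_rng(1)
n = 5000
fsc = rng.lognormal(10.0, 0.4, n)
ssc = rng.lognormal(9.0, 0.4, n)
fl = 0.01 * fsc * rng.lognormal(0.0, 0.2, n)

corrected = correct_fluorescence(fsc, ssc, fl, np.median(fsc), np.median(ssc))
print(f"log-fluorescence SD raw: {np.log(fl).std():.3f}, corrected: {corrected.std():.3f}")
```

In this simulation the raw spread (about 0.45 in log units) mixes size-driven and cell-intrinsic variability, while the corrected spread falls to roughly the intrinsic 0.2. Centring the log scatter on the reference keeps the least-squares problem well conditioned and makes the intercept directly interpretable.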
Because the model uses all cells in the sample, it ensures that the corrected fluorescence represents the whole sample rather than a gated subset. In addition, the regression model can predict the expected fluorescence of a sample in regions of cell size and granularity where no cells were measured. This makes it possible to compare, with good confidence, samples whose morphologies overlap little. Furthermore, because the regression model is automated, it is fully reproducible across labs and conditions. Finally, it allows rapid analysis of large data sets containing thousands of samples.
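The comparison of samples with little morphological overlap can be sketched as follows: fit a trend per sample, then evaluate both fits at one common reference morphology. Everything here is hypothetical (the simulated samples, the true two-fold expression difference, the midpoint reference, and the helpers `fit_trend`, `predict_at` and `simulate`). A linear-only model in log scatter is used deliberately, because extrapolating quadratic terms into regions without cells tends to be less stable; predictions far outside a sample's observed morphology should still be treated with caution.

```python
import numpy as np

def fit_trend(fsc, ssc, fl):
    """Least-squares fit of log fluorescence on log FSC and log SSC (linear terms only)."""
    X = np.column_stack([np.ones_like(fsc), np.log(fsc), np.log(ssc)])
    coef, *_ = np.linalg.lstsq(X, np.log(fl), rcond=None)
    return coef

def predict_at(coef, fsc, ssc):
    """Predicted log fluorescence of a cell with the given morphology."""
    return coef[0] + coef[1] * np.log(fsc) + coef[2] * np.log(ssc)

rng = np.random.default_rng(2)

def simulate(n, fsc_mu, ssc_mu, expression):
    """Hypothetical sample: fluorescence scales with size; `expression` is the true level."""
    fsc = rng.lognormal(fsc_mu, 0.2, n)
    ssc = rng.lognormal(ssc_mu, 0.2, n)
    fl = expression * fsc / 1e4 * rng.lognormal(0.0, 0.2, n)
    return fsc, ssc, fl

# Two samples with barely overlapping morphologies; sample B truly expresses 2x more.
a = simulate(5000, 9.5, 8.5, expression=1.0)
b = simulate(5000, 10.5, 9.5, expression=2.0)

# Common reference morphology halfway between the two clouds of cells.
ref_fsc, ref_ssc = np.exp(10.0), np.exp(9.0)
log_ratio = predict_at(fit_trend(*b), ref_fsc, ref_ssc) - predict_at(fit_trend(*a), ref_fsc, ref_ssc)
raw_ratio = np.median(b[2]) / np.median(a[2])

print(f"raw median ratio: {raw_ratio:.2f}, model-based ratio: {np.exp(log_ratio):.2f}")
```

Because the larger cells of sample B are also brighter, the raw median ratio is inflated (about 5.4 here), whereas the ratio predicted at the shared reference morphology recovers a value close to the simulated two-fold difference.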
To probe the validity of the model, we performed several experiments. We show that the regression model removes morphology-associated variability as effectively as an extremely small and restrictive gate, but without the caveat of discarding cells. We test the method in different organisms (yeast and human) and applications (protein level detection, separation of mixed subpopulations). We then apply the method to gain new biological insights into the mechanisms underlying transcriptional noise.
Gene transcription is subject to the randomness intrinsic to any molecular event. Although such randomness may seem undesirable for the cell because it prevents consistent behavior, there are situations in which some degree of randomness is beneficial (e.g. bet hedging). For this reason, each gene is tuned to exhibit a level of randomness, or noise, that suits its function. For core and essential genes, cells have developed mechanisms to lower the level of noise, whereas genes involved in the response to stress show greater variability.
This tuning of gene transcription can occur at many levels, from the architecture of the transcriptional network to epigenetic regulation. In our study, we analyze the latter using the response of yeast to fatty acid in the environment. Yeast can use fatty acid as an energy source, but doing so requires major structural changes and commitments. We have observed that, at the population level, there is a bifurcation event whereby some cells undergo these changes and others do not. We analyzed this bifurcation event in mutants of all the non-essential epigenetic regulators in yeast and identified key proteins that affect the timing and intensity of this bifurcation. Even though fatty acid triggers major morphological changes in the cell, the regression model still makes it possible to analyze the more than 5000 flow cytometry samples in this data set in an automated manner, which would be impossible with a traditional gating approach.
Cells exposed to stimuli exhibit a wide range of responses, ensuring phenotypic variability across the population. Such single-cell behavior is often examined by flow cytometry; however, the gating procedures typically employed to select a small subpopulation of cells with similar morphological characteristics make it difficult, or even impossible, to quantitatively compare cells across a large variety of experimental conditions, because these conditions can lead to profound morphological variations. To overcome these limitations, we developed a regression approach that corrects for variability in fluorescence intensity due to differences in cell size and granularity without discarding any cells, as gating ipso facto does. This approach enables quantitative studies of cellular heterogeneity and transcriptional noise in high-throughput experiments involving thousands of samples. We used this approach to analyze a library of yeast knockout strains and revealed genes required for the population to establish a bimodal response to oleic acid induction. We identify a group of epigenetic regulators and nucleoporins that, by maintaining an ‘unresponsive population’, may provide the population with the advantage of diversified bet hedging.