The ability to count mRNA molecules in single cells by Fluorescence In Situ Hybridization (FISH) 
allows for highly quantitative studies of cell-to-cell variation in gene expression. However, the requirement that cells be fixed before RNA FISH analysis precludes the use of RNA FISH to directly study transcriptional dynamics in single cells. Nevertheless, we have shown here how and when correlations between levels of different mRNAs can be exploited to reconstruct transcriptional dynamics, even if cells are asynchronous. All that is necessary is for FISH data to be obtained simultaneously for pairs of genes (or in some cases triplets of genes) a technique that is already well established 
. As a practical demonstration, we applied our approach to a large, pairwise FISH data set obtained from a recent study of the yeast Saccharomyces cerevisiae 
. Our results help confirm the existence of cell-autonomous metabolic cycles in unsynchronized yeast populations 
To reconstruct the dynamics of gene expression from FISH data, our approach employs Maximum Likelihood Estimation (MLE) 
to obtain the set of transcriptional parameters most likely to account for the observed data. In the regime of continuous mRNA production, apart from rescaling and inversion of time for cyclic dynamics, there is no intrinsic limit on the accuracy with which transcriptional dynamics can be reconstructed given enough data. In practice, we have shown that MLE applied to simple parameterizations for transcription (such as the leading harmonics for cyclic dynamics) allows faithful reconstruction from a moderate number of FISH observations, including noise. On the other hand, the regime where mRNA is produced in shortlived bursts 
presents additional challenges. In this bursty regime, FISH can at most report coincidences of bursts of different mRNAs, and there are consequently fundamental limits to reconstructing the underlying dynamics. For this bursty regime, successful reconstruction will generally rely on prior knowledge regarding the class of dynamics, e.g.
cycle vs. switch, and, even so, will in some cases require additional inputs, such as triplet FISH data. (In Table S1
, we explicitly quantify the amount of such additional information required for complete dynamical reconstruction.)
In applying our approach, how should one choose among models to reconstruct gene dynamics? For example, when is it better to use multiple harmonics instead of a single harmonic to model a cycle? The answer depends on the type of data. We discuss first the regime of continuous mRNA production. For this case, a standard and reliable way to choose among models when fitting data is “leave-one-out” validation, which both rewards a good fit while punishing overfitting. In the leave-one-out approach, a model is selected and its parameters are optimized on the entire data set, but with one data point left out. The resulting parameterized model is then used to fit the neglected data point. The average fitting error, taken over all possible left-out data points, is a robust measure of the quality of the model. Among competing models, the one that minimizes this error can be selected as the better choice. In the regime of continuous mRNA production, leave-one-out validation can be applied within the MLE framework by using the log(likelihood) of the left-out data point in place of the fitting error. Among competing models, the one with the largest average log(likelihood) is the best choice.
In contrast, finding the “best” model for data in the bursty mRNA regime is generally an under-constrained problem. We showed explicitly that for many cases it is impossible in principle to distinguish among different types of models, or even to find a unique best set of parameters for a given model. Intuitively, reduction of bursty FISH data to pairwise covariances means that even as the number of FISH data points approaches infinity, the number of model constraints stays finite. So, for bursty FISH data inference alone cannot guide one in choosing the model, and one must also use common sense. Clearly, prior knowledge of the system under study should be used in selecting a model. In addition, a simple rule is that one should use models that are sufficiently parsimonious in parameters not to have degenerate solutions. For example, in analyzing FISH data on metabolic cycles, we chose the one-harmonic model because there were not enough low-noise covariances to constrain a two-harmonic model. More generally, it is advisable to choose a model with significantly more well-constrained data than parameters. If the model is barely constrained, the peak of likelihood will generally be close to flat in some directions in parameter space and the reconstruction will be poor. illustrates this point:
is the minimal number of genes to avoid degeneracy, but it requires 3 times more data per gene (or twice as many total data points) to reconstruct as well as for
. In practice, one test for the quality of the reconstruction in the bursty regime is to compare the observed covariances to the reconstructed covariances, as shown in for the case of the yeast metabolic cycle genes.
Reconstruction of gene-expression dynamics from FISH data presents multiple practical challenges. One important issue is noise in the measurement of mRNA levels. For the regime of continuous mRNA production, we have shown that sufficient data can compensate for both the noise inherent in gene expression and the noise arising from uncertainty in measurement. For the regime of bursty mRNA production, “binarizing” the data into the presence or absence of a significant number of mRNA molecules substantially reduces the impact of measurement noise. A practical question here is the best threshold to use for binarizing the data. In many cases, the dynamics will be best reconstructed by setting the threshold well above 1 mRNA transcript; for example, in treating the data for metabolic cycles we chose the median expression level for each gene as its threshold. A higher threshold is less sensitive to measurement noise (fewer false positives), and to occasional transcripts produced by promoter leakage (better identification of true bursts), and a higher threshold also allows finer time resolution, as a given burst will remain above threshold for a shorter time (e.g.
preventing blurring of boundaries between switching states). However, a higher threshold reduces the number of coinciding bursts in the data, requiring more overall FISH observations. An important related issue is the possibility of correlated noise in the transcription of different genes. An example of such noise is the observed global correlation among transcription rates in yeast 
. Fortunately, global noise can be readily incorporated within the MLE framework by introducing a single additional variable in the model for gene expression, as in Eq. (10). Indeed, our treatment of global noise among genes involved in the yeast metabolic cycle yields an independent, and reasonable, estimate for this noise at 55% of mean expression. (More complex noise correlations among different genes would require case-by-case analysis.)
False-positive rates and false-negative rates are also both important considerations in analyzing FISH data. These are essentially technical issues beyond the scope of our study, but a few remarks are in order. In Ref. 
, both false positives and false negatives were reduced by the use of multiple fluorescent probes (
5) for each mRNA. Only high-contrast spots above a fluorescence threshold indicative of multiple bound probes were counted. This threshold was set empirically from the fluorescence distribution of spots outside of cell boundaries, corresponding to single probes. Nevertheless, with any such thresholding method, there will be cases where the “presence” or “absence” of an mRNA is ambiguous, and in the bursty regime such ambiguities can strongly impact the binarization of the data. Fortunately, because MLE is an intrinsically probabilistic approach, ambiguities can be dealt with by treating the two possibilities, present or absent, probabilistically. As in Ref. 
, by looking at spots outside of cell boundaries, one can obtain the distribution of intensities for spots that are actually noise (typically single probes that have not been washed away), and by looking inside cell boundaries a similar distribution can be obtained for spots that correspond to real mRNAs (multiple probes). Spots inside cells that fall into the region of overlap of these two distributions can then be assigned the corresponding probabilities of being present (real) or absent (noise). MLE can then incorporate both possible interpretations of the data, with their appropriate weights, in the data set.
A related issue, highlighted by Zenklusen et al. 
, is the existence of nascent mRNA transcripts at the locus of the gene. In the regime of continuous mRNA production, an estimate of nascent transcript number, possibly non-integer, could simply be added to mRNA counts. In the bursty regime, the existence of such transcripts might well be taken as prima facie
evidence for active transcription, and therefore treated as equivalent to the presence of an above-threshold burst.
Another practical issue in reconstructing gene-expression dynamics from FISH measurements is that data may come in mixed forms, e.g. pairwise FISH data+triplet FISH data+additional constraints or prior information. Again, MLE is naturally suited to incorporating mixed data types since all sources of information can be combined to produce the overall likelihood of the data given a set of model parameters, including prior information on the model parameters themselves (cf. Eq. (2)).
While these and other practical issues are important to consider, our successful reconstruction of yeast metabolic cycles using the FISH data of Silverman et al. 
demonstrates that our approach can provide a useful tool for analyzing gene-expression dynamics. In fact, our analysis of this data raises several new questions. First, since our reconstruction was statistically significant for the Oxidative and Reductive Building clusters but not for the Reductive Charging cluster, it is possible that cycles of the latter may be weaker in unsynchronized cultures than in synchronized chemostats. Second, our reconstruction indicates a spread among the oscillatory phases of genes within each cluster – is this spread a consequence of the limited data, or are the oscillation patterns of genes within clusters distinct? We expect that additional FISH data coupled with MLE analysis will soon provide answers to these questions.
The many advantages of FISH – absolute quantification, high time resolution, use of wild-type cells, ability to simultaneously measure multiple mRNA types, and broad application across species from bacteria 
to yeast 
to metazoans 
, suggest that FISH will find many uses in future studies of gene expression, including applications beyond those currently demonstrated. For example, FISH can be applied to cells in structured environments such as tissues or biofilms, or even cells in mixed-species consortia. In all of these cases, population level studies of gene expression cannot reveal the important cell-to-cell variations. Of course, FISH is not the only technique that yields quantitative snapshots at the single-cell level. Immunofluorescence and single-cell sequencing also meet the requirements of simultaneous measurements of two or more intracellular factors. We hope that the analysis presented here can facilitate the application of FISH and other single-cell snapshot assays to cases where both cell-to-cell variation and the dynamics of gene expression are of central interest.