To briefly go over the relationship between lensfree fluorescent imaging on a chip and compressive sampling theory, let us denote the fluorescent particle/cell distribution within the sample volume with $\bar{c} = [c_1, c_2, \ldots, c_N]$, where N denotes the number of voxels. To better relate our model to a real imaging experiment, let us also assume that the physical grid size in $\bar{c}$ is d. For visualization purposes, one can think of a simple microfluidic channel such that $\bar{c}$ would represent the points on the active surface of the channel, where the captured cells reside within an imaging area of N × d². For multi-layered micro-channels, however, $\bar{c}$ would represent a 3D discrete volume.
For the applications that are of interest to this work, such as wide-field fluorescent cytometry, rare cell analysis and high-throughput micro-array imaging, one can in general assume that the object distribution $\bar{c}$ is sparse to start with, such that only S coefficients of $\bar{c}$ are non-zero, where S << N. This assumption is further justified with our unit magnification lensless geometry since most cells of interest would not be over-sampled due to limited spatial resolution, restricting the value of S for a practical $\bar{c}$. Therefore, the sparsity of $\bar{c}$ is the first connection to compressive sampling, as it is an important requirement of its underlying theory [1–3].
In a lensfree fluorescent imaging platform as shown in , the object distribution $\bar{c}$ uniquely determines the intensity distribution that is impinging on the detector-array. For each non-zero element of $\bar{c}$, a wave is transmitted, and after passing through different layers on the chip it incoherently adds up with the waves created by the other fluorescent points within the sample volume. Therefore, one can write the intensity distribution right above the detector plane (before being measured/sampled) as:
$$f(x,y) = \sum_{i=1}^{N} c_i \, \psi_i(x,y) \qquad (1)$$
where $\psi_i(x,y)$ represents the 2D wave intensity right before the detector plane that originated from the physical location of $c_i$. The analytical form of $\psi_i(x,y)$ can be derived for any particular lensfree geometry such as the one presented in . However, from a practical point of view, it can easily be measured for each object plane by using, e.g., small fluorescent particles, which is the approach taken in this work.
Without the use of a faceplate in , it is straightforward to see that the functional form of $\psi_i(x,y)$ for a given object plane is space invariant. This is equivalent to saying that $\psi_i(x,y) = p(x - x_i, y - y_i)$, where $p(x,y)$ is the incoherent point-spread function (psf) of the system for a given object layer, and $(x_i, y_i)$ denotes the physical location of $c_i$. Note that in this definition, $p(x,y)$ has no relationship to the pixel size at the detector since Eq. (1) describes the intensity right before the sampling plane. The same space invariance property also holds with a dense fiber-optic faceplate as shown in since there is a significant gap between the sample and faceplate planes, and a similar gap between the bottom surface of the faceplate and the detector plane. Therefore for our lensfree fluorescent imaging geometry of , with or without the faceplate operation, one can in general write:
$$f(x,y) = \sum_{i=1}^{N} c_i \, p(x - x_i, y - y_i) \qquad (2)$$
For multiple layers of fluorescent objects, a similar equation could also be written where the incoherent point-spread functions of the different layers are also included in the summation.
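As a rough numerical illustration of Eq. (2), the sketch below places a few point sources on a discrete grid and convolves them with a psf. The Gaussian psf, the grid size, the circular boundary handling, and the function names `gaussian_psf` and `forward_intensity` are all assumptions made for illustration; in this work the psf is measured experimentally rather than modeled.

```python
import numpy as np

def gaussian_psf(shape, sigma):
    """Hypothetical stand-in for a measured incoherent psf, centred in the array."""
    ny, nx = shape
    y = np.arange(ny) - ny // 2
    x = np.arange(nx) - nx // 2
    xx, yy = np.meshgrid(x, y)
    p = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return p / p.sum()  # unit total energy

def forward_intensity(c, psf):
    """Eq. (2) on a grid: f = sum_i c_i * p(x - x_i, y - y_i).

    Implemented as an FFT-based circular convolution, so sources near the
    edge wrap around (a simplification of the real geometry).
    """
    kernel = np.fft.ifftshift(psf)  # move the psf centre to index (0, 0)
    return np.fft.ifft2(np.fft.fft2(c) * np.fft.fft2(kernel)).real

# Sparse object: S = 5 non-zero voxels out of N = 64 * 64
rng = np.random.default_rng(0)
shape = (64, 64)
c = np.zeros(shape)
idx = rng.choice(c.size, size=5, replace=False)
c.flat[idx] = rng.uniform(0.5, 1.0, size=5)

psf = gaussian_psf(shape, sigma=4.0)
f = forward_intensity(c, psf)  # intensity right before the detector plane
```

Because the psf is normalized to unit sum, the total intensity in `f` equals the total source strength in `c`, which is a quick sanity check on the model.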
Equation (2) relates the “already” sparse fluorescent object distribution ($\bar{c}$) to an optical intensity distribution that is yet to be sampled by the detector array. The representation basis provided by $\psi_i(x,y)$ is surely not an orthogonal one since it is based on lensfree diffraction. This does not limit the applicability of compressive decoding to our work since $\bar{c}$ is assumed to be already sparse, independent of the representation basis. On the other hand, the fact that $\psi_i(x,y)$ does not form an orthogonal basis limits the spatial resolution that can be compressively decoded, since for closely spaced $c_i$ values, the corresponding $\psi_i(x,y)$ would be quite similar to each other for a given detection signal to noise ratio (SNR). This is related to the restricted isometry property [1, 2] of our system as will be discussed later on; however, its physical implication is nothing new since it is already known that we trade off spatial resolution to achieve wide-field lensfree fluorescent imaging with unit magnification.
Next, sampling of the intensity distribution $f(x,y)$ at the detector-array can be formulated as:
$$I_m = \iint f(x,y) \, \phi_m(x,y) \, dx \, dy \qquad (3)$$
where $\phi_m(x,y) = \phi(x - x_m, y - y_m)$ represents the sampling/measurement basis; m = 1:M denotes the m-th pixel of the detector-array with center coordinates of $(x_m, y_m)$; and $\phi(x,y)$ represents the pixel function, which can be approximated to be a detection constant, K, for |x|, |y| ≤ W/2 (assuming a square pixel size of W) and 0 elsewhere. In this notation, the fill-factor of the detector-array together with the quantum efficiency, etc., are all lumped into K. Note that in this work, we have used W = 9 µm and W = 18 µm (through pixel binning).
With these definitions, the lensfree fluorescent imaging problem of this manuscript can be summarized as follows: based on M independent measurements of $I_m$, we would like to estimate the sparse fluorescent source distribution, $\bar{c}$, at the sample.
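The pixel sampling step of Eq. (3) can be sketched on a discrete grid as a block sum: with a box-shaped pixel function, each pixel value is K times the integral of the intensity over that pixel's area. The function name `sample_pixels`, the default values of `K` and `d`, and the assumption that the pixel width is an integer multiple of the grid size (W = `bin_factor` × d) are illustrative choices, not part of the original formulation.

```python
import numpy as np

def sample_pixels(f, bin_factor, K=1.0, d=1.0):
    """Discrete version of Eq. (3) for a box pixel of width W = bin_factor * d.

    f          : 2D intensity on a fine grid of spacing d
    bin_factor : number of fine grid points per pixel side (assumed integer)
    K          : lumped detection constant (fill factor, quantum efficiency, ...)
    """
    ny, nx = f.shape
    if ny % bin_factor or nx % bin_factor:
        raise ValueError("grid size must be a multiple of bin_factor")
    blocks = f.reshape(ny // bin_factor, bin_factor, nx // bin_factor, bin_factor)
    # Sum over each pixel footprint; d**2 is the area element of the integral
    return K * d**2 * blocks.sum(axis=(1, 3))

f = np.arange(36, dtype=float).reshape(6, 6)  # toy intensity on a 6 x 6 grid
I = sample_pixels(f, bin_factor=3)            # 2 x 2 array of pixel values I_m
```

With `bin_factor=3` the N = 36 fine-grid values are reduced to M = 4 measurements, mirroring the N > M situation described in the text.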
To give more insight, Eq. (3) models a hypothetical near-field sampling experiment, where each pixel of the CCD measures part of $f(x,y)$. For an arbitrary intensity distribution $f(x,y)$ impinging on the detector array, a few pixel values ($I_m$) can surely not represent the entire function. However, if the sampled intensity profile at the detector plane is created by a sparse distribution of incoherent point sources located in the far-field, then far fewer pixels can potentially be used to recover the source distribution based on compressive decoding. For this decoding to work efficiently, each pixel should ideally detect “some” contribution from all the $c_i$ values, which implies the need for a relatively wide point spread function. However, since spreading of the fluorescence also decreases the signal strength at the detector plane, the optimum extent of the point spread function is practically determined by the detection SNR. On one extreme, if the same sparse source distribution ($\bar{c}$) were hypothetically placed in direct contact with the CCD pixels, this would not permit any compressive decoding since each incoherent point source could then only contribute to a single pixel value. For instance, two sub-pixel point sources that are located on the same pixel would only contribute to that particular pixel, which would make their separation physically impossible regardless of the measurement SNR. However, the same two sub-pixel point sources could be separated from each other through compressive decoding if they were placed some distance above the detector plane, such that more pixels could detect weighted contributions of their emission.
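The contact versus stand-off argument above can be checked with a small 1D toy model. The grid size, the source positions, the Gaussian psf used for the stand-off case, and the function name `measurement_column` are assumptions chosen only to illustrate the point.

```python
import numpy as np

def measurement_column(x_src, psf_sigma, n_fine=90, bin_factor=9):
    """Pixel values produced by a single unit point source at fine-grid index x_src.

    psf_sigma == 0 models a source in direct contact with the detector;
    psf_sigma > 0 models a source some distance above it (Gaussian psf assumed).
    """
    x = np.arange(n_fine)
    if psf_sigma == 0:
        f = (x == x_src).astype(float)
    else:
        f = np.exp(-(x - x_src) ** 2 / (2.0 * psf_sigma**2))
    # Box pixel: sum the intensity over each group of bin_factor grid points
    return f.reshape(-1, bin_factor).sum(axis=1)

# Two sources that fall on the same pixel (fine indices 36..44 belong to pixel 4)
contact_a = measurement_column(37, psf_sigma=0)
contact_b = measurement_column(42, psf_sigma=0)
spread_a = measurement_column(37, psf_sigma=10.0)
spread_b = measurement_column(42, psf_sigma=10.0)

same_in_contact = np.array_equal(contact_a, contact_b)     # indistinguishable
differ_when_spread = not np.allclose(spread_a, spread_b)   # distinguishable in principle
```

In contact, the two sources produce identical pixel vectors, so no decoder could tell them apart; with a wider psf the neighboring pixels receive different weights, which is the information compressive decoding relies on.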
Since we are considering non-adaptive imaging here (i.e., no a priori information about the possible x-y locations of the fluorescent particles/cells), we have not used a sub-set of the pixel values ($I_m$) to reconstruct $\bar{c}$. Therefore, for a single layer of object, using a unit magnification as in , we have N × d² = M × W². In this work, to claim a spatial resolution of ~10 µm at the object plane, we used d = 2-3 µm, which implies N ≥ 9M for W = 9 µm. For some experiments, we have also used a pixel size of W = 18 µm with d = 2 µm, implying N = 81M. Furthermore, for multi-layer experiments (to be reported in the next section) where 3 different fluorescent channels were vertically stacked and simultaneously imaged in a single snap-shot, we had N = 27M. All of these cases indicate compressive imaging, since the number of measurements (M) is significantly smaller than the number of reconstructed points (N).
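The N/M ratios quoted above follow directly from N × d² = M × W² per layer. The short sketch below reproduces the arithmetic; the function name `compression_ratio` is hypothetical, and the reading of N = 27M as three layers with d = 3 µm and W = 9 µm is an assumption that happens to match the stated number.

```python
def compression_ratio(W_um, d_um, layers=1):
    """N / M for unit magnification: each layer contributes (W / d)**2 voxels per pixel."""
    return layers * (W_um / d_um) ** 2

single_coarse = compression_ratio(9, 3)            # 9.0   (d = 3 um, W = 9 um)
single_fine = compression_ratio(9, 2)              # 20.25 (d = 2 um, W = 9 um)
binned = compression_ratio(18, 2)                  # 81.0  (d = 2 um, W = 18 um)
three_layers = compression_ratio(9, 3, layers=3)   # 27.0  (assumed d and W)
```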
As already known in the compressive sampling literature, the effectiveness of the decoding process to estimate $\bar{c}$ in our technique should also depend on the maximum spatial correlation between $\phi_m(x,y)$ and $\psi_i(x,y)$ for all possible m = 1:M and i = 1:N pairs. Accordingly, this maximum spatial correlation coefficient defines the measure of incoherence between the sampling and representation bases, which can then be related to the probability of accurately reconstructing $\bar{c}$ from M measurements [1–3]. For a given object plane, because of the shift invariant nature of both $\phi_m(x,y)$ and $\psi_i(x,y)$, this coherence calculation is equivalent to calculating the correlation between the pixel function $\phi(x,y)$ and the incoherent point-spread function p(x,y). The smaller the correlation between these two spatial functions, the more accurate and efficient the compressive decoding process becomes. Based on this, a smaller pixel size would further help in our lensfree on-chip scheme by reducing this maximum correlation coefficient, i.e., increasing incoherence between $\phi_m(x,y)$ and $\psi_i(x,y)$.
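The dependence of this correlation on pixel size can be explored numerically. The sketch below computes a normalized correlation between a box pixel function and a psf, both centred on the same point; the Gaussian psf, its 20 µm width, the 1 µm grid, and the function name `coherence` are assumptions for illustration, and the trend shown applies to the regime where the pixel is narrower than the psf.

```python
import numpy as np

def coherence(pixel_width, psf_sigma, n=129):
    """Normalized correlation between a box pixel and a Gaussian psf on a 1 um grid.

    Both functions are centred on the same point; the result lies in (0, 1].
    """
    ax = np.arange(n) - n // 2
    xx, yy = np.meshgrid(ax, ax)
    p = np.exp(-(xx**2 + yy**2) / (2.0 * psf_sigma**2))
    half = pixel_width / 2.0
    inside = (xx >= -half) & (xx < half) & (yy >= -half) & (yy < half)
    phi = inside.astype(float)
    return float((phi * p).sum() / (np.linalg.norm(phi) * np.linalg.norm(p)))

mu_9 = coherence(pixel_width=9, psf_sigma=20.0)    # W = 9 um
mu_18 = coherence(pixel_width=18, psf_sigma=20.0)  # W = 18 um (binned)
```

Under these assumptions `mu_9` comes out smaller than `mu_18`, consistent with the statement that a smaller pixel increases the incoherence between the two bases.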
Thus, we can conclude that the primary function of compressive sampling in this work is to digitally undo the effect of diffraction induced spreading formulated in Eqs. (1)–(2) through decoding of the lensfree image pixels indicated in Eq. (3). Such a decoding process, however, can also be done physically rather than digitally, through the use of a lens (as in conventional fluorescent microscopy, at the cost of reduced FOV) or through the use of a faceplate as we demonstrate in this work. The use of the faceplate in partially decodes the diffraction induced spreading, which also relatively increases the correlation between $\phi(x,y)$ and p(x,y), since p(x,y) gets narrower and stronger with the faceplate. Despite this relatively increased coherence between the sampling and representation bases, the improvement in the detection SNR with the faceplate enables better measurement of p(x,y) as well as the $I_m$ values, which then improves the accuracy of the compressive decoding process in terms of achievable spatial resolution. This will be further quantified in the experimental results presented in the next section.
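To make the idea of digital decoding concrete, the following 1D sketch recovers a sparse, non-negative source vector from binned pixel values using iterative shrinkage-thresholding (ISTA) on an l1-regularized least-squares objective. ISTA is used here only as a simple, well-known stand-in and is not claimed to be the solver used in this work; the Gaussian psf, the grid sizes, the regularization weight, and the names `build_matrix` and `ista_nonneg` are likewise assumptions.

```python
import numpy as np

def build_matrix(n_fine, bin_factor, psf_sigma):
    """M x N matrix whose column i holds the pixel values of a unit source at grid point i."""
    x = np.arange(n_fine)
    # shifted[x, i] = p(x - i): Gaussian psf centred on source position i
    shifted = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * psf_sigma**2))
    shifted /= psf_sigma * np.sqrt(2.0 * np.pi)
    # Box pixel: sum groups of bin_factor rows
    return shifted.reshape(n_fine // bin_factor, bin_factor, n_fine).sum(axis=1)

def ista_nonneg(A, y, lam, n_iter):
    """Minimize 0.5 * ||A c - y||^2 + lam * sum(c) subject to c >= 0."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ c - y)
        # Gradient step followed by the non-negative soft-threshold
        c = np.maximum(c - step * (grad + lam), 0.0)
    return c

A = build_matrix(n_fine=90, bin_factor=3, psf_sigma=4.0)  # M = 30, N = 90
c_true = np.zeros(90)
c_true[[30, 60]] = 1.0       # S = 2 point sources
y = A @ c_true               # noise-free pixel measurements
c_hat = ista_nonneg(A, y, lam=1e-3, n_iter=5000)
```

The estimate `c_hat` lives on the fine grid (N = 90) even though only M = 30 pixel values were measured; how closely it localizes the two sources depends on the psf width, the noise level, and the regularization weight.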
Finally, we would like to mention that the above analysis could also be done using a different set of measurement and representation bases without changing the end conclusions. In the above analysis, we did not include the diffraction process as part of the measurement, and therefore the measurement basis only involved the pixel sampling at the detector-array. As an alternative notation, we could have also used $\psi_i(x,y) = \delta(x - x_i, y - y_i)$ for the representation basis, which implies that Ψ is an identity matrix. This is not a surprising choice since the object, $\bar{c}$, is already sparse and therefore the sparsifying matrix can be seen as an identity matrix. Based on this definition of the representation basis, the measurement basis $\Phi$ will now need to include both the diffraction and the pixel sampling processes. Following a similar derivation as in Eq. (3), the measurement basis now becomes: $\Phi_{m,i} = \iint p(x - x_i, y - y_i) \, \phi(x - x_m, y - y_m) \, dx \, dy$. As expected, the correlation behavior between the rows of $\Phi$ and the columns of $\Psi$ for all possible m and i pairs remains the same as before, yielding the same set of conclusions that we arrived at using the previously discussed choice of bases.
While it is just a matter of notation, with this new pair of bases, it is also easier to qualitatively relate the spatial resolution to the restricted isometry property (RIP) of the system. RIP is a measure of the robustness of sparse signal reconstruction for N > M and S << N [1–3]. For this new choice of bases, RIP holds if all the possible subsets of S columns taken from $\Phi\Psi = \Phi$ are nearly orthogonal to each other. Assuming that the pixel size is much narrower than the incoherent psf of the object layer of interest, we can then approximate:
$$\Phi_{m,i} \approx K \, W^2 \, p(x_m - x_i, y_m - y_i) \qquad (4)$$
Therefore for RIP to hold in this lensfree system, for any arbitrary S choices of i = 1:N, the sub-set of functions $p(x_m - x_i, y_m - y_i)$ should be nearly orthogonal in $(x_m, y_m)$. If one purely relies on diffraction, this condition can be harder to satisfy for densely spaced $(x_i, y_i)$, which practically limits the achievable spatial resolution for a given detection SNR. Once again, physically this is not surprising since it is already known that we trade off resolution to achieve wide-field lensfree fluorescent imaging on a chip. Structured surfaces could potentially help achieve a better resolution by randomly breaking the space invariance of the incoherent psf, which is not covered within the context of this manuscript.
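The near-orthogonality requirement can be illustrated by comparing two columns of the measurement matrix for sources at different separations. Under the narrow-pixel approximation, a column is proportional to the psf sampled at the pixel centres. The 1D setting, the Gaussian psf with a 20 µm width, the 9 µm pixel pitch, and the function name `column_correlation` are assumptions made for this sketch.

```python
import numpy as np

def column_correlation(spacing_um, psf_sigma_um, W_um=9.0, n_pixels=41):
    """Normalized inner product of two measurement-matrix columns (1D, Gaussian psf).

    Each column is the psf sampled at the pixel centres x_m for one source position;
    the constant K * W**2 cancels in the normalization.
    """
    x_m = (np.arange(n_pixels) - n_pixels // 2) * W_um  # pixel centre coordinates
    col_a = np.exp(-(x_m - 0.0) ** 2 / (2.0 * psf_sigma_um**2))
    col_b = np.exp(-(x_m - spacing_um) ** 2 / (2.0 * psf_sigma_um**2))
    return float(col_a @ col_b / (np.linalg.norm(col_a) * np.linalg.norm(col_b)))

close = column_correlation(spacing_um=2.0, psf_sigma_um=20.0)   # densely spaced sources
far = column_correlation(spacing_um=40.0, psf_sigma_um=20.0)    # well separated sources
```

For sources only 2 µm apart the two columns are almost parallel, whereas at 40 µm separation they are much closer to orthogonal, which is the resolution versus psf-width trade-off described in the text.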