Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2862147

Formats

Article sections

Authors

Related links

Nat Methods. Author manuscript; available in PMC 2010 November 1.

Published in final edited form as:

PMCID: PMC2862147

NIHMSID: NIHMS187624

AUTHOR CONTRIBUTIONS

C.S.S. and K.A.L. worked out the algorithm and implementation, N.J. and K.A.L. performed the experiments, B.R. and K.A.L. designed the research and wrote the paper

Correspondence should be addressed to K.A.L. (Email: klidke/at/unm.edu)

Users may view, print, copy, download and text and data- mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms

The publisher's final edited version of this article is available at Nat Methods

See other articles in PMC that cite the published article.

We describe an iterative algorithm that converges to the maximum likelihood estimate of the position and intensity of a single fluorophore. Our technique efficiently computes and achieves the Cramér-Rao Lower Bound, an essential tool for parameter estimation. An implementation of the algorithm on graphics processing unit hardware achieves more than 10^{5} combined fits and Cramér-Rao Lower Bound calculations per second, enabling real-time data analysis for super-resolution imaging and other applications.

In many single molecule fluorescence applications, it is often desired to find the position and intensity of a single fluorophore as well as to estimate the accuracy and precision of these parameters. Where accuracy is a measure of the systematic error or bias and precision is a measure of the statistical error of an estimator^{1}. In recent work that uses single-molecule localization to generate super-resolution images^{2}^{-}^{6}, single emitters are located and on the mosaic of their found positions a two-dimensional Gaussian profile is placed to generate the final super-resolution images. The width of the placed Gaussian blob, *σ*, is given by the precision of the fluorophore position localization *σ* = *(σ ^{2}_{x}* +

Since the speed and precision of single particle localization has long played an important role in single particle tracking as well as in other single molecule biophysical techniques that rely on fluorescent reporters, others have also considered these issues. Several algorithms from the literature for finding particle positions are compared in^{7}, but, without the context of a statistical framework. The theoretically best-possible estimation precision of a fluorophore position has been derived in^{8} by using the well established statistical method of finding the CRLB in an unbiased parameter estimation problem. They considered many of the effects in a real system including background fluorescence, finite camera pixel size, and camera readout noise and they recently made a software tool available for estimation (www4.utsouthwestern.edu/wardlab). Non-linear least-squares (NLLS) and MLE position estimates were compared to the CRLB in^{9}, and it was found that, in general, MLE gives better precision than NLLS. The better precision of MLE is in concurrence with our results, however in^{9} they were investigated with assumed known emission and background rates. Here we demonstrate a robust, iterative routine that finds the particle position, the intensity and the background count rate. We do not consider camera readout noise since for electron multiplying (EM)CCD cameras, which are generally used for the fast frame rates desired in super-resolution imaging, the readout noise is much less than 1 rms e^{-} when using large EM gain. As described further in the Supplementary Note, Supplementary Data and Supplementary Figs. 1-4, the method presented is not restricted to 2D imaging with a symmetric point spread function (PSF), but can be extended to handle super-resolution techniques that encompass astigmatic imaging for *z* resolution as in^{10}. In this case, the *z* position is also calculated directly (not from intermediate *σ _{x}, σ_{y}* fits) and returned with CRLB based uncertainties. The results of the iterative algorithm compared to CRLB-based theoretical values are shown for a range of background rates and total collected photon counts of the PSF (Fig. 1). We show results for

Performance comparison on simulated data. a) The localization precision of our iterative method is compared to that given by the CRLB. Also shown are the mean uncertainties reported from CRLB calculations for every image using the found intensity and **...**

Our iterative update scheme is similar to that described in^{13}. We show, however, that a Gaussian approximation for the two-dimensional fluorophore PSF ^{11} and following localization leads to a compact analytical expression that allows for computationally fast localization without compromising on the localization precision. Our approach achieves the CRLB after a few (~ 10) iterations. It should be noted that the CRLB predicts the correct precision only when the model function is correct, and the isotropic Gaussian model may not be appropriate when imaging fluorophores with a fixed dipole orientation^{14}, leading to anisotropic emission. As described in the Supplementary Data, a ‘rule-of-thumb’ fitting region size of 2 × 3*σ*_{PSF} + 1 is used. For *z* resolution imaging the Gaussian PSF model is less reliable due to optical aberrations and the simplified model itself^{11}.

GPU-based computation has the potential to increase floating point calculation speed by a factor of 10-100 as compared to a modern CPU if the problem is amenable to a parallel processing approach (www.nvidia.com). A generic C-like language interface is available for simplified GPU programming (Nvidia CUDA), and a MATLAB (The Mathworks, USA) interface has been developed. Operating with a fixed number of iterations complements the GPU's single instruction multiple data strategy (SIMD). A GPU implementation of our iterative method can perform 10^{5} combined MLE and CRLB calculations per second of the four parameter model needed to describe the emittance of a fluorophore (Fig. 2).

Basic Concept of Single Molecule Localization via GPU Implementation. The input consists of multiple (up to millions) pre-selected ROIs arranged in a 3D data set. Smaller data sets are arranged and processed in chunks that fill the multi-processor shared **...**

The CPU and GPU performance, characterized by the number of combined position fits and CRLB calculations performed per second, are shown in Table 1. The slowest GPU tested outperforms the CPU by more than one order of magnitude, with the fastest achieving 2.6·10^{5} fits per second on a 7×7 fitting box size. We attribute this level of performance gain over the CPU to the fact that this estimation problem is almost ideal for the GPU SIMD architecture. Many iterations are performed on the same data, which are stored in local shared memory, and each thread is independent, eliminating synchronization delays. We also note that all CPU computation only ran on a single thread. In the right column of Table 1 the performance of a non-linear, least-squares fit (in C-code) on a CPU is shown for reference. It is twice as slow as our iterative algorithm on a CPU. Commonly used MATLAB LM optimization only computes about 5 fits/second. Given the readout rate of current high-end (EM)CCD cameras (~ 10 Mhz) our GPU implementation performs combined fits and uncertainty estimates at a speed sufficient for real-time analysis (see **Methods**).

Computational performance. CPU and GPU implementations of the iterative MLE and a LM non-linear least-squares fit.

To summarize our findings, we have derived an iterative approach for making a maximum likelihood estimate of the position and intensity of a single fluorophore as well as the background count rate using a two-dimensional Gaussian PSF model and a Poisson noise model. The iterative method achieves the minimal possible estimation uncertainty, as given by the Cramér-Rao Lower Bound, over a wide range of emission and background rates that could be found in single molecule experiments. Implementation of the iterative method on a modern graphics processing unit yields more than 10^{5} combined fits and CRLB calculations per second, greatly facilitating the analysis of large data sets found in single molecule based super-resolution techniques and is suitable for real-time data analysis.

Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturemethods/.

The iterative method to solve the MLE problem described above is implemented on a GPU using NVIDIA's compute unified device architecture (Nvidia CUDA), a C based programming language that makes it possible to readily program parallelized algorithms that are executed on a GPU. High-end gaming and computing GPU's have a (card dependent) large region of global device memory, usually several hundred MBytes. Execution is performed on a number (card dependent) of multi-processors. Multi-processors contain eight sub-processors and have 16 KBytes local memory that is shared between sub-processors. Access to local shared memory is fast, whereas access to the device global memory incurs a large performance penalty, and should be avoided when possible. The programming model follows the GPU architecture in that parallelized execution is performed by breaking down computations into ‘blocks’ and ‘threads’. Each block executes a set of threads using one multi-processor. Typically, performance is optimal when multiples of 32 threads (called a ‘warp’) are scheduled and executed using the 8 sub-processors. However, we found for all fit box sizes, the maximum fits/second occurred when the maximum fits/block were used, which is limited by the available 16KB shared memory per block. This is likely due to compiler optimization and the fact that each fit is independent of all others (no thread synchronization is required).

We map our iterative algorithm on this programming model in the following way. A data stack consisting of identically sized sub regions, which are centered single emitter images, are input to the function. This data is copied from host (CPU) memory to device (GPU) main memory. This data set is divided into blocks, which consist of the largest number of images that can fit into shared memory. The execution of a block begins by copying the data sub-set into local shared memory. Each thread then calculates a complete fit and CRLB calculation for one image. The Fisher information matrix is calculated using Supplementary Note Eqs. 9 and 11 and the CRLB is calculated using the analytical expression for the inverse of a 4 × 4 matrix. The CPU performance was measured by replacing the block/thread architecture with nested loops which call the same sub-function, and was compiled using Microsoft Visual Studio Express 2008. Images were loaded in MATLAB (The Mathworks, USA) as arrays and the C-Code and the CUDA GPU code were called via mex-files. The CPU was a AMD Phenom II X3 720 @2.8 GHz and only one core was used for the C-code.

The CUDA routine described above operates on a data set consisting of identically sized images that contain images of (potential) single molecules. For the analysis of synthetic data, a stack is generated using the same finite pixel approximation, including background as described in the Supplementary Note. The center coordinate of each simulated emitter is randomly shifted within the central pixel to prevent a biased result. After the generation of the data stack, the images are corrupted with Poisson noise. This data stack is analyzed by the routine, which returns the estimated position, intensity, and background and the CRLB calculation for each 2D image in the stack. No camera read noise is added. Camera read noise for electron multiplying (EM) CCD cameras, which are generally used for the fast frame rates desired in super-resolution imaging, is much less than 1 rms e^{-} with large EM gain.

The standard way of fitting a Gaussian to data is via least-squares fitting. We used 1) an existing implementation of MATLAB via the optimization toolbox (lsqcurefit) where we enabled the Levenberg-Marquardt option as minimization scheme and 2) an implementation from Numerical Recipes in C^{15}. We ran the test on fits with computed Jacobian only and on fits where we supplied analytic first derivatives. We used the following limits in the stopping criterion: tolerance on the parameters 10^{−4}, tolerance on the function 10^{−15} and maximal 10^{5} function evaluations. The stopping criterion was in all cases determined by the accuracy put on the parameters. The MATLAB routine is a lot slower (about two orders of magnitude) than the C implementation although it makes automatic use of all available cores on the CPU if multi-threading is enabled.

Single molecule imaging experiments were performed in an epi-fluorescence microscope setup consisting of an inverted microscope (IX71, Olympus America Inc.), 1.45 NA objective (U-APO 150x NA 1.45, Olympus America Inc.), 635 nm diode laser (Radius 635, Coherent Inc.), and an electron multiplying CCD camera (Luca DL6581-TIL, Andor Technologies PLC.). The pixel size is 10 μm. The epi-fluorescence filter setup consisted of a dichroic mirror (650 nm, Semrock) and an emission filter (692/40, Semrock). Individual Cy5 molecules were immobilized on an amino-silane ((3-Aminopropyl)triethoxysilane, Sigma-Aldrich) treated 8-well chambered cover slips (Lab-Tek II, Nunc) via an NHS-ester linkage attached to the Cy5 (Cy5 Mono-reactive dye pack, GE Healthcare). An oxygen scavenging system^{16} was used to extend fluorophore lifetimes and quench fluorophore triplet states. This was necessary to perform repeated measurements of the same single emitter for several frames while acquiring sufficient photons in order to address localization accuracy. This is not necessary in a dedicated experiment.

Data was recorded by the CCD camera at either 10 or 20 frames per second. All data was post-processed by 1) subtracting a pixel-dependent camera-offset, which was created by averaging 300 dark frames, and 2) multiplying the resulting image by a gain factor to restore correct Poisson statistics, as done in^{17}. Single molecules candidates were identified in each time frame as regions where the 2-D Gaussian filtered image (*σ* = *σ*_{PSF}) was greater than one standard deviation of this image. Note that due to the speed of the GPU implementation, a simple but fast method for identifying candidates is preferred as well as one that errs on the side of including regions that do not contain single molecules. Square regions of a specified number of pixels that included all identified regions in the time series were collected into one stack and input to the GPU routine. The resulting found coordinates were used in building trajectories only if they passed the following criterion: 1) Reported localization accuracy was less than one fifth *σ*_{PSF} in each dimension, and 2) 1/*N* Σ* _{k}*ln(

The width of the PSF used in the fitting routine was found by minimizing the mean square error between the finite pixel model and the summed projection over a 100 frame time series. The *σ*_{PSF} used in further analysis is the average found from analyzing the summed projection of 5 different single emitters.

The 3D astigmatic imaging was calibrated by imaging 100 nm red ( 690 nm emission) beads (FluoSphere, Invitrogen) bound to a the bottom of an 8 well cover slip chamber (Lab-Tek II, Nunc). The filter setup used was the same as that used for single molecule Cy5 imaging. We imaged using a 60x 1.2 NA water objective. A 500 mm focal length cylindrical lens was inserted in the emission beam path just after the first lens of a two color beam splitter (OptoSplit II, Cairn Research, UK). A piezoelectric z-stage (Nano-LPS, Mad City Labs) translated the focal plane in steps of 50 nm from −0.5 μm to 0.5 μm. At each focal plane, 20 images of a bead were captured. The fit box size used is again calculated by 2 × 3 × *σ*_{PSF} +1, but here *σ*_{PSF} is taken as the maximum value of either σ_{PSFx} or *σ*_{PSFy}; in this case giving a fit box size of 13 × 13 pixels.

After gain and background correction, the sum of all images from each focal plane were used to find *σ _{x}*(

Initially, the bottleneck for the build-up of a super resolution image was the switching cycle speed for activating only a very small number of particles per image, which resulted in imaging times of many hours^{19}. Subsequent speed-ups were achieved by optimizing the fluorophores for activation based super resolution, protocol improvements or reduction in the number of time frames^{20}^{-}^{22}.

The fundamental relationships between error and acquisition rate (number of activation cycles) were investigated in a theoretical study^{24}. The findings are relevant to specific chosen or given activation probability. However, to assess the required speed for real-time data analysis, we address the problem somewhat differently. We use the field-of-view *V*, the size of the footprint of the PSF *P*, the frame rate *F* and the fill factor *f* of the single emitters distribution on the field-of-view. The required fits/second for real-time data analysis are then

We consider two common cases of emCCD cameras for the maximal fill factor of *f*=1 and a PSF of *P*=7 × 7 pixels: i) *V*=128 × 128 pixels, *F*=500 frames/second and ii) *V*=512 × 512 pixels, *F*=30 frames/second. For the first case ~1.7 × 10^{5} fits/second are required, and for the second ~1.6 × 10^{5} fits/second respectively; these values are about equal as the total readout rate (~10 Mhz) is the limiting factor and is about equal. The PSF footprint can vary for different physical CCD camera pixel sizes and magnifications. In any case, the fastest GPU (2.6 × 10^{5} fits/second) tested already full fills this requirement for current fluorophores and CCD cameras! Of course, a fill factor of *f*=1 is optimistic; more realistic values are 0.1-0.5, but dependent on the experimental conditions and can be chosen according to^{23}. This means that also the slower (and cheaper) cards already are sufficient in current practice for real-time fitting of positions. The significance of the GPU fitting in the context of the entire process of segmentation (identifying regions of interest for single molecule fits), organizing ROIs, single molecule fitting, and reconstruction, is shown in Supplementary Table 1. Segmentation and reconstruction are performed as described in^{25} with the segmentation performed on the GPU. The results show that with even with 10^{6} total fits, corresponding to 100 fits per frame, the overall processing could exceed the maximum possible frame rate of 500 Hz of available EMCCDs.

Click here to view.^{(214K, pdf)}

The following two sections from the Supplementary Discussion should appear under Supplementary Data in the SI pdf.

Performance on Synthetic Data Sets

Performance on Experimental Single Molecule Data

AOP

An iterative algorithm implemented on a graphics processing unit determines maximum likelihood estimates of the positions of isolated fluorophores at a rate of 10^{5} localizations per second and allows real-time generation of super-resolution images with high precision.

ISSUE

An iterative algorithm implemented on a graphics processing unit determines maximum likelihood estimates of the positions of isolated fluorophores at a rate of 10^{5} localizations per second and allows real-time generation of super-resolution images with high precision.

Click here to view.^{(115K, pdf)}

This work was supported by American Cancer Society grant #IRG-92-024 and National Institutes of Health grant #1P50GM085273-01A1. We thank M. Malik and I.T. Young for reading the manuscript. The authors declare that they have no competing financial interests.

1. van den Bos A. Parameter Estimation for Scientists and Engineers. Wiley & Sons; New Jersey: 2007.

2. Betzig E, et al. Science. 2006;313:1642–1645. [PubMed]

3. Bates M, Huang B, Dempsey GT, Zhuang X. Science. 2007;317:1749–53. [PMC free article] [PubMed]

4. Egner A, et al. Biophysical Journal. 2007;93:3285–3290. [PMC free article] [PubMed]

5. Hess ST, Girirajan TPK, Mason MD. Biophysical Journal. 2006;91:4258–4272. [PubMed]

6. Lidke KA, Rieger B, Jovin TM, Heintzmann R. Optics Express. 2005;13:7052–7062. [PubMed]

7. Cheezum MK, Walker WF, Guilford WH. Biophysical Journal. 2001;81:2378–2388. [PubMed]

8. Ober RJ, Ram S, Ward ES. Biophysical Journal. 2004;86:1185–1200. [PubMed]

9. Abraham AV, et al. Optics Express. 2009;17:23352–23373. [PMC free article] [PubMed]

10. Huang B, Wang W, Bates M, Zhuang X. Science. 2008;319:810–813. [PMC free article] [PubMed]

11. Zhang B, Zerubia J, Olivo-Marin JC. Applied Optics. 2007;46:1819–1829. [PubMed]

12. Thompson RE, Larson DR, Webb WW. Biophysical Journal. 2002;82:2775–2783. [PubMed]

13. Aguet F, Van De Ville D, Unser M. Optics Express. 2005;13:10503–10522. [PubMed]

14. Enderlein J, Toprak E, Selvin P. Optics Express. 2006;14:8112–8120.

15. Press W, Teukolsky S, Vetterling W, Flannery B. Numerical Recipes in C: The Art of Scientific Computing. 2 edn. Cambridge University Press; Cambridge: 1992.

16. Rasnik I, McKinney SA, Ha T. Nature Methods. 2006;3:891–893. [PubMed]

17. Lidke KA, Rieger B, Lidke DS, Jovin TM. IEEE Transactions on Image Processing. 2005;14:1237–1245. [PubMed]

18. Andrews NL, et al. Nature Cell Biology. 2008;10:955–963. [PMC free article] [PubMed]

19. Betzig E, et al. Science. 2006;313:1642–1645. [PubMed]

20. Rust MJ, Bates M, Zhuang XW. Nature Methods. 2006;3:793–795. [PMC free article] [PubMed]

21. Bates M, Huang B, Dempsey GT, Zhuang X. Science. 2007;317:1749–53. [PMC free article] [PubMed]

22. Egner A, et al. Biophysical Journal. 2007;93:3285–3290. [PMC free article] [PubMed]

23. Young I, et al. Depth-of-focus in microscopy.. In: Høgda KA, Braathen B, Heia K, editors. SCIA'93, Proceedings of the 8th Scandinavian Conference on Image Analysis; Norwegian Society for Image Processing and Pattern Recognition, Tromsø Norway. 1993.pp. 493–498.

24. Small A. Biophysical Journal - Biophysical Letters. 2009;92:L16–L18.

25. Wolter S, et al. Journal of Microscopy. 2010;237:12–22. [PubMed]

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's Canada Institute for Scientific and Technical Information in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |