We describe an iterative algorithm that converges to the maximum likelihood estimate of the position and intensity of a single fluorophore. Our technique efficiently computes and achieves the Cramér-Rao Lower Bound, an essential tool for parameter estimation. An implementation of the algorithm on graphics processing unit hardware achieves more than 10

^{5}combined fits and Cramér-Rao Lower Bound calculations per second, enabling real-time data analysis for super-resolution imaging and other applications.In many single molecule fluorescence applications, it is often desired to find the position and intensity of a single fluorophore as well as to estimate the accuracy and precision of these parameters. Where accuracy is a measure of the systematic error or bias and precision is a measure of the statistical error of an estimator

^{1}. In recent work that uses single-molecule localization to generate super-resolution images^{2}^{-}^{6}, single emitters are located and on the mosaic of their found positions a two-dimensional Gaussian profile is placed to generate the final super-resolution images. The width of the placed Gaussian blob,*σ*, is given by the precision of the fluorophore position localization*σ*=*(σ*+^{2}_{x}*σ*and in these super-resolution techniques it is therefore necessary to both find the parameters and estimate their precision. Reported values are in the range of 20-70 nm. In the application of super-resolution imaging, it may be required to find the positions of more than 10^{2}_{y})^{1/2}^{6}fluorophores in order to generate one final image of a typical field-of-view of 50 × 50 μm. In many cases, background rates of light detection may vary across the field of view and the fluorophore emission rate of chemically identical fluorophores can vary due to effects such as uneven illumination profile, dipole orientation or different optical path lengths. In this work, we describe an iterative routine, implemented on a graphics processing unit (GPU) that calculates the Maximum Likelihood Estimate (MLE) for the*xy(z)*-position, the photon count of the fluorophore and the background fluorescence rate (Supplementary Note). We show that our approach achieves the Cramér-Rao Lower Bound (CRLB) over a wide range of parameters. The uncertainties of the fitted parameters are found by calculating their CRLBs^{1}, and in this sense the estimated*σ*for building up the super-resolution image is optimal. We provide a software tool (www.diplib.org/home22266) that only requires an inexpensive graphic card in order for single molecule fitting speed to be sufficient for real-time data analysis.Since the speed and precision of single particle localization has long played an important role in single particle tracking as well as in other single molecule biophysical techniques that rely on fluorescent reporters, others have also considered these issues. Several algorithms from the literature for finding particle positions are compared in

^{7}, but, without the context of a statistical framework. The theoretically best-possible estimation precision of a fluorophore position has been derived in^{8}by using the well established statistical method of finding the CRLB in an unbiased parameter estimation problem. They considered many of the effects in a real system including background fluorescence, finite camera pixel size, and camera readout noise and they recently made a software tool available for estimation (www4.utsouthwestern.edu/wardlab). Non-linear least-squares (NLLS) and MLE position estimates were compared to the CRLB in^{9}, and it was found that, in general, MLE gives better precision than NLLS. The better precision of MLE is in concurrence with our results, however in^{9}they were investigated with assumed known emission and background rates. Here we demonstrate a robust, iterative routine that finds the particle position, the intensity and the background count rate. We do not consider camera readout noise since for electron multiplying (EM)CCD cameras, which are generally used for the fast frame rates desired in super-resolution imaging, the readout noise is much less than 1 rms e^{-}when using large EM gain. As described further in the Supplementary Note, Supplementary Data and Supplementary Figs. 1-4, the method presented is not restricted to 2D imaging with a symmetric point spread function (PSF), but can be extended to handle super-resolution techniques that encompass astigmatic imaging for*z*resolution as in^{10}. In this case, the*z*position is also calculated directly (not from intermediate*σ*fits) and returned with CRLB based uncertainties. The results of the iterative algorithm compared to CRLB-based theoretical values are shown for a range of background rates and total collected photon counts of the PSF (Fig. 1). We show results for_{x}, σ_{y}*σ*_{PSF}= 1 with the size defined in unitless back-projected pixels. The diffraction limit for high NA visible light imaging is > 200 nm and*σ*_{PSF}> 90 nm^{11}. The algorithm both achieves and correctly reports the CRLB uncertainties over a wide range of background and fluorophore intensities. Calculated precision remains within a few percent of the theoretically achievable value even for less than 100 collected photons. We find that in all conditions, when the reported CRLB is less than*σ*_{PSF}/ 2 (here 0.5), the reported CRLB matches the theoretical position, and the routine achieves the CRLB. In typical super-resolution applications this corresponds to < 50 nm. Addition of camera readout noise has effectively the same negative influence on the parameter estimation as high background. Fortunately, this can be excluded for an EMCCD for the reasons mentioned above. Example images of single fluorophores with intensities and background rates near the*σ*_{PSF}/ 2 value are shown in Supplementary Fig. 5. The classical approach to solving the position fitting problem is via NNLS optimization (Fig. 1b). Here we chose a Levenberg-Marquardt (LM) optimization scheme with analytic and computed first derivatives with respect to the optimization parameters. Note that it is common practice to use computed derivatives only. The NNLS optimization clearly performs worse in terms of precision than our iterative MLE approach due mainly to the incorrect Gaussian-noise model implicitly present in any least-squares based optimization scheme. We compare the predicted uncertainty of the fit as in^{12}(Eq. 14) with the theoretical CRLB (Fig. 1c). Strikingly, they are identical for no background fluorescence but for any non-zero background and low light conditions the deviation is almost a factor of two. This means that in these cases the suggested uncertainty*σ*used to constitute the super-resolution image is estimated at nearly half the CLRB based value (overly optimistic). The formula presented in^{12}has the advantage that it can be readily calculated by hand from measurable quantities. However, for a precise estimate, the use of the reported CRLB from our iterative algorithm is preferred.Our iterative update scheme is similar to that described in

^{13}. We show, however, that a Gaussian approximation for the two-dimensional fluorophore PSF^{11}and following localization leads to a compact analytical expression that allows for computationally fast localization without compromising on the localization precision. Our approach achieves the CRLB after a few (~ 10) iterations. It should be noted that the CRLB predicts the correct precision only when the model function is correct, and the isotropic Gaussian model may not be appropriate when imaging fluorophores with a fixed dipole orientation^{14}, leading to anisotropic emission. As described in the Supplementary Data, a ‘rule-of-thumb’ fitting region size of 2 × 3*σ*_{PSF}+ 1 is used. For*z*resolution imaging the Gaussian PSF model is less reliable due to optical aberrations and the simplified model itself^{11}.GPU-based computation has the potential to increase floating point calculation speed by a factor of 10-100 as compared to a modern CPU if the problem is amenable to a parallel processing approach (www.nvidia.com). A generic C-like language interface is available for simplified GPU programming (Nvidia CUDA), and a MATLAB (The Mathworks, USA) interface has been developed. Operating with a fixed number of iterations complements the GPU's single instruction multiple data strategy (SIMD). A GPU implementation of our iterative method can perform 10

^{5}combined MLE and CRLB calculations per second of the four parameter model needed to describe the emittance of a fluorophore (Fig. 2).The CPU and GPU performance, characterized by the number of combined position fits and CRLB calculations performed per second, are shown in Table 1. The slowest GPU tested outperforms the CPU by more than one order of magnitude, with the fastest achieving 2.6·10

^{5}fits per second on a 7×7 fitting box size. We attribute this level of performance gain over the CPU to the fact that this estimation problem is almost ideal for the GPU SIMD architecture. Many iterations are performed on the same data, which are stored in local shared memory, and each thread is independent, eliminating synchronization delays. We also note that all CPU computation only ran on a single thread. In the right column of Table 1 the performance of a non-linear, least-squares fit (in C-code) on a CPU is shown for reference. It is twice as slow as our iterative algorithm on a CPU. Commonly used MATLAB LM optimization only computes about 5 fits/second. Given the readout rate of current high-end (EM)CCD cameras (~ 10 Mhz) our GPU implementation performs combined fits and uncertainty estimates at a speed sufficient for real-time analysis (see**Methods**).To summarize our findings, we have derived an iterative approach for making a maximum likelihood estimate of the position and intensity of a single fluorophore as well as the background count rate using a two-dimensional Gaussian PSF model and a Poisson noise model. The iterative method achieves the minimal possible estimation uncertainty, as given by the Cramér-Rao Lower Bound, over a wide range of emission and background rates that could be found in single molecule experiments. Implementation of the iterative method on a modern graphics processing unit yields more than 10

^{5}combined fits and CRLB calculations per second, greatly facilitating the analysis of large data sets found in single molecule based super-resolution techniques and is suitable for real-time data analysis.