Abstract: Advances in swept source laser technology continue to increase the imaging speed of swept-source optical coherence tomography (SS-OCT) systems. These fast imaging speeds are ideal for microvascular detection schemes, such as speckle variance (SV), where interframe motion can cause severe imaging artifacts and loss of vascular contrast. However, full utilization of the laser scan speed has been hindered by the computationally intensive signal processing required by SS-OCT and SV calculations. Using a commercial graphics processing unit that has been optimized for parallel data processing, we report a complete high-speed SS-OCT platform capable of real-time data acquisition, processing, display, and saving at 108,000 lines per second. Subpixel image registration of structural images was performed in real time prior to SV calculations in order to reduce decorrelation from stationary structures induced by the bulk tissue motion. The viability of the system was successfully demonstrated in a high bulk tissue motion scenario of human fingernail root imaging, where SV images (512 × 512 pixels, n = 4) were displayed at 54 frames per second.
Fourier domain optical coherence tomography (FD-OCT) is becoming a well-established high-resolution biomedical imaging modality with numerous pre-clinical and clinical applications, such as real-time assessment of optimal coronary artery stent placement and as a supplement in lymph node biopsies [1–3]. Swept source OCT (SS-OCT) uses a tunable laser to sweep the wavelength of the light that interrogates the tissue, thus allowing the reconstruction of a depth-resolved axial scan (A-scan or A-line). Owing to the continuous advancement of swept source laser technology, multi-megahertz A-scan rates have been demonstrated. These fast imaging speeds are beneficial for functional imaging techniques, such as speckle variance (SV) analysis for microvasculature detection, where interframe motion can cause severe imaging artifacts and loss of vascular contrast.
Speckle variance identifies microvasculature by calculating the interframe intensity variance of structural images, providing a non-invasive, high-sensitivity imaging technique capable of visualizing capillaries without the use of exogenous contrast agents. However, practical application of SV, including the presentation of real-time diagnostic information, is hindered by the computationally intensive signal processing required by structural and SV calculations when implemented on a serially oriented central processing unit (CPU). In comparison, commercially available graphics processing units (GPUs) are optimized for parallel data processing, and have recently been adapted to accelerate computationally intensive algorithms such as those of spectrometer-based FD-OCT [7–12]. While SV has been used to identify microvasculature in many studies, the added ability to image in real time would allow SV to assist in interventional procedures for stroke, spinal cord injury (SCI), age-related macular degeneration (AMD), and oncology.
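The interframe variance described above can be written out in a few lines. The following is a minimal sketch in plain Python, not the authors' GPU code; the function name `speckle_variance`, the nested-list frame layout, and the 1/n (population) normalization are assumptions made for illustration.

```python
def speckle_variance(frames):
    """Per-pixel interframe variance over n structural frames.

    frames: list of n B-scans, each a 2D list [row][col] of intensities.
    Uses the 1/n (population) normalization, which is an assumption here.
    """
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    sv = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            pixel = [frame[r][c] for frame in frames]
            mean = sum(pixel) / n
            sv[r][c] = sum((v - mean) ** 2 for v in pixel) / n
    return sv


# Toy data: column 0 is static tissue, column 1 fluctuates like flowing blood.
frames = [[[10.0, 2.0]], [[10.0, 8.0]], [[10.0, 3.0]], [[10.0, 7.0]]]
sv = speckle_variance(frames)
```

In this toy example the static pixel yields zero variance while the fluctuating pixel yields a large value, which is the contrast mechanism SV relies on.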
Here, we describe a high-speed SS-OCT imaging platform capable of real-time SV processing, display, and saving at 108,000 A-scans per second. The system utilizes a polygon-based short cavity ring laser at 54 kHz, where buffering and amplification are performed to double the A-scan rate of the laser. The complete SS-OCT signal processing, including λ-to-k spectral resampling, fast Fourier transform (FFT), post-FFT (structural and SV) processing, and display, was implemented on a GPU. In addition, subpixel image registration of structural images was performed prior to SV calculations to reduce decorrelation from stationary structures induced by the bulk tissue motion. To the best of our knowledge, this is the first demonstration of real-time high-speed functional SS-OCT, where structural images (512 × 512 pixels) were processed and displayed at 216 frames per second (fps), along with the processing and display of n = 4 averaged structural and SV images at 54 fps. This enables real-time imaging of microvasculature and allows SV to be used to guide interventional procedures. While the capabilities of SV OCT as an imaging tool have been demonstrated many times in the past, the added function of providing real-time information extends this technique to new applications in both preclinical and clinical settings.
Figure 1 shows the system configuration, which consists of a swept source OCT system and a custom-built personal computer. The swept source laser design employed a buffered polygon ring cavity laser. A 600 lines/mm diffraction grating (GR25-0613, Thorlabs) was selected to create the necessary dead time between laser sweeps and enabled buffering from 54kHz to 108kHz. The laser had a total output power of ~50mW, a coherence length of 3mm, and an axial resolution of 9μm in tissue. A Mach-Zehnder interferometer (MZI) was used to interrogate the biological samples, while standard techniques for triggering with a fiber Bragg grating (FBG) and recalibration with an MZI clock were used in this system. The laser beam scanning was implemented using a pair of galvanometer mirrors controlled by an analog output card (PXI-6259, National Instruments), where the fast-axis sweep was driven by the FBG A-scan trigger and the slow-axis sweep was controlled by the software.
A custom-built personal computer housed high-end consumer-level electronic components that were specially selected and configured to enable continuous real-time acquisition, processing, and display of both structural and speckle variance data. Recalibrated fringe data can be simultaneously saved at an A-scan rate of 108kHz, taking full advantage of the laser sweep rate. The analog signals were digitized at 250MS/s by a 12-bit data acquisition card (ATS9350, Alazartech), and transferred frame-by-frame (B-scan) to the computer memory via PCIe x8 interface. The raw and recalibrated data was copied to and from the memory of a GPU (GeForce GTX 460 1GB, NVIDIA) at 3.0GB/s over PCIe x16. The final structural and speckle variance images were rendered and displayed directly off the GPU memory. Concurrently, the recalibrated data was streamed to a solid-state drive (SSD) (RevoDrive 3 X2, OCZ) at 1.3GB/s through PCIe x4.
The custom-built software was developed with Microsoft Visual C++ 2008, using the Alazartech software development kit (SDK), OpenGL, and the National Instruments NI-DAQmx application programming interface for acquisition, display, and galvanometer control, respectively. For signal processing, the GPU was programmed using NVIDIA’s compute unified device architecture (CUDA) toolkit and SDK 4.0. Custom CUDA kernels, in combination with built-in CUDA libraries, were used to manipulate and reconstruct structural and speckle variance OCT images on the 336 cores of the GPU. Utilizing the GPU to perform massively parallel data processing not only dramatically enhances the data throughput when compared to processing using only the CPU, but also frees up CPU resources. Another advantage of processing data directly on the GPU is that the reconstructed OCT images can be rendered and displayed without any additional memory transactions.
To minimize dead time between frame captures, tasks were divided into four threads, as depicted in Fig. 2, for data acquisition (Thread 1), processing and saving (Thread 2), image display (Thread 3), and slow axis galvanometer control (Thread 4). The four threads were synchronized in a cascaded producer-consumer paradigm, where Thread 1 triggered both Thread 2 and Thread 4 on every B-scan acquired, and Thread 2 triggered Thread 3 on every B-scan processed, as indicated by the dashed arrows. A circular memory buffer and two semaphores (flags to prevent simultaneous reading and writing to the same memory address) were used between Thread 1 and Thread 2 to regulate data flow from the motherboard to the GPU. The solid arrows show the data flow through each hardware or processing kernel. In Thread 2, an MZI clock for A-scan recalibration was pre-stored in the GPU memory, whereas the average A-scan intensity of a B-scan could be updated on the fly to remove the DC noise. A Hanning window and zero padding were applied to each A-scan, followed by a fast Fourier transform (FFT). Either structural or averaged structural plus speckle variance OCT images were calculated and finally displayed by Thread 3. Any data that resided in the GPU memory could be continuously copied back to the host memory and transferred to the SSD for storage. The recalibrated (DC component removed) data were saved for post-processing and 3D visualization purposes.
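The hand-off between the acquisition and processing threads can be illustrated with a small producer-consumer sketch. This uses Python's standard `threading` module rather than the authors' C++ code. The buffer depth of four slots, the use of counting semaphores, and all names (`run_pipeline`, `free_slots`, `filled_slots`) are assumptions made for illustration.

```python
import threading


def run_pipeline(n_frames=20, slots=4):
    """Pass n_frames stand-in B-scans through a circular buffer."""
    ring = [None] * slots                     # circular memory buffer
    free_slots = threading.Semaphore(slots)   # slots the producer may write
    filled_slots = threading.Semaphore(0)     # slots the consumer may read
    processed = []

    def acquisition_thread():                 # plays the role of Thread 1
        for i in range(n_frames):
            free_slots.acquire()              # wait until a slot is writable
            ring[i % slots] = [i] * 8         # stand-in for one B-scan
            filled_slots.release()            # signal that a frame is ready

    def processing_thread():                  # plays the role of Thread 2
        for i in range(n_frames):
            filled_slots.acquire()            # wait for the next frame
            frame = ring[i % slots]
            processed.append(sum(frame))      # stand-in for GPU processing
            free_slots.release()              # hand the slot back

    threads = [threading.Thread(target=acquisition_thread),
               threading.Thread(target=processing_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed


results = run_pipeline()
```

Because a slot is released to the producer only after the consumer has finished with it, the two threads never touch the same slot at the same time, and frames leave the buffer in acquisition order.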
Bulk tissue motion (BTM), such as cardio-respiratory motion, cannot be avoided during in vivo imaging, and this results in a lower vascular contrast in the SV images. A subpixel image registration algorithm was adopted and implemented on the GPU using custom CUDA kernels and built-in CUDA libraries. The aim was to reduce the adverse effects of the BTM by digital realignment of the structural images prior to the SV calculation. The algorithm assumes 2D rigid translation between two images, and first obtains an initial estimate of whole pixel shifts using a traditional 2× upsampled FFT cross correlation technique. Subpixel shifts are then calculated using a single-step discrete Fourier transform (DFT) algorithm, where an upsampled cross correlation (by a user-defined upsampling factor, κ) is computed in a 1.5 × 1.5 pixel neighborhood about the initial estimate. The use of the DFT greatly reduces the memory and computational requirements when compared to a traditional FFT approach.
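The two-stage idea (a coarse estimate, then an upsampled cross correlation evaluated only near that estimate) can be sketched in one dimension. This is a simplified illustration in plain Python, not the 2D GPU implementation. The coarse stage here evaluates the correlation directly on a half-pixel grid instead of using a zero-padded FFT, and the function names and Gaussian test signal are hypothetical.

```python
import cmath
import math


def dft(signal):
    """Plain O(N^2) discrete Fourier transform (adequate for a short demo)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)) for k in range(n)]


def correlation_at(cross_spectrum, shift):
    """Cross-correlation magnitude at an arbitrary (fractional) shift."""
    n = len(cross_spectrum)
    total = 0j
    for i, c in enumerate(cross_spectrum):
        k = i if i < n // 2 else i - n        # signed frequency index
        total += c * cmath.exp(2j * math.pi * k * shift / n)
    return abs(total)


def estimate_shift(reference, moved, kappa=100):
    """Estimate the shift of `moved` relative to `reference`, in pixels."""
    n = len(reference)
    ref_f, mov_f = dft(reference), dft(moved)
    cross = [m * r.conjugate() for r, m in zip(ref_f, mov_f)]
    # Stage 1: coarse search on a half-pixel grid (2x upsampling).
    coarse_grid = [m / 2 for m in range(-n, n)]
    coarse = max(coarse_grid, key=lambda s: correlation_at(cross, s))
    # Stage 2: kappa-times upsampled search in a 1.5-pixel-wide window.
    half = int(0.75 * kappa)
    fine_grid = [coarse + j / kappa for j in range(-half, half + 1)]
    return max(fine_grid, key=lambda s: correlation_at(cross, s))


def gaussian(n, center, sigma=4.0):
    return [math.exp(-((t - center) ** 2) / (2 * sigma ** 2)) for t in range(n)]


reference = gaussian(64, 30.0)
moved = gaussian(64, 32.37)                   # same profile, 2.37 px later
shift = estimate_shift(reference, moved, kappa=100)
```

The second stage evaluates only about 1.5κ candidate shifts instead of computing a κ-times zero-padded transform over the whole signal, which is where the memory and computation savings come from.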
We have previously demonstrated SV in a low BTM scenario by imaging a mouse dorsal skinfold window chamber, where small capillaries could be detected at a relatively slow imaging speed of 36kHz by optimizing the SV parameters (i.e., the gate length, n, which is the number of frames used to calculate the SV, and the frame rate, F). In this study, the viability of the fast imaging system (108kHz) was evaluated in a high BTM scenario by in vivo imaging of the non-stabilized human fingernail root of a healthy volunteer. B-mode scanning and real-time display of averaged structural OCT and SV images were examined first. Each B-scan consisted of 512 A-lines spanning 2mm. Each A-scan had 2096 and ~364 samples before and after recalibration, respectively, and was zero padded to 1024 samples prior to the FFT. Subpixel image registration was performed on the top center region (256 × 256) of the structural image with an upsampling factor of 100. For calculating SV in a high BTM case, a small gate length of either 2 or 4 and a fast frame rate are desired to obtain a high SV signal-to-noise ratio (SNR). While the A-line density fixed the frame rate at 216fps, the software allowed for a real-time update of the gate length between 2 and 16. A gate length of 4 was chosen to reconstruct the averaged structural OCT and SV images, as shown in Fig. 3, at an effective display rate of 54fps. The galvanometer sweepback is seen on the right of each image. Layers of tissue morphology could be delineated easily in the structural image, whereas microvasculature appears as high contrast regions in the SV images. Figure 3(a) (Media 1) and 3(b) (Media 2) compare the SV image quality with and without subpixel image registration realignment, clearly showing less bulk tissue signal and higher vascular contrast in the realigned image.
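The per-A-scan reconstruction steps used for these images (background subtraction, Hanning window, zero padding from ~364 to 1024 samples, FFT) can be sketched as follows. This is a plain-Python illustration with a textbook radix-2 FFT, not the CUDA implementation. The flat background of 100 and the synthetic fringe frequency of 0.125 cycles per sample are made-up test values.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def process_ascan(fringe, background, n_fft=1024):
    """Return the depth profile (magnitude) of one recalibrated A-scan."""
    n = len(fringe)
    windowed = [
        (fringe[i] - background[i])                        # remove DC term
        * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))  # Hanning window
        for i in range(n)
    ]
    padded = windowed + [0.0] * (n_fft - n)                # zero padding
    spectrum = fft(padded)
    return [abs(v) for v in spectrum[: n_fft // 2]]        # positive depths


N_SAMPLES = 364
background = [100.0] * N_SAMPLES   # stand-in for the B-scan-averaged A-scan
fringe = [100.0 + math.cos(2 * math.pi * 0.125 * i) for i in range(N_SAMPLES)]
profile = process_ascan(fringe, background)
peak_bin = max(range(len(profile)), key=profile.__getitem__)
```

A single reflector produces one fringe frequency, so the depth profile shows one peak; with a 1024-point transform, a fringe at 0.125 cycles per sample lands at bin 0.125 × 1024 = 128.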
With the same B-mode imaging parameters as above, the software performance was analyzed using CUDA’s built-in Visual Compute Profiler 4.2, which provided detailed timing statistics of each kernel executed on the GPU, as shown in Fig. 4. The longest possible processing path was considered by including all data transfers to and from the GPU, subpixel image registration, and SV calculation on the nth frame. The GPU times were the average of 100 kernel calls, and the total processing time per SV image was ~3.02ms. This processing speed can theoretically support a 169kHz SS-OCT system at 2096 samples per A-scan. The most time-consuming step was the image registration algorithm, which took 1.6ms to complete. Compared with a CPU implementation reported by Guizar-Sicairos et al., which employed the same algorithm (image size of 256 × 256, and an upsampling factor κ = 25), our GPU implementation with κ = 100 is two orders of magnitude faster. SV calculation and data transfer to the GPU took 388μs and 372μs, respectively, making these tasks the second and third most time-consuming steps. The GPU time for the SV calculation is independent of the number of frames used. This is because the mean structural OCT and the sum of squares are calculated as each frame is being acquired, and on the nth frame the same number of arithmetic operations are performed to produce the SV image. It is worthwhile to note that the structural image calculation only took 195μs, and for normal structural OCT imaging, where image registration is not essential, the processing time becomes ~1.23ms, which can theoretically support a 416kHz system. Approximately 30MB of GPU memory was allocated to process each B-scan of 2.15MB in size. Such a low memory requirement allows this parallelized implementation to run on standard, commercially available GPUs. Finally, each recalibrated B-scan contained ~745kB of data and took on average 573μs to transfer to the solid-state drive. This data transfer is relatively time-consuming and will require its own dedicated thread to hide this latency for larger data sizes.
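The reason the SV cost on the nth frame does not depend on n can be seen in a running-accumulation sketch: each incoming frame only updates a per-pixel sum and sum of squares, and the final step is a fixed amount of arithmetic per pixel. The class below is a hypothetical plain-Python illustration of that bookkeeping (flattened frames, 1/n normalization assumed), not the authors' CUDA kernels.

```python
class RunningSpeckleVariance:
    """Accumulate per-pixel sums so the final SV step has a fixed cost."""

    def __init__(self, n_pixels, gate_length):
        self.n = gate_length
        self.n_pixels = n_pixels
        self._reset()

    def _reset(self):
        self.total = [0.0] * self.n_pixels      # running sum of intensities
        self.total_sq = [0.0] * self.n_pixels   # running sum of squares
        self.count = 0

    def add_frame(self, frame):
        """Add one flattened frame; return (mean, sv) on the n-th frame."""
        for i, v in enumerate(frame):
            self.total[i] += v
            self.total_sq[i] += v * v
        self.count += 1
        if self.count < self.n:
            return None
        # Final step: same number of operations per pixel for any n.
        mean = [s / self.n for s in self.total]
        sv = [sq / self.n - m * m for sq, m in zip(self.total_sq, mean)]
        self._reset()
        return mean, sv


acc = RunningSpeckleVariance(n_pixels=2, gate_length=4)
outputs = [acc.add_frame(f) for f in
           ([10.0, 2.0], [10.0, 8.0], [10.0, 3.0], [10.0, 7.0])]
mean_image, sv_image = outputs[-1]
```

The averaged structural image falls out of the same accumulators at no extra cost. One caveat of this formulation is that subtracting two large, nearly equal numbers can lose precision, so a real implementation would need to check that its number format is adequate.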
To test the 3D SV imaging capability and the real-time data saving function, 10,000 B-scans (512 A-lines in each B-scan) of recalibrated data covering a 2mm-by-2mm region were saved for post-processing. Data transfer did not degrade the software performance, which provided the shortest time delay between adjacent frames, and hence a higher SV SNR. In our previous study, a 50% decrease in variance signal was observed when the frame step size was the same as the beam spot size. Here, the large number of frames corresponded to a 0.2µm step size, which was 65× smaller than the beam spot size of 13µm.
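Several of the figures quoted in this and the preceding paragraph follow from simple arithmetic, reproduced here as a back-of-the-envelope check. The variable names are illustrative, and the two bytes per raw sample (12-bit samples stored in 16-bit words) is an assumption that happens to reproduce the stated 2.15MB B-scan size.

```python
A_LINES_PER_BSCAN = 512
RAW_SAMPLES_PER_ASCAN = 2096
BYTES_PER_RAW_SAMPLE = 2  # assumption: 12-bit samples stored as 16-bit words

# Raw B-scan size in bytes (about 2.15 MB).
raw_bscan_bytes = A_LINES_PER_BSCAN * RAW_SAMPLES_PER_ASCAN * BYTES_PER_RAW_SAMPLE


def supported_ascan_rate_khz(processing_time_ms):
    """A-scan rate the GPU could sustain if one B-scan takes this long."""
    return A_LINES_PER_BSCAN / processing_time_ms  # lines per ms equals kHz


sv_rate_khz = supported_ascan_rate_khz(3.02)          # full SV path
structural_rate_khz = supported_ascan_rate_khz(1.23)  # structural only

# Slow-axis sampling of the 3D data set.
step_um = 2000.0 / 10_000        # 2 mm covered by 10,000 B-scans
oversampling = 13.0 / step_um    # beam spot size divided by step size
```

These expressions give roughly 169kHz and 416kHz for the two processing paths, a 0.2µm frame step, and a 65-fold oversampling relative to the 13µm beam spot.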
In post-processing, the same steps were followed as in real-time processing. Figure 5(a) shows an en face 2D SV projection image in which the high contrast regions closely resemble capillary loops commonly observed in nailfold capillaroscopy. The projection image is a summation over 860μm starting from 310μm below the tissue surface. Motion artifacts due to heartbeat and breathing can be seen as periodic horizontal striations (indicated by the white arrows) resulting from an increase of the noise floor of the entire SV image and adjacent frames. Figure 5(b) shows the projection image after the structural images are realigned using the image registration algorithm, which improved the image quality via the removal of the periodic striation noise.
In using image registration to realign the structural data, it was assumed that the motion was non-deformable and had no rotational components. It was further assumed that out-of-plane motion was limited due to our relatively high imaging speed, and because there is no available method to account for this type of motion. The cross correlation algorithm revealed subpixel and pixel size movements in the lateral and axial direction, respectively, in all adjacent B-scans. This corresponded to a <10-20 micron shift between frames. Our imaging speed of 216fps was sufficient to capture this motion smoothly, along with the resultant change in the speckle pattern, allowing the image registration algorithm to improve the image quality. In an application where motion in 3D space is equally likely, the requirement for faster imaging speed becomes ever more stringent for clinical applications.
SV and related methods have assisted functional magnetic resonance imaging in measuring functional hyperemia and cerebral blood flow in rat models [18,19]. This combination helps to identify pathological changes in stroke, Alzheimer’s, and brain injuries, but traditionally long processing times limit SV to preclinical studies. The ability to image microvasculature in real-time would allow SV to move beyond stroke research and perhaps guide interventional procedures for stroke patients as changes in microvasculature during and prior to treatment highlight progression of the effects of stroke. Another potential application of a high-frame rate SV SS-OCT system includes the intra-operative evaluation of damaged vasculature after spinal cord injury.
In addition, microvasculature in the retina and the choroid has been mapped by SV for the study of retinal diseases, specifically diabetic retinopathy and AMD [20,21]. One treatment for AMD is the use of anti-vascular endothelial growth factor (VEGF) drugs, which reduce the VEGF responsible for stimulating the growth of new blood vessels. In this case, real-time SV imaging could be used to help guide intra-vitreal injections of anti-VEGF drugs and subsequently monitor changes in the microvasculature.
Finally, microvasculature imaging is also important for the identification and treatment of various types of cancer. Tumor growth often results in increased blood flow to cancerous regions and thus certain treatment methods focus on the disruption of the tumor microvasculature. Whereas previous imaging of microvasculature only provided before and after information of the treatment, functional imaging with SV has the potential to show changes during treatments such as photodynamic therapy.
In summary, a real-time high-speed SV SS-OCT imaging system using an NVIDIA GPU was demonstrated. An A-scan rate of 108kHz was fully utilized, allowing structural images (512 × 512 pixels) to be acquired, processed, and displayed at 216 fps, and SV images at 54 fps, which demonstrated excellent flow sensitivity capable of distinguishing microcirculation from the non-stabilized bulk tissue in a human fingernail fold. In addition, the subpixel image registration algorithm digitally reduced the BTM and significantly improved the SV image quality. Whereas SV has been limited in the past by processing times to preclinical studies, real-time SV opens new potential applications, including guiding interventional procedures for stroke, SCI, AMD, and oncology. In these cases, microvascular changes can be monitored without the implementation of gating or post-processing. Our custom-built software has been designed to be versatile, such that it can work for both non-linear and linear swept source laser systems. Ongoing research includes combining our software with higher resolution and faster OCT systems to achieve real-time 3D microvasculature display that may be packaged with existing interstitial or catheter technology to provide real-time in vivo microvasculature rendering of deeply situated organs.
The authors acknowledge funding support from the Natural Sciences and Engineering Research Council of Canada, Early Research Award Ontario, and Mitacs. In addition, we would like to thank Mr. Muneeb Khalid, President, Alazar Technology Inc., for the continuing technical support.