Motion artifact and speckle noise reduction in polarization sensitive optical coherence tomography by retinal tracking.

We present a novel polarization sensitive optical coherence tomography (PS-OCT) system with an integrated retinal tracker. The tracking operates at up to 60 Hz, correcting PS-OCT scanning positions during the acquisition to avoid artifacts caused by eye motion. To demonstrate the practical performance of the system, we imaged several healthy volunteers and patients with AMD both with B-scan repetitions for frame averaging and with 3D raster scans. Under large retinal motions with up to 1 mm amplitude at 0.5 ~a few Hz frequency range, motion artifact suppression in the PS-OCT images as well as standard deviation noise reduction in the frame averaged retardation images are presented.


Introduction
Since its first demonstration for measurements in biological samples [1], polarization sensitive optical coherence tomography (PS-OCT) has become one of the most important tools among the functional extensions of OCT that can assess more than the basic tomographic images of the reflected light intensity [2][3][4][5][6][7][8][9][10][11][12][13]. Accessing the polarization properties of the sample, PS-OCT is able to distinguish polarization preserving, birefringent, and depolarizing materials and tissues. In ophthalmic applications, which are among the most successful fields of OCT, the ability of PS-OCT enables researchers to analyze, e.g., the birefringent retinal nerve fiber layer (RNFL) [6,9,14,15] and retinal pigment epithelium (RPE) with depolarizing properties [10,14,16]. As this technology is now being used in an increasing number of clinical studies [17][18][19][20], including quantitative analysis of the patient imaging data [19,21,22], it is of increasing importance to obtain stabilized, undistorted tomographic images in both two and three dimensions in order to achieve optimum reproducibility and to enhance the quality of multiple frame averaged, noise reduced images [23].
In ophthalmic imaging, motion artifacts are inevitable due to involuntary eye motions [24]. The first combination of OCT and retinal tracking has shown increased stability and less fluctuation in registration for repeatedly acquired B-scans, which allows for better frame averaging in terms of avoiding edge and feature blurs [25,26]. A retinal OCT scanner with an integrated tracker is commercially available since 2007 and widely used to record highquality noise reduced averaged B-scan images [27][28][29][30]. For adaptive optics scanning laser ophthalmoscopy (AO-SLO), there has been more development to deal with small and fast tremor motions. Successful retinal motion detection on a micrometer scale by utilizing the magnified photoreceptor images was reported, which even enables tracking of single cone photoreceptors and targeted stimulus delivery [31][32][33]. Employment of such state of the art detection and tracking technology to OCT imaging was recently reported, where one of the reports described a functional OCT system used for vessel contrast and blood flow imaging [34,35].
In this paper, we present a novel PS-OCT system combined with retinal tracking functionality. The system consists of a PS-OCT and a line-scanning laser ophthalmoscope (LSLO) integrated together. Based on the images acquired by the LSLO, the system compensates for motion artifacts by adjusting the OCT scanner position in real time. The system is tested in healthy eyes under conditions of normal fixation (at a static target) and under conditions of large eye motions (induced by a moving fixation target). Essentially undistorted 3D and averaged, noise reduced 2D images are obtained. Residual motion artifacts are analyzed and reduction of retardation noise in the averaged B-scans is quantified. Finally, first results of patient imaging are shown, demonstrating the usefulness and the superior image quality achieved in a real clinical situation. Figure 1 shows the schematic diagram of the tracking PS-OCT, which is a combination of a spectral domain PS-OCT having two spectrometers, and an LSLO for the retinal tracking. The PS-OCT is based on a Michelson interferometer incorporating polarization-maintaining (PM) fibers similar to the setup described in ref [36]. A superluminescent diode (SLD) (fiber pigtailed, center wavelength 863 nm, FWHM 60 nm) is used as the light source, connected to a polarizer via polarization adjusting paddles, which are set to maximize the power of the linear polarized light output from the polarizer. A polarization maintaining fiber coupler with 90:10 splitting ratio is used to combine input, reference, sample, and detection arms of the Michelson interferometer. 10% of the input light travels to the sample arm and 90% of the light coming back from the sample arm is directed to the detection arm towards the spectrometers. A quarter-wave plate QWP1 in the sample arm is adjusted so that the polarization state incident on the sample is circular and the back coupled light to the fiber coupler has, in the case of a polarization maintaining sample, a linear polarization state oriented perpendicular to the original incident state [1]. The ocular scanning optical system has an optical aperture corresponding to 35° dia. field of view (FOV). The spot-size on the retina is about 20 μm. An x-y galvo-scanner pair (XY-scanner) raster scans the sampling beam over a field of 28° (x) × 21° (y) scanning angle, corresponding to 8 × 6 mm 2 FOV on the retina. The reference arm has a quarter-wave plate QWP2, which ensures a 1:1 power ratio for the horizontally and vertically polarized components of the reference beam travelling back to the detection arm. These polarization components are separated via a fiber polarizing beam splitter (PBS) and guided towards two separate spectrometers. At the spectrometers, each component is spectrally dispersed by a transmission grating and detected by a CMOS line sensor camera (2048 pixels, line-rate 70 kHz). One B-scan consists of 1024 A-scans, and one 3D volume acquisition records 250 B-scans in 4.5 s. The LSLO is designed to illuminate the retina with a line shaped light beam (8 mm × 20 μm) converted from a single mode fiber output of a pigtailed laser diode (LD) (wavelength 786 nm) by optics including a cylindrical lens. The line shaped beam is scanned perpendicular to the line at 1 ~60 Hz (variable) by a galvo-scanner (Y-scanner) so that it covers the same 8 mm (x) × 6 mm (y) FOV (on the retina) as that of the PS-OCT. A part of the optics between the scanner and the sample, which includes the focusing lens, is shared with the OCT. Reflected light from the sample is lead to a CCD line sensor camera (2048 pixels, line-rate up to 70 kHz; center 1000 pixels correspond to the 8 mm field) synchronized with the Y-scanner motion (600 lines acquired along y-direction). The power to the eye is 0.4 mW from the LSLO and 0.7 mW from the PS-OCT. This is below the safety limit for combined exposures drawn by International Electrotechnical Commission (IEC) [37]. The measured sensitivity of the PS-OCT is 98 dB near the zero-delay depth position.

Methods
The tracking is done by correcting actuations of the PS-OCT XY-scanner according to the offset signals based on the retinal position shifts detected by the LSLO. In order to extract the retinal position shift information, first, a template (typically consisting of 64 × 64 pixels) with a fiducial marker such as a thick vessel branch (of the subject retina) is chosen and extracted from the LSLO images recorded as a pre-processing step (Fig. 2). During the sequence of the data acquisition of the PS-OCT sub-system, the LSLO images taken in parallel are analyzed, so that the cross-correlation between the template image and each LSLO image is calculated. From the cross-correlation values, the best matched location is found, and the twodimensional shift information is obtained. This template matching calculation is done immediately after the data acquisition and image construction for each LSLO frame. A correction signal corresponding to the shift information is then sent to the PS-OCT XYscanner. According to the scheme described above, the timing of the actuation is only determined by the LSLO and is conducted at the same rate as that of the LSLO acquisition (i.e., the LSLO frame rate). The signal processing for the PM-fiber based spectral domain PS-OCT is described in ref [36]; from the pre-processed data, sample reflectivity (intensity), phase retardation (retardation), fast-axis orientation (axis-orientation), and the degree of polarization uniformity (DOPU) are calculated as previously described [5,16]. To exclude unreliable polarization state caused by noise, only pixels whose intensity exceeds a certain threshold are evaluated (three standard deviations above the mean intensity noise measured in the dark vitreous region), pixels below the threshold are displayed in gray color. Segmentation for depolarizing material is performed by thresholding the DOPU images; pixels with DOPU values lower than 0.75 are extracted.
In vivo measurements were performed in eyes of healthy human volunteers and patients diagnosed with AMD after informed consent was obtained. The study was approved by the university's ethics committee, and conformed to the Declaration of Helsinki for research in human subjects.
To evaluate the performance of tracking in healthy eyes, two types of fixation targets (FT) were used: a static fixation target (FT1) and a moving fixation target (FT2). Both targets consist of a white, blinking cross displayed on an organic light emitting diode (OLED) display projected onto the imaged retina via the ocular optics shared with the PS-OCT and the LSLO. In the case of FT2, the cross moves along a circle of 1mm dia. (on the retina) with a time period of 2 sec. Figure 3 shows an example of retinal motions detected by the tracking system for FT1 and FT2 in a healthy eye, where two dimensional motions are decomposed into x and y directions. For imaging of patient eyes, only the static target (FT1) was used.

Healthy eye imaging
In Fig. 4, a set of PS-OCT images processed from a 3D volume data set acquired in a healthy eye using the static fixation target (FT1) is shown. Figure 4  In a next step, we illustrate retinal tracking under conditions of large retinal motion (moving target FT2). For that purpose, we recorded series of 250 B-scans at the same nominal position (M-mode B-scans; Y-scanner of the PS-OCT sub-system kept at a stable position y (tracking-off) or just actuated to correct for retinal motions (tracking-on)). The residual motion artifacts can be seen in Fig. 5, under conditions of tracking-off ( Fig. 5(a)), tracking-on at 60 Hz ( Fig. 5(b)), and tracking-on at 20 Hz (Fig. 5(c)). The yellow and red lines in the pseudo SLO image shown in Fig. 5(d) roughly indicate the range of B-scan locations, where two thick vessels are included on the right side. In the ideal case of no motion or perfect tracking, the shadow of the vessels should lead to straight vertical lines in the M-mode B-scan images. Figure 5(a) (tracking-off) shows large deviations from straight lines with an amplitude of ~1 mm in x-direction, as expected. The closed elliptical shapes are vessel shadows caused by the thinner vessel near the red line in Fig. 5(d), which is nearly parallel to the x-direction.    Fig. 6(a) and 6(b), undulations of the vessels appear on the right side. Vertical shrinkage and stretch of the optic nerve head (ONH) and fovea, respectively, are observed as well, where the distortions are caused by motions in y-direction. Microsaccadic dislocations are also pronounced. In Fig.  6(c), the radial pattern of the axis-orientation around the fovea and the ONH observed in Fig.  6(f) and 6(i) is heavily deformed by the retinal motion. The radial axis-orientation around the fovea originates from Henle's fiber layer contribution to the birefringence, and it also causes the doughnut shape in the retardation map ( Fig. 6(e) and 6(h)) centered at the fovea [38][39][40][41]. The doughnut shape is also deformed severely in Fig. 6(b). The other radial axis-orientation distribution, observed around the ONH is caused by the RNFL. The thick nerve fiber bundles nearby the blood vessels emerging from and converging at the ONH generate increased retardation observed in the retardation map [14,38,40,42,43]. When the tracking was on during the 3D volume acquisition, these motion artifacts were strongly reduced. Even under conditions of large motion (FT2), the distortions are essentially eliminated (Fig. 6(d)-6(f)) leaving residual motion artifacts rather similar (as observed by visual inspection) to the case of the static fixation target (FT1), cf. Fig. 6(g)-6(i).
After this qualitative demonstration of retinal motions and residual motion artifacts, we wanted to analyze these parameters quantitatively under conditions of using fixation targets FT2 and FT1. Retinal motions in x and y directions, before motion correction, can easily be extracted from the LSLO images, which have x and y axes. On the other hand, detection of residual motions (i.e., after correction) requires multiple B-scan OCT frames that have the axis in the direction of motion detection; in this case, the system's fast-scanning axis was along x-direction and therefore only the detection of the x-component is available. We used the following method to calculate the residual motions in x-direction: (1) acquisition of 73 OCT B-scan frames as a sample, (2) selection of one reference frame, (3) calculation of crosscorrelation of the OCT intensity images between each frame and the reference frame, where two dimensional shift is made in x and depth (axial) directions, (4) determination of the shift between each frame and the reference frame according to the position of the maximum crosscorrelation value, (5) extraction of the x-component of the shift for further analysis. The third frame of each 73 B-scan frame set was selected to be the reference frame, unless it was found to be the case that, by a visual inspection, the B-scan image was distorted or had a dark shaded region, possibly by a microsaccade or a blink. For the analyzed frame sets in this study, no such case was found.  In Fig. 7, the retinal motions in x and y direction detected in the course of the tracking (i.e., by the LSLO template matching) and the residual motions in x-direction after correction by the tracker, as calculated by the method described above, are plotted for the cases FT2 and FT1. The standard deviation and range of the residual motion for each case are presented in Table 1. The maximum ranges of residual motions are more than 300 μm and more than 30  Fig. 7. When the corresponding frames are excluded by taking only non-outliers or the best 50 frames having the smallest residual motions in x-direction, the standard deviation is less than 20 μm and the range is less than 100 μm for the case of FT2. Similarly, for the FT1 case, the standard deviation and range are less than 7.8 μm, which corresponds to one pixel size, the detection limit for the cross-correlation calculation method used in this study.
In a next step we demonstrate the use of retinal tracking for generating high-quality PS-OCT B-scans of reduced noise by frame averaging. For that purpose, the following processing steps are taken for the PS-OCT data set acquired with the moving fixation target (FT2) with tracking-on. The first part of the procedure is in common with that for the residual motion calculation as described above as (1) -(4). Then, the next step is to calculate the average retardation values of the corresponding pixels in the shifted frames, which is conducted for series of N = 5, 10, 20, 30, 50 frames: (5) selection of the best N frames according to the maximum correlation value, (6) registration of each frame by two dimensional displacement according to the shift information, (7) averaging calculation of retardation based on the method described in ref [23], (8) exclusion of unreliable pixels from the further processing by averaged-intensity threshold (as explained in Section 2 above; excluded pixels are displayed in gray), (9) correction for the polarization state contribution from the anterior segment as described in ref [40]. The final step is to calculate the noise: (10) selection of an evaluation window, (11) calculation of the standard deviation of retardation within the window (a) in a single frame (the reference frame as above is used) and (b) in the series of averaged frames. The results are shown in Figs. 8 and 9. The retardation averaged image from 50 frames in Fig. 8(b), even though recorded under the large motion condition, shows reasonable preservation of the edge features, compared to the single frame image 8(a), e.g., the inner limiting membrane (ILM) including the foveal pit, detached parts of the vitreous (red arrows), external limiting membrane (ELM), etc. It can be also observed that the number of usable (i.e., non-gray) pixels has been increased, making the retardation image denser and smoother than the single frame image. Figure 9 further quantifies retardation noise reduction by the frame averaging. In Fig. 9(a), the retardation noise is plotted vs. the number of averaged frames. The evaluation window (containing 1000 pixels) was located in the RNFL as shown in Fig. 9(b). Noise reduction up to a factor of 7 is obtained for the 50 averaged frames.

Patient eye imaging
In order to demonstrate the usefulness of our system in a real clinical imaging situation, we recorded 3D and 2D averaged data sets in patients with age related macular degeneration (AMD) with geographic atrophies (GA). Patients with large central GA typically have very poor vision and therefore poor fixation capabilities. This poses heavy demands on the capabilities of a tracking system. Figure 10 shows results obtained in a patient eye with a large GA. For comparison, images obtained by a commercial near-infrared SLO ( Fig. 10(a)) and by fundus auto-fluorescence (FAF) (Fig. 10(b)) are shown. Increased infrared reflection from the choroid and decreased FAF signal at the atrophic region is observed. PS-OCT en face images are shown in Fig. 10(c) (intensity projection), and Fig. 10(d) (RPE depolarizing material thickness map and atrophic area segmentation overlaid as red lines). The yellow line in Fig. 10(c) indicates the location of the exemplary B-scan images, Fig. 10(f) intensity, Fig.  10(g) DOPU, and Fig. 10(h) depolarizing material segmentation by using the DOPU value and the depth information; the depolarizing material is divided into two regions: RPE (overlaid in red) and depolarizing material in choroid (in green) [19,21]. Corresponding retinal motion is shown in Fig. 10(e). It has similar amplitude range but much more frequent microsaccades as compared with the healthy eye looking at a moving fixation target (FT2) (Fig. 3(b)). Despite the large and frequent motions, the overall tracking works well, as can be seen by comparing the shape of the atrophic lesion in the PS-OCT segmentation image ( Fig.  10(d)) with that observed by FAF (Fig. 10(b)). Although the motion artifacts cause horizontal stripes in the PS-OCT projection images, the overall shape of the lesion is undistorted, which is an essential condition for quantitative evaluation of lesion sizes and for follow-up studies. In Fig. 11, results from another patient with GA are presented. In order to investigate the effect of the tracking, one acquisition with tracking-on and two acquisitions with tracking-off were performed. A commercial near-infrared SLO image is shown in Fig. 11(a) as a reference to the OCT en face images Fig. 11(b)-11(g). In the OCT en face images taken with trackingoff ( Fig. 11(c), 11(f) and 11(d), 11(g)), it can be observed that duplicated features appear at some locations such as those marked by yellow circles. They are caused by large microsaccadic jumps in y-direction ( Fig. 11(i), 11(j)). These jumps, opposite to the scanning direction of the Y-scanner, pull back the B-scan to the previously scanned locations, so that the features in these regions are scanned twice, and therefore duplicated. Since a duplicated part modifies the shape and area of the lesion, it generates an error in qualitative and quantitative analyses, while tracking helps to avoid such image deformations already in the acquisition step.  Figure 12 shows results of frame averaging performed in the same patient eye as shown in Fig. 10. A series of 83 B-scans were recorded at the same nominal position, and then the 50 and 10 frames with best cross-correlation results were selected and averaged. Averaged frames ( Fig. 12(a), 12(d) intensity, Fig. (12(b), 12(e)) axis-orientation, Fig. 12(c), 12(f) retardation) are compared with corresponding single frame images ( Fig. 12(g)-12(i)). The improved image quality of the averaged frames is clearly observed. Noise is strongly reduced and larger fraction of usable (i.e., non-gray) pixels are obtained, giving rise to a smoother appearance of the averaged images. While the difference between 50 and 10 frame averaging appear smaller compared with the difference to the single frame, it is clearly observed that there are further improvements in case of 50 frame averaging, especially in regions with relatively weaker intensities such as in inner retinal layers from ganglion cell layer to outer plexiform layer, and at around the choroid-sclera interface.
Interesting observations are a few narrow columns of uniform color in the axis-orientation image extending from the outer retina down to the choroid, which are located near the center of the fovea (red arrows) and related to the regions with increased retardation value, also extending in the vertical direction in the figure (i.e., axial direction). These indicate birefringence and are likely caused by ordered fibrous structures related to debris of photoreceptors, or cone photoreceptor axons from Henle's fiber layer.

Discussion
We have presented a new PS-OCT system with an integrated retinal tracker and demonstrated its use for in-vivo imaging. The system employs an LSLO as a retinal motion detector, and a simple scanner correction scheme. The motion artifacts in the PS-OCT images were successfully reduced with a tracking rate at 60 Hz. An analysis of the residual motion artifacts was performed by post-processing registration of M-mode B-scan frames. The system underwent a real clinical imaging situation, and demonstrated less distorted en face maps and high-quality frame averaged images with reduced motion artifacts and noise.
The performance of tracking depends on the actual delay time between the retinal motion and the correcting actuation of the PS-OCT XY-scanner. There are two main factors of the delay: one is the discreteness of the actuations and the other is given by the accumulation of the times for template matching and other processing, data communication, and hardware control (this latter delay factor, is called "process delay time" in this section). Since the system is not a closed-loop, but rather a one-way operation from the detection to the actuation, no complex analysis on the stability of the loop (such as using Bode plot and gain or phase margins) is necessary to predict the residual motion or to estimate the total delay time from the measured residual motions. For tracking at 60 Hz, our system has about 10 ms process delay time. The typical drift velocity ranges from 0.1 ~1.2 mm/s (e.g., Figs. 7(a), 7(c), 10(e), 11(h)-11(j)), which corresponds to the residual motion of 1 ~30 μm with a total delay time of 10 ~26 ms. The total delay time is explained by adding the process delay time (10 ms) and the discreteness at 60 Hz (~16 ms). When a significant jump caused by a microsaccade occurs, the system cannot track it immediately, however, after one cycle of tracking it recovers. In such cases, the PS-OCT basically misses only one B-scan for tracking, since the B-scan frame rate (56 fps) is slower than the tracking rate. This quick analysis explains the results in the M-mode B-scan projections in Fig. 5 and en face PS-OCT images well.
Speckle reduction by frame averaging of repeated B-scans at nearby positions has been discussed especially in relation with the distribution of the individual B-scans along the direction perpendicular to the B-scan (i.e., y-direction in this paper) [23,44,45]. In order to reduce the grainy appearance of the speckles, a wider range of the distribution is beneficial, while edges and detailed features with high spatial frequency will be lost. The optimal value of the range was suggested and demonstrated to be about 100 μm, and as low as ~15 μm if such high spatial-frequency features should be preserved. Although the distribution of repeated B-scans in y-direction (equivalent to residual y-motion artifacts) in this work cannot be directly measured, its range can be estimated from the residual motions in x-direction. From Table 1, we have a range of residual motions for the best 50 frames in a data set of ~40 μm for the moving target and < 8 μm for the static target for healthy eyes. From Figs. 7, 10(e), and 11(h)-11(j), we observe that the retinal motions have roughly similar range and behavior for x and y directions. Since the tracking system is rather symmetric for x and y directions (in terms of LSLO resolution and software that handles target detection and scanner control), we estimate the y-distribution range of B-scans to be also about 40 μm or less. While this is beneficial for preservation of edges and high-frequency features, for a maximum reduction of speckles, a wider y-distribution may be achieved by adding an intentional y-shift to the motion correction signal. Optimization of this parameter is planned for future studies. Even if the suppression of speckle noise is not optimized at present, nondeterministic random noise fluctuations such as those caused by shot noise and thermal noise can strongly be reduced.
A related point to be discussed is the number of frames that can be meaningfully averaged with and without retinal tracking. In a previous paper, we have demonstrated that averaged PS-OCT images with improved image quality can be generated without tracker in healthy subject with good fixation capabilities [23]. To achieve good image quality, averaging of 10 -50 frames (depending on signal strength) seems optimum (cf. ref [23]. and Figs. 9 and 12 of the present paper). With an OCT frame rate of 56 B-scans/s, this needs a time of 0.18 -0.9 s. In eyes with poor fixation quality, several microsaccades can occur per second. E.g., the eye imaged in Fig. 10 has a microsaccade every ~0.3 s, which would allow to record ~17 frames within one interval between microsaccades. In addition to the frequent microsaccades, strong drifts in y-direction can occur. In case of Fig. 10, the drift in y-direction is on the order of 1 mm/s. To keep the y-distribution of B-scans below 100 μm, those consecutive B-scans must be acquired within a time interval of 0.1 s, i.e., a maximum of 5 -6 frames can be averaged. If edges or high-frequency structures are to be preserved, the required time interval becomes shorter and averaging in such patients would not be feasible. This short analysis clearly underlines the advantage of retinal tracking, not only for 3D imaging but also for 2D imaging with frame averaging.
In conclusion, we have demonstrated, to the best of our knowledge, the first combination of PS-OCT and retinal tracking. In-vivo imaging results in healthy and patient eyes were presented. Motion artifacts in PS-OCT B-scan and en face map images were successfully suppressed by the tracking, and frame averaging of 50 B-scan PS-OCT data acquired with tracking was demonstrated. Comparison of PS-OCT en face maps with those of other imaging modalities demonstrated our system's capability to record essentially undistorted images of large atrophic lesions even in case of poor fixation compliance and large, frequent eye motions.