Crosstalk-free volumetric in vivo imaging of a human retina with Fourier-domain full-field optical coherence tomography

Abstract: Fourier-domain full-field optical coherence tomography (FD-FF-OCT) is currently the fastest volumetric imaging technique, able to generate a single 3-D volume of the retina in less than 9 ms, corresponding to a voxel rate of 7.8 GHz. FD-FF-OCT is based on a fast camera, a rapidly tunable laser source, and Fourier-domain signal detection. However, crosstalk arising from multiply scattered light corrupts images with a speckle pattern and therefore lowers image quality. Here, for the first time, we report on a system that can acquire essentially crosstalk-free volumes of the retina by using a fast deformable membrane. It enables visualization of the choroid and a clear delineation of the retinal layers that is not possible with conventional FD-FF-OCT.


Introduction
Optical Coherence Tomography (OCT) is an invaluable tool in eye imaging. Scanning confocal OCT is the most established OCT technique and is capable of acquiring images from deep in tissue. It is based on flying-spot scanning and coherent detection through a confocal pinhole, such as a single-mode fiber. Conventional Full-Field OCT (FF-OCT) has some advantages over scanning confocal OCT, such as increased speed and/or reduced cost. Specifically, it enables spatially parallel, fast 2D image acquisition by utilizing a camera in conjunction with an inexpensive, spatially and temporally incoherent light source, such as an LED or a thermal source [1][2][3]. Because it operates in the time domain (TD), we will call this technique TD-FF-OCT in this article to distinguish it from the Fourier-domain variant, FD-FF-OCT, defined below. To generate 3D volumes, however, a sample in TD-FF-OCT needs to be translated axially. On the other hand, axial scanning allows high-NA objectives to be used, which gives better than 1 µm isotropic resolution [4][5][6][7], whereas in most other OCT techniques the lateral resolution has to be balanced against the depth of field (DOF) [3,8]. Recent developments in fast cameras with high full-well capacity (FWC) have significantly sped up FF-OCT [9], to the level of in vivo imaging, including that of the cornea [10] and retina [11], as well as ex vivo imaging of dynamic processes in the eye [12]. Despite the fast 2D data acquisition, TD-FF-OCT remains a relatively slow technique when it comes to volumetric (3D) imaging. In both the scanning and full-field configurations, there is a speed and sensitivity advantage if the signal is recorded in the Fourier domain (FD) rather than the time domain [13][14][15].
Fourier-domain FF-OCT (FD-FF-OCT) utilizes a rapidly tunable laser source, usually referred to as a swept source, instead of the spectrally broadband incoherent light source used in TD-FF-OCT. Fourier-domain signal detection and its parallelization by a camera enable a ∼10 GHz voxel rate [16], making FD-FF-OCT the fastest volumetric OCT technique with relatively high sensitivity. The same volumetric imaging speed is possible with TD-FF-OCT [17], but the sensitivity is lower. Another important benefit of full-field illumination is that a significantly higher retinal exposure is allowed compared to scanning OCT [18]. In addition, phase randomization can be used in FF-OCT when added before the interferometer, because the illumination does not have to be spatially coherent, as it does in a confocal microscope. For example, the TD-FF-OCT that achieves the highest spatial resolution (∼1 µm in 3D) uses an LED, which is completely spatially incoherent.
Here, we report on in vivo crosstalk-free imaging of the human retina that was achieved by using a deformable membrane (DM), as well as a supplementary optical focusing system that facilitated the axial alignment of the retina, reference mirror and DM planes on the camera. We were thus able to show that 3D volumes, acquired in 258 ms and averaged to increase the signal-to-noise ratio (SNR), had significantly improved image contrast when the spatial coherence of the laser was destroyed with the DM for crosstalk-free imaging. Specifically, we were able to see a more detailed choroid structure, which is essential for understanding many retinal and choroidal diseases [47]. The choroid is otherwise buried behind the crosstalk noise when imaged with conventional FD-FF-OCT. The other outer retinal layers, like the IS/OS and RPE, also showed higher contrast with crosstalk-free imaging. A better delineation of the retinal layers was also observed.

Setup
The system, shown in Fig. 1, consisted primarily of a fast tunable laser source, a deformable membrane (DM), a Linnik interferometer and a fast camera. The tunable laser source (Broadsweeper BS-840-2-HP, Superlum), capable of delivering 25 mW average output power, could be tuned from 800 nm to 878 nm at a sweeping speed of up to 100 000 nm/s. The light was delivered to the setup by a single-mode fiber and collimated to a diameter of 1.1 mm full width at half maximum (FWHM). The beam was then reflected off the DM (Dyoptyka), which was able to rapidly generate largely uncorrelated standing-wave patterns at a rate of up to half a megahertz. The DM was based on a thin, highly reflective membrane that could be excited with an actuator at a range of frequencies, leading to the formation of surface standing waves. The DM was angled such that its normal was close to the optical axis, which made the DM image sharp across the whole retinal field of view. For that, long-focal-length relay lenses L1 (f = 15 cm) and L2 (f = 12.5 cm) were used to reimage the DM onto the plane conjugate to the sample plane (S' in Fig. 1). Placing the DM directly at the plane S' would have required a significant DM tilt because of the short focal length of the L3 lens. Upon reflection, the DM imprinted a phase pattern on the beam's wavefront. When the DM was not active (OFF), it acted as a simple mirror, whereas the active DM (ON) acted as a dynamic random diffraction grating, diffracting the beam into a range of angles that changed quickly in time. An image of the diffraction pattern captured in plane P' in Fig. 1 showed a homogeneous intensity distribution from which the DM's scattering properties were calculated. Since the spot to which the lens L1 focused the diffracted beam was 1.23 mm in FWHM, the FWHM of the scattering angles was estimated to be 0.47°. The spot size in the P' plane was ∼90 µm when the DM was acting as a mirror (OFF).
The spot was further relayed onto the pupil plane of the eye with the L2 and L3 (f = 3 cm) lenses, which de-magnified it 4.2 times, resulting in a diameter (FWHM) of 300 µm for the active DM and ∼20 µm for the inactive DM. The 50/50 beamsplitter splitting the beam into the reference and sample arms was rotated in-plane by a small angle, as shown in Fig. 1, to prevent the specular reflection from the beamsplitter from reaching the camera. The reference arm contained an objective lens L4 (f = 3 cm), a mirror M1 and a neutral density (ND) filter that attenuated the reference beam to ∼6% in the double-pass configuration. Lens L4 and the optics of the human eye formed a Linnik interferometer. Lens L4 helped to match the chromatic dispersion in the reference arm to that of the sample arm. The lens is also necessary in the spatially incoherent case, when DM is ON, since the reference mirror has to be imaged on the camera, as explained in section 2.2. When DM was OFF, the human eye collimated the beam going onto the retina to an estimated diameter of 0.850 mm (FWHM), assuming that the focal length of the human eye lens was 2.5 cm. Switching DM ON resulted in a multitude of collimated beams impinging on the retina at a range of angles defined by the DM properties and the imaging optics.

Fig. 1. Crosstalk-free Fourier-domain full-field OCT system for in vivo retinal imaging of the human eye. L1-L7: achromatic doublet lenses; M1-M4: mirrors; DM: deformable membrane; ND: neutral density filter; P': plane conjugate to the pupil plane; S': plane conjugate to the sample plane and also to the DM plane. Lenses L3, L4, and L6 are mounted on the translation stage, TS. Lens L5 is not mounted on TS. The red beam shows the spatially coherent beam (when DM is OFF) and the green beam depicts the spatially incoherent case (when DM is ON). To simplify the diagram, no scattered light from the retina is shown in the detection path, only specular reflections in the coherent as well as the incoherent cases. One can also see that the beam is no longer focused to a spot in the P' plane when DM is ON.
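The scattering angle and the pupil-plane spot sizes quoted in the setup description follow from simple geometry. The short Python sketch below reproduces them under a small-angle assumption; the function names are ours and purely illustrative.

```python
import math

def scatter_angle_deg(spot_fwhm_mm, focal_length_mm):
    """Angular spread (degrees) that a lens maps onto a focal-plane spot.
    Small-angle approximation: theta ~ spot / f."""
    return math.degrees(spot_fwhm_mm / focal_length_mm)

def relayed_spot_um(spot_um, f_first_mm, f_second_mm):
    """Spot size after a two-lens relay that de-magnifies by f_first / f_second."""
    return spot_um * f_second_mm / f_first_mm

# Values taken from the text: L1 f = 15 cm, L2 f = 12.5 cm, L3 f = 3 cm.
angle_on = scatter_angle_deg(1.23, 150.0)         # DM ON, 1.23 mm spot in P'
demag = 125.0 / 30.0                              # L2/L3 relay
pupil_on = relayed_spot_um(1230.0, 125.0, 30.0)   # DM ON spot in the pupil
pupil_off = relayed_spot_um(90.0, 125.0, 30.0)    # DM OFF spot in the pupil

print(f"scattering angle FWHM: {angle_on:.2f} deg")
print(f"de-magnification: {demag:.1f}x")
print(f"pupil spot, DM ON: {pupil_on:.0f} um; DM OFF: {pupil_off:.1f} um")
```

The results (about 0.47°, 4.2×, roughly 295 µm and 22 µm) agree with the rounded figures given in the text.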
Backscattered light from the retina and reflected light from the reference arm were recombined by the beamsplitter and imaged on a camera through a pair of relay lenses, L5 (f = 3 cm) and L6 (f = 5 cm), and a tube lens, L7 (f = 30 cm), all in a 4-f configuration. An interference image was detected by a fast camera (Fastcam SA-Z, Photron) that could record 1024 × 1024 images at 20 000 frames per second, or smaller regions at higher rates. The pixel size was 20 µm. The camera allowed 12-bit images to be collected at a rate of up to 21 GS/s. The magnification in the detection path was estimated to be around ×10 when imaging the human eye in vivo, which resulted in a sampling interval of ∼2 µm on the retina. The estimated axial resolution was 4.5 µm. The measured sensitivity of the FD-FF-OCT system with the DM switched on exceeded 90 dB for 20 averaged volumes.

Defocus compensation with the 3-lens system
An optical subsystem of the whole system shown in Fig. 1 was built to compensate for the defocus appearing due to the refractive error or chromatic aberrations inside the human eye. It is shown separately in Fig. 2. The subsystem was able to focus the retina on the camera, while at the same time keeping the reference mirror and DM in focus, by moving a set of lenses axially. In the spatially coherent illumination case, when DM is OFF, the requirement to conjugate DM with the retina is relaxed, and thus the alignment procedure is generally simpler. However, activating DM will destroy the OCT signal if DM is not properly imaged on the reference mirror and the retina. To explain this, it helps to think about DM as a device that takes a flat wavefront of the beam and creates multiple spatially coherent areas that are incoherent with each other. Each of those areas can be thought of as an independent light emitter of a certain spatial size that, in turn, can be collimated into a separate coherent beam, as shown in Fig. 2, where one such beam is depicted emanating from the center of DM. Each such beam can, in turn, be focused to a spot on the retina if the eye does not have any refractive errors, as shown in Fig. 2(a). Each beam will have a certain DOF when focused, defined as $\lambda/\mathrm{NA}_{il}^2$, where $\mathrm{NA}_{il}$ is the numerical aperture of illumination. Thus, the beam can be considered to be focused on the retina when the retina is within the beam's DOF. When the retina is imaged on the camera, each of those mutually incoherent spots will also act as a virtual pinhole, rejecting crosstalk computationally, since the light coming from different coherent spots due to the crosstalk will not interfere.

Fig. 2. Subsystem of Fig. 1 showing light propagation in the 3-lens system when DM is ON in (a) normal and (b) hyperopic eye. The beam path shows light that originates from a single coherence volume in the center of DM, which is shown in Fig. 1 and is conjugate to S' here. To compensate for the refractive error ∆z in (b), the three lenses need to be moved by ∆z to the left. Light coming out of the 3-lens system is collimated in (a) and (b), and thus will be focused to a spot on the camera by the tube lens L7 in Fig. 1.
In a similar manner, the DOF could be considered as a confocal gate, since the interference will be greatly reduced between the light coming from outside the DOF in the sample arm and the light coming from inside the DOF in the reference arm. We refer to this effect as Spatial Coherence Gating (SCG) [22]. When DM is not active (OFF), its surface is almost flat and the beam impinging on the retina and on the reference mirror is close to being collimated, as represented by the red beam in Fig. 1. In such a situation, the axial position of DM and the reference mirror is not crucial, since the beams will interfere on the camera over a wide axial range. When the beams are collimated, the one reflected by the reference mirror will remain collimated and the interference on the camera will not lose intensity even with large axial translations of the reference mirror, provided that the beams have a sufficient temporal coherence length (roll-off). In contrast, to keep the sample and reference beams interfering on the camera when they are spatially incoherent, the position of the reference mirror can be varied only within a narrow axial range, defined by the DOF. To summarize, the phase randomization carried out by DM requires a more careful axial alignment of DM with the retina, the reference mirror, and the camera, and thus a more controlled way of performing the axial alignment becomes necessary. To compensate for the defocus introduced by, for example, a myopic eye, which has an elongated eyeball, a tube lens could be moved towards the camera in the Linnik interferometer. However, this would slightly change the magnification and, in the spatially incoherent case (when DM is active), it would also defocus the images of the reference mirror and DM on the camera. To bring them back into focus, the objective lens in the reference arm (L4 in Fig. 1) would have to be translated towards the reference mirror, and the lens in the illumination path (L3 in Fig. 1) towards DM.
The three lenses can be put on the same translation stage for simultaneous translation if they all have to be translated by the same distance. However, if the focal lengths of the two lenses are close to that of the eye (∼2.5 cm), they need to be translated axially only by the distance ∆z to compensate for a defocus of ∆z in the eye, whereas the tube lens needs to be translated by $M^2 \Delta z$, where M is the lateral magnification. In order to enable imaging for M > 1, we introduced two relay lenses, L5 (f = 3 cm) and L6 (f = 5 cm), in the detection path so that moving L6 by ∼∆z would compensate for a ∆z defocus in the eye. Since all three lenses then have to be moved by the same distance ∆z, they can be put on the same translation stage, as shown in Fig. 1. The 3-lens subsystem allowed the plane of focus to be moved dynamically through the retinal layers without changing the magnification. Most importantly, it kept DM and the retina conjugate, as well as the reference mirror and the camera. While other designs were possible, this one kept the reference and sample arms as short as possible to make the interferometer more stable.
To demonstrate the performance of the 3-lens system in correcting for the defocus, we acquired OCT images of a resolution target, shown in Fig. 3. A varying degree of defocus was introduced by translating the objective lens in the sample arm, which at the same time preserved the temporal alignment. The focal length of the lens was chosen to be f = 2.5 cm so that, together with the resolution target, it would be a close optical approximation of a human eye. To illustrate the principle of refocusing with the 3-lens system, we defocused the target by moving the objective lens and then refocused it with either the 3-lens system or the tube lens. For example, Fig. 3(a) shows the OCT image of the resolution target in focus and Fig. 3(b) shows it defocused by 760 µm. Fig. 3(c) then shows the target refocused with the 3-lens system and Fig. 3(d) with the tube lens. One can see that the tube lens cannot restore the initial OCT signal, which can be explained by the axial misalignment between the sample and the reference beams, resulting in a 2.4× OCT signal drop. We have plotted the OCT signal drop for other sample defocus values in Fig. 3(e). For those measurements, the lens in the sample arm was defocused and the defocus was subsequently compensated by translating either the 3-lens system (red curve) or the tube lens (blue curve with circles). The 3-lens system successfully compensated any defocus in the range from 0 to 2.1 mm introduced in the eye model, whereas compensating the defocus with the tube lens progressively decreased the OCT signal. The plot shows that the OCT signal drops to half of its maximum value when a defocus of 0.68 mm in the sample arm is compensated by moving the tube lens (by ∼6.8 cm). Thus, the FWHM value of the curve in Fig. 3(e) is 1.36 mm, which also corresponds to the DOF and the width of the SCG.

Fig. 3. OCT images of the resolution target (a) in focus, (b) defocused by 760 µm, (c) re-focused back with the 3-lens system and (d) re-focused back with the tube lens. (e) OCT signal as a function of defocus correction with the 3-lens system (red) and the tube lens (blue curve with circles). The FWHM value of the 'tube lens' curve was measured to be 1.36 mm.
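The translation distances quoted for the two refocusing strategies can be checked with a few lines of Python. This is a minimal sketch that assumes the lateral magnification M ≈ 10 stated for the detection path; the helper name is hypothetical.

```python
def tube_lens_shift_mm(defocus_mm, magnification):
    """Axial tube-lens travel needed to refocus a sample defocus: M^2 * dz."""
    return magnification ** 2 * defocus_mm

M = 10.0                 # assumed lateral magnification (approximately x10 in the text)
half_signal_dz = 0.68    # mm, defocus at which the tube-lens curve drops to one half

tube_shift = tube_lens_shift_mm(half_signal_dz, M)   # travel of the tube lens
three_lens_shift = half_signal_dz                    # the 3-lens stage moves by dz only
scg_fwhm = 2.0 * half_signal_dz                      # full width of the tube-lens curve

print(f"tube lens travel: {tube_shift / 10:.1f} cm")
print(f"3-lens stage travel: {three_lens_shift:.2f} mm")
print(f"SCG width (FWHM): {scg_fwhm:.2f} mm")
```

The hundredfold difference in travel illustrates why moving the three short-focal-length lenses together is the more practical way to refocus.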

Spatial coherence gating (SCG)
To study the SCG effect produced by the deformable membrane, we replaced the resolution target with a silver mirror in the system described in section 2.2. Interferometric images were recorded with spectrally narrowband illumination, provided by the swept source operating in a fixed-wavelength, non-sweeping mode. Newton rings were formed in the images, shown in Fig. 4, because of the focal length difference between the objective lenses in the sample and the reference arms (2.5 cm and 3 cm, respectively). Generally, Newton rings develop when two spherical wavefronts of different radii interfere on the camera. The rings appear in Fig. 4 because the objective lens L4 in the reference arm could not be put at the position where it would form a 4-f configuration with the relay lens L5, as does the objective lens in the sample arm. Arranging both arms in a perfect 4-f configuration would result in a 1 cm optical pathlength difference between the two arms because of the focal length difference between the objective lenses. This pathlength difference would prevent interference from occurring in the case of a perfect 4-f configuration for both arms. The interference rings helped to visualize the DM action when the camera was set to integrate for 150 nanoseconds, which froze in time the patterns being displayed by DM, as can be seen in Fig. 4(b-e) and in Visualization 1. Images were also acquired with an integration time roughly 100 times longer, 16 µs, which was ultimately used for retinal imaging and during which the DM could display a couple of uncorrelated patterns. These patterns average the image down to a more uniform distribution, especially with larger defocus, as illustrated in Fig. 4(g-j). This loss of fringe contrast with defocus is the essence of SCG. The DM had an active area of 3 mm in diameter in the middle of a membrane that could otherwise be treated as a flat mirror.
The DM was positioned such that the beam hit both of its parts, the active and the inactive, so that the camera could see both regions at the same time. This way we could see the effect of DM ON and DM OFF in one image. When the mirrors in the sample and reference arms were in focus, rings appeared in the regions conjugate to both the active and inactive parts of DM, as seen in Fig. 4(a). This is because the camera saw exactly the same beams regardless of the DM area that they were reflected or diffracted from. The fact that the Newton rings stay intact also demonstrates that DM does not destroy phase information coming from a sample plane that is in focus. If, instead of being put in front of the interferometer, DM were placed in one of the arms, the phase could not be recovered. However, the active part started distorting the fringes upon defocusing, as shown in Fig. 4(b), which can be explained by the differences between the beams coming from the sample and reference arms. Specifically, the active part of DM modulates the beam in 3D, not only laterally but also axially, and so, when a beam from the active part of DM interferes with its own copy delayed in time, the interference no longer forms round rings because the camera sees different cross-sections of the beam at any given time. With larger defocus the fringe distortion increases, as shown in Fig. 4(e), and averages down to an almost homogeneous distribution, as shown in the corresponding image with the longer integration time of 16 µs in Fig. 4(j). The inactive (flat) part of DM returns a beam that generally has a spherical wavefront impinging on the camera. The radius of the wavefront changes with defocus (as can be seen going from left to right in the Fig. 4 images), but the interference is not washed out.

Image acquisition and processing
For the retinal imaging, we acquired images with 512 × 512 pixels at the speed of 60 000 fps. We recorded 512 images while tuning the laser from 800 nm to 875 nm with the speed of 8700 nm/s. With those parameters, 116 volumes per second could be recorded, which corresponded to a single volume acquisition time of 8.6 ms. We normally acquired 30 volumes resulting in the total acquisition time of 258 ms, but not all volumes were used, as described below. The data were transferred to the computer for processing, which proceeded as illustrated in Fig. 5 and Fig. 6:
1. At each pixel location (x, y) of the consecutive volumes, we subtracted the DC level from the spectral fringe pattern, which was estimated by bandpass filtering the input signal. Even though the optical frequency ω of the laser changed linearly with time, the spectral fringe pattern was resampled to ensure the linearity. We performed the resampling by linearizing the phase of the Hilbert transformed fringe pattern of the calibration signal. Then, resampled fringes were padded with 1536 zeros. Finally, Fourier transformation yielded the volumetric complex (amplitude and phase) representations of the sample (top-right in Fig. 5).
2. We corrected the complex data along the xz and yz planes for chromatic dispersion mismatch and possible retinal axial motion during the laser sweep [38]. To this end, we used the method described in Ref. [48], where the complex signal was Fourier-transformed and then multiplied by the phase factor $e^{i\phi(\omega)}$ with adjustable quadratic (second-order) and cubic (third-order) terms: $\phi(\omega) = a_2(\omega - \omega_0)^2 + a_3(\omega - \omega_0)^3$. The phase-corrected data were inverse Fourier-transformed. The process was repeated with different $a_2$ and $a_3$ coefficients until the corrected data (B-scans) became sharp (second row in Fig. 5). We relied on visual assessment of image sharpness instead of the algorithm-based evaluation used in Ref. [48].
3. Each en face plane was spatially filtered to reduce the background noise. We calculated the 2D FFT of the complex data at the given depth. Then, we applied an annular mask of radius R to the resulting spatial spectrum and calculated the inverse 2D FFT (2D IFFT, as shown in Fig. 5). Again by visual inspection, we estimated the optimum value of R to be 50 pixels.
4. We repeated the above process for all the volumes in the dataset. Then, we registered the magnitudes of these volumes to correct for the sample motion between volume acquisitions (top in Fig. 6). To find the subpixel 3D translation between the volumes, we used a regular step gradient optimizer, which adjusted the transformation between any two volumes to maximize the similarity metric known as Mattes mutual information [49]. At each step, the optimizer followed the gradient of the metric towards the maximum. If a volume could not be registered because of a large translation, it was rejected from the dataset. The resulting dataset was averaged to increase the SNR.

5. As the final step, we corrected the averaged volume for the illumination artifacts appearing due to the DM being out-of-focus. This was necessary because of imperfect defocus correction with the 3-lens system described in section 2.2. To correct for the illumination artifacts, we divided each of the en face planes by the magnitudes averaged along the z-direction as illustrated in the second row of Fig. 6.
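Steps 1 and 2 of the list above can be illustrated for a single pixel with a short NumPy sketch. It is a simplification under stated assumptions, not the processing code used for the reported data: the DC level is approximated by the mean of the fringe, the fringe is assumed to be already linear in ω, ω₀ is taken as the sweep centre, and the dispersion phase is applied directly to the spectral fringe before the Fourier transform. The function name and the synthetic test signal are hypothetical.

```python
import numpy as np

def reconstruct_ascan(fringe, omega, a2=0.0, a3=0.0, n_pad=1536):
    """Turn one real spectral fringe (linear in omega) into a complex A-scan."""
    # Step 1 (simplified): remove the DC level; the mean stands in for the
    # filter-based estimate mentioned in the text.
    ac = fringe - fringe.mean()
    # Step 2: phase factor exp(i*phi) with quadratic and cubic terms about
    # omega_0, taken here as the centre of the sweep (assumption).
    w = omega - omega.mean()
    phi = a2 * w**2 + a3 * w**3
    corrected = ac * np.exp(1j * phi)
    # Zero-pad with 1536 zeros, as in the text, and Fourier transform.
    padded = np.concatenate([corrected, np.zeros(n_pad, dtype=complex)])
    return np.fft.fft(padded)

# Synthetic single reflector with an artificial quadratic dispersion error.
n = np.arange(512)
omega = np.linspace(-1.0, 1.0, 512)   # normalised optical frequency (hypothetical units)
fringe = 1.0 + np.cos(2 * np.pi * 60 * n / 512 + 8.0 * omega**2)

raw = np.abs(reconstruct_ascan(fringe, omega))
fixed = np.abs(reconstruct_ascan(fringe, omega, a2=-8.0))
half = len(raw) // 2                  # keep the positive-depth half
peak_raw, peak_fixed = raw[:half].max(), fixed[:half].max()
peak_bin = int(np.argmax(fixed[:half]))
print(peak_raw, peak_fixed, peak_bin)
```

In this toy case the correct a₂ collapses the broadened peak into a single sharp one, which mirrors the manual tuning of the coefficients until the B-scans look sharp.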

Fig. 5.
A diagram illustrating the signal processing applied to consecutive volumes. For each pixel position (x, y), the acquired spectral fringe patterns (lines along ω) are corrected for the DC level, resampled, zero-padded, and Fourier transformed (first row). The resulting data is then phase-corrected to compensate for the chromatic dispersion and possible axial motion during the laser sweep (second row). Finally, the en face planes are spatially filtered using the annulus mask to suppress the background noise (third row).
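The spatial filtering of the en face planes (third row of Fig. 5) can be sketched as follows. We read the mask as passing spatial frequencies within R pixels of the centre of the spectrum; this reading, the absence of an inner radius, and the function name are assumptions rather than details given in the text.

```python
import numpy as np

def filter_en_face(plane, radius=50):
    """Low-pass filter a complex en face plane with a circular Fourier-domain mask."""
    ny, nx = plane.shape
    # Integer spatial-frequency indices along each axis (0, 1, ..., -2, -1).
    fy = np.fft.fftfreq(ny) * ny
    fx = np.fft.fftfreq(nx) * nx
    rr = np.hypot(fy[:, None], fx[None, :])
    mask = rr <= radius
    return np.fft.ifft2(np.fft.fft2(plane) * mask)

# Toy plane: a slow pattern (5 cycles) plus fine "noise" (60 cycles) along x.
x = np.arange(128)
low = np.cos(2 * np.pi * 5 * x / 128)
high = np.cos(2 * np.pi * 60 * x / 128)
plane = np.tile(low + high, (128, 1)).astype(complex)

filtered = filter_en_face(plane, radius=50)
low_plane = np.tile(low, (128, 1))
print(np.abs(filtered.real - low_plane).max())
```

With R = 50 the 60-cycle component is removed while the 5-cycle pattern passes unchanged, which is the behaviour intended for suppressing high-frequency background.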
The last step (number 5) should be redundant in the future once a better axial alignment procedure is followed, such as using an object for a volunteer to focus on while he/she adjusts the 3-lens system for defocus correction. On the other hand, it not only normalized pixel intensities across each en face plane but also removed a Gaussian intensity profile across the image that helped to better visualize features located at the edges of the volume.
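The illumination correction of step 5 amounts to a flat-field division. A minimal NumPy sketch, assuming the averaged volume is stored as a magnitude array indexed (z, y, x), is given below; the small constant `eps` is our addition to avoid division by zero.

```python
import numpy as np

def flatten_illumination(volume, eps=1e-12):
    """Divide every en face plane by the magnitude averaged along z."""
    profile = volume.mean(axis=0)        # (y, x) illumination estimate
    return volume / (profile + eps)

# Toy volume: a depth-dependent reflectivity under a Gaussian illumination profile.
y, x = np.mgrid[-1:1:64j, -1:1:64j]
illumination = 0.1 + np.exp(-(x**2 + y**2) / 0.5)
depth_profile = np.linspace(1.0, 2.0, 16)
volume = depth_profile[:, None, None] * illumination[None, :, :]

flat = flatten_illumination(volume)
print(flat.std(axis=(1, 2)).max())       # residual non-uniformity per plane
```

After the division each plane of the toy volume is uniform, so the Gaussian intensity profile no longer hides features near the edges.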

Crosstalk rejection
To illustrate the efficiency of crosstalk removal with a simple example, we imaged lens tissue with DM OFF and ON. When DM was OFF, the image was distorted by the crosstalk noise, as shown in Fig. 7. Only strongly reflecting fibers were visible there due to the presence of speckles. Activating DM (ON) significantly suppressed the crosstalk noise, which resulted in a reduction of the speckle noise and a contrast improvement in the image. Consequently, we could see sample features that were otherwise invisible, buried behind the crosstalk noise when DM was OFF. We estimated the crosstalk noise reduction by calculating the variances $\sigma_n^2$ in two different regions (yellow and red rectangles in Fig. 7). Before estimating $\sigma_n^2$, both images were normalized to their mean values. For the yellow region, we obtained $\sigma_{n,\mathrm{off}}^2 = 6.33 \times 10^{-2}$ and $\sigma_{n,\mathrm{on}}^2 = 4.94 \times 10^{-2}$ for DM OFF and ON, respectively. For the red one, these values were $\sigma_{n,\mathrm{off}}^2 = 7.42 \times 10^{-2}$ and $\sigma_{n,\mathrm{on}}^2 = 3.34 \times 10^{-2}$. In both cases the variance was reduced, by about 22% and 55%, respectively.
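The variance comparison above can be expressed compactly in Python. The sketch assumes that each image is normalized to its global mean before the variance of a rectangular region is taken, which is our reading of the procedure; the function name is illustrative, and the numbers are the values reported in the text.

```python
import numpy as np

def normalized_variance(image, region):
    """Variance inside `region` (a tuple of slices) after dividing by the image mean."""
    norm = image / image.mean()
    return float(norm[region].var())

# Reported variances (DM OFF, DM ON) for the two regions in Fig. 7.
reported = {"yellow": (6.33e-2, 4.94e-2), "red": (7.42e-2, 3.34e-2)}
reduction = {name: 1.0 - on / off for name, (off, on) in reported.items()}

for name, value in reduction.items():
    print(f"{name} region: variance reduced by {100 * value:.0f}%")

# Tiny usage example of the helper on a 2 x 2 image.
toy = np.array([[1.0, 3.0], [1.0, 3.0]])
toy_var = normalized_variance(toy, (slice(None), slice(None)))
```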

Retina imaging procedure
The volunteer's eye was imaged with the system shown in Fig. 1, using only a standard chin rest to stabilize the volunteer's head. The chin rest was mounted on an axial translation stage that allowed the axial position of the head to be adjusted. For imaging, the position was adjusted until the OCT signal of the retina was detected in the preview mode. The signal was then optimized by adjusting the 3-lens system described in section 2.2. The power delivered to the eye was 4.8 mW, which was lower than that allowed by safety standards for spatially extended illumination. Images were acquired with the DM ON and then with the DM OFF, while trying to keep the same imaging site on the retina between the measurements for comparison purposes. The imaging was conducted in accordance with the tenets of the Declaration of Helsinki. Written informed consent was obtained from all subjects prior to OCT imaging and after explanation of all possible consequences of the examination. The study protocol was approved by the ethics committee of the Collegium Medicum of Nicolaus Copernicus University, Bydgoszcz, Poland.

In vivo human retinal imaging
An eye of a healthy 44-year-old volunteer was imaged with the system shown in Fig. 1 and with the procedure explained in section 2.6. Tropicamide was applied to the left eye to dilate the pupil to 6 mm. Figure 8 shows B-scans derived from 24 averaged volumes with (bottom-right) and without (bottom-middle) crosstalk removal. The layered retinal structure above the retinal pigment epithelium (RPE) was imaged with good quality in both modes, with DM ON and OFF; however, the RPE layer appeared slightly better when the crosstalk was removed. The retinal image acquired with the conventional FD-FF-OCT, shown top-left in Fig. 8, shows a strong featureless signal below the RPE layer. This signal arises from multiply scattered photons that experienced an increased propagation length in the retina and were therefore assigned to false axial locations, at depths below the RPE. Removing the crosstalk by activating the DM minimized this effect and revealed the choroidal morphology, which was otherwise hidden in the conventional FD-FF-OCT image. Figure 8 also shows a scanning Fourier-domain OCT (FD-OCT) image (top-right) of the same size and corresponding to the same retinal site as the FD-FF-OCT images. It was extracted from a larger B-scan image, shown top-right in Fig. 8, which, in turn, was acquired with the confocal scanning FD-OCT system described in Ref. [50].
The extracted (zoomed-in) FD-OCT B-scan image was captured within 10 ms, which is comparable to the acquisition time of a whole FD-FF-OCT volume, 8.6 ms. The FD-FF-OCT B-scan images were produced by averaging 24 such volumes, which took longer overall than recording the FD-OCT B-scan; nevertheless, FD-FF-OCT was faster in terms of volumetric speed. It would have been challenging to make a fair comparison between averaged FD-OCT and FD-FF-OCT B-scans without first applying the subpixel registration process to the volumetric data. The energy density delivered to the retina was comparable for both methods. In FD-FF-OCT, the eye was illuminated with 4.8 mW over a retinal area of 1 mm² for a total exposure time of 205 ms (for 24 averaged volumes), resulting in an energy density of around 1000 J/m². The scanning FD-OCT system was operating at a 25 kHz A-scan rate, and the beam, with an optical power of 750 µW, was focused to a spot of 10 × 10 µm² on the retina, resulting in an energy density of 300 J/m². The fluorescein angiography image, shown bottom-left in Fig. 8, indicates the region on the retina from which the images were taken. Figure 9 shows en face FD-FF-OCT projections of different retinal layers that were derived from averaged 3D volumes. In contrast to in vivo retinal TD-FF-OCT imaging [11], our FD-FF-OCT system offers high-speed acquisition of the entire volume, making it possible to process the volume numerically and display any chosen plane from it (Visualization 2). Here, for example, the retinal layers were curved to varying degrees, but we were able to flatten them individually. The en face images are grouped into inner and outer retinal images in Fig. 9. The advantage of the crosstalk-free FD-FF-OCT operation becomes evident in the outer retina, which shows more detail, such as in the RPE layer and especially in the choroid.
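The two energy densities compared above follow from power × exposure time / illuminated area. The sketch below reproduces them; for the scanning system it assumes that each 10 × 10 µm² spot is exposed for exactly one A-scan period with no overlap between neighbouring spots, which is our simplification.

```python
def energy_density_j_per_m2(power_w, exposure_s, area_m2):
    """Radiant exposure: delivered energy divided by the illuminated area."""
    return power_w * exposure_s / area_m2

# Full-field: 4.8 mW over 1 mm^2 for 205 ms (24 averaged volumes).
full_field = energy_density_j_per_m2(4.8e-3, 0.205, 1e-6)

# Scanning: 750 uW on a 10 um x 10 um spot for one A-scan period at 25 kHz.
scanning = energy_density_j_per_m2(750e-6, 1.0 / 25e3, (10e-6) ** 2)

print(f"FD-FF-OCT: {full_field:.0f} J/m^2")
print(f"scanning FD-OCT: {scanning:.0f} J/m^2")
```

The values (about 984 J/m² and 300 J/m²) match the rounded figures in the text.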
Nevertheless, the inner retina images, like those of the plexuses, also show higher contrast of, for instance, the capillary vessels in the crosstalk-free mode (when DM is ON). Even though the crosstalk-free image of the IS/OS layer shows a significant contrast improvement, we still cannot discern individual receptors, which could be explained by optical aberrations blurring the image. The registration algorithm, explained in section 2.4, worked better on the crosstalk-free volumes since they had more distinct spatial details. Consequently, the motion blur was eliminated more effectively. Figure 10 shows snapshots at three different angles of the averaged 3D volume, created by rotating it with the FluoRender software (Visualization 3). It shows clear choroid morphology and differentiation between retinal layers. Figure 11 summarizes the comparative analysis of speckle size and contrast in images acquired with scanning confocal FD-OCT and FD-FF-OCT. It seems that the size of speckles is significantly larger in the FD-OCT image [Fig. 11(e)] than in the FD-FF-OCT image [Fig. 11(f)].
There also seem to be differences in speckle size between FD-FF-OCT images acquired with and without crosstalk removal (DM ON and OFF, respectively). For a more quantitative evaluation of the speckle size, we calculated the autocovariance function of the B-scan images, I(x, z), which corresponds to the normalized autocorrelation function with a zero base (⟨…⟩ denotes averaging). We assumed that the mean speckle size can be calculated from the FWHM values of the autocovariance curves: the horizontal I(x, 0) and vertical I(0, z) profiles correspond to the average transversal and axial speckle sizes [51]. The analysis of the speckle size reveals that the visual impression is slightly misleading, since the transverse speckles are only three times larger in scanning FD-OCT than in FD-FF-OCT, as can be seen by comparing the curves shown in Fig. 11(i) and (j). The difference between the transversal speckle size of 9.5 µm for FD-OCT and 3.2 µm for FD-FF-OCT can be explained by the difference in pupil size, which was 2 mm and 6 mm, respectively. The axial speckle size was found to be 1.4 times smaller for scanning FD-OCT than for FD-FF-OCT, which can be explained by the difference in the spectral bandwidth used (110 nm and 78 nm, respectively), as well as by the confocal gating effect in scanning FD-OCT. We found that the speckle size does not differ appreciably between images acquired with DM turned ON and OFF, because the filling of the pupil does not change significantly when DM is switched ON. The perceived smaller size of speckles in the retinal Nerve Fiber Layer (NFL) is due to a significant background coming from the crosstalk contribution. Another important parameter is the reduction of the speckle contrast obtained by incoherent averaging of acquired volumes. In our previous work, we used angular compounding as implemented with a pair of galvo-scanners [22]. Here, we assumed that the natural eye motion would lead to a similar effect (Visualization 4).
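The speckle-size estimate described above (FWHM of the central autocovariance profiles) can be sketched in NumPy as follows. The sketch uses a circular autocovariance computed through the Wiener-Khinchin theorem and a linearly interpolated FWHM; these implementation choices, the [z, x] index order, and the function names are assumptions and are not taken from the original analysis.

```python
import numpy as np

def normalized_autocovariance(image):
    """Circular autocovariance of a 2D image, normalised to 1 at zero lag."""
    d = image - image.mean()                       # subtract the mean ("zero base")
    power = np.abs(np.fft.fft2(d)) ** 2            # Wiener-Khinchin theorem
    acov = np.fft.fftshift(np.fft.ifft2(power).real)
    return acov / acov.max()

def fwhm(profile, step=1.0):
    """FWHM of the main peak of a 1D profile, using linear interpolation."""
    p = np.asarray(profile, dtype=float)
    c = int(np.argmax(p))
    half = 0.5 * p[c]
    r = c
    while r + 1 < len(p) and p[r + 1] >= half:
        r += 1
    l = c
    while l - 1 >= 0 and p[l - 1] >= half:
        l -= 1
    right = r + (p[r] - half) / (p[r] - p[r + 1]) if r + 1 < len(p) else float(r)
    left = l - (p[l] - half) / (p[l] - p[l - 1]) if l - 1 >= 0 else float(l)
    return (right - left) * step

def speckle_size(bscan, dx=1.0, dz=1.0):
    """Transverse and axial speckle size of a B-scan indexed [z, x]."""
    acov = normalized_autocovariance(bscan)
    cz, cx = acov.shape[0] // 2, acov.shape[1] // 2
    return fwhm(acov[cz, :], dx), fwhm(acov[:, cx], dz)

# Sanity check of the FWHM helper on a Gaussian with sigma = 4 samples.
x = np.arange(-20, 21)
gauss_fwhm = fwhm(np.exp(-x**2 / (2 * 4.0**2)))
print(gauss_fwhm)   # close to 2.355 * 4
```

To express the result in micrometres, `dx` and `dz` would be set to the lateral and axial sampling intervals of the B-scan (for example, about 2 µm laterally for the system described here).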
Figure 11(l) shows the speckle contrast as a function of the number of averaged volumes. The speckle contrast curve (green crosses) was normalized to the value of the speckle contrast of a single volume [blue dots in Fig. 11(l)]. The contrast reached after averaging 21 volumes (Visualization 5) was 0.63, which corresponds to the averaging of only about three fully decorrelated speckle patterns, as can be seen from the theoretical curve. Even though the speckle contrast was reduced only to 0.63 of its single-volume value, the image quality improved after averaging 24 volumes owing to the improved sensitivity and dynamic range, and to the small size of the transverse and axial speckles compared to the morphological details of the retinal tissue. In order to further improve the quality of the cross-sectional images, we introduced spatial compounding by averaging six consecutive B-scans, as shown in Fig. 12. The resulting reduction of the spatial resolution to 15 µm matches the transverse resolution of commercially available scanning FD-OCT devices. The corresponding fly-through movie is presented in Visualization 6.
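The link between a normalized contrast of 0.63 and "about three" decorrelated patterns can be checked with the standard model for fully developed speckle, in which averaging N statistically independent intensity patterns lowers the contrast as 1/√N. The sketch below assumes this model is what the theoretical curve represents; the function names and simulation parameters are illustrative only.

```python
import numpy as np

def speckle_contrast(intensity):
    """Speckle contrast: standard deviation of the intensity divided by its mean."""
    i = np.asarray(intensity, dtype=float)
    return i.std() / i.mean()

def effective_patterns(normalized_contrast):
    """Number of fully decorrelated patterns that would give this contrast (1/C^2)."""
    return 1.0 / normalized_contrast ** 2

def simulate_averaging(n_patterns, shape=(256, 256), seed=0):
    """Contrast after averaging n independent patterns, normalized to one pattern.

    Each pattern is the intensity of a circular complex Gaussian field, i.e.
    fully developed speckle with a contrast close to 1.
    """
    rng = np.random.default_rng(seed)
    size = (n_patterns,) + tuple(shape)
    field = rng.normal(size=size) + 1j * rng.normal(size=size)
    intensity = np.abs(field) ** 2
    return speckle_contrast(intensity.mean(axis=0)) / speckle_contrast(intensity[0])
```

Under this model `effective_patterns(0.63)` evaluates to roughly 2.5, and `simulate_averaging(3)` comes out close to 1/√3 ≈ 0.58. This is consistent with the statement that the measured contrast corresponds to only about three independent patterns, even though many more volumes were averaged.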

Discussions
Fourier-domain FF-OCT can currently achieve A-scan rates of 40 MHz and a voxel rate of around 10 GHz [16,39,40], making it the fastest OCT technique. The speed can be traded off for improved sensitivity through volume averaging. FD-FF-OCT inherently produces data volumes that can be used, for example, to generate en face views of otherwise curved retinal layers, which is more difficult to accomplish with conventional TD-FF-OCT since it usually acquires a single en face image at a time. The advantage of FD-FF-OCT lies not only in its speed and sensitivity but also in the ability to correct chromatic dispersion numerically, which makes physical dispersion compensation unnecessary. However, despite its speed, FD-FF-OCT is inferior to conventional scanning confocal OCT in terms of the achieved imaging depth. The difference mainly stems from the full-field detection in FD-FF-OCT, that is, the use of a camera that cannot reject out-of-focus light, a task that in confocal OCT is carried out by a pinhole. The absence of the confocal pinhole results in background light taking up a significant part of the detection bandwidth of the camera. In addition, if a spatially coherent source, such as a laser, is used, crosstalk arises. A laser can also cause coherent autocorrelation noise, but this can be eliminated with the off-axis configuration [40] at the expense of spatial resolution or field of view. There have been attempts to deal with the out-of-focus contribution, such as using photorefractive materials [52]; however, there have been no practical in vivo imaging demonstrations. Specular reflections that cause a significant background could be suppressed with dark-field detection [53], which, if implemented in a high-throughput design [54], can also efficiently increase the SNR.
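The numerical dispersion correction mentioned above is commonly done by multiplying the complex spectral fringe by a conjugate polynomial phase before the Fourier transform along the wavenumber axis. The sketch below illustrates the idea for a single simulated A-scan. It is not the authors' pipeline: the coefficients `a2` and `a3`, the normalized wavenumber axis, and the simulated reflector are assumptions made for the example. In practice the coefficients are usually found by optimizing an image-sharpness metric.

```python
import numpy as np

def analytic_signal(fringe):
    """Complex analytic signal of a real spectral fringe (FFT-based Hilbert transform)."""
    n = fringe.size
    spectrum = np.fft.fft(fringe)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)       # negative frequencies suppressed

def compensate_dispersion(fringe, k, a2, a3=0.0):
    """Remove an assumed polynomial dispersion phase from a real fringe sampled in k."""
    kc = (k - k.mean()) / (0.5 * (k.max() - k.min()))   # wavenumber scaled to [-1, 1]
    phase = a2 * kc ** 2 + a3 * kc ** 3
    return analytic_signal(fringe) * np.exp(-1j * phase)

def a_scan(complex_fringe):
    """Depth profile magnitude obtained by a Fourier transform along k."""
    return np.abs(np.fft.fft(complex_fringe))

def simulate_fringe(n=1024, depth_bin=100, a2=40.0):
    """Fringe of a single reflector at depth_bin, chirped by a quadratic phase a2."""
    m = np.arange(n)
    k = np.linspace(-1.0, 1.0, n)
    return k, np.cos(2 * np.pi * depth_bin * m / n + a2 * k ** 2)
```

For the simulated reflector, `a_scan(compensate_dispersion(fringe, k, a2=40.0))` collapses the broadened peak back to a single depth bin, whereas `a_scan(analytic_signal(fringe))` leaves it smeared over tens of bins.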
We have previously demonstrated that the spatial coherence of the laser can be destroyed on a µs scale, so that crosstalk-free images of skin could be acquired in vivo in less than a second [22]. Here we have extended this approach to retinal imaging, which required a more careful alignment system for eye imaging. Our method allowed us to see the choroidal structure, which was otherwise buried in crosstalk noise. It improved the contrast in most of the images of the retinal layers, especially in the outer retina. Our images revealed more information in the choroid compared to retinal images acquired with similar systems that employ spatially coherent swept sources [16,37,38,40]. The effect of the crosstalk removal was also clearly demonstrated here in Fig. 8, where switching the DM OFF resulted in a hazy image of the choroid layer due to the appearance of crosstalk that largely concealed the morphological information. Figure 9 also shows a clear improvement in the en face image contrast of the choroid when crosstalk is removed. Our method is also much faster than the TD-FF-OCT approach to retinal imaging [11], which is inherently crosstalk-free owing to the use of a spatially incoherent light source. It is nearly impossible to acquire 3D volumes of the retina with TD-FF-OCT because of the speed limitations: the axial image would be blurred out in in vivo imaging because of sample movement and the mechanical stepping required between the en face images in order to build the axial image. Therefore, only en face images have been recorded so far, and no B-scans (axial images) have been reported, because of the speed limitations of TD-FF-OCT [11]. In addition, no choroid images have been reported with a TD-FF-OCT system. Although TD-FF-OCT is an inherently less expensive technique, for retinal imaging in Ref. [11] it relied on an SD-OCT add-on for real-time optical path length matching.
Further improvements to the FD-FF-OCT approach are possible through system development and computational data processing. Foremost, computational aberration correction [16] will be implemented in the future, which, we expect, will improve our images beyond what has already been demonstrated in the literature, thanks to the crosstalk-free nature of the unprocessed (aberrated) images. We have recently generalized this approach as spatiotemporal optical coherence (STOC) manipulation [55], which describes a more controlled way to remove crosstalk. It employs predetermined phase patterns that are imprinted on the beam by means of a spatial light modulator. This approach should allow complete crosstalk elimination without compromising the depth of field, which is essential in Fourier-domain OCT.

Summary
We have shown that crosstalk-free in vivo retinal imaging is possible in an FD-FF-OCT system by means of spatial coherence destruction with a fast deformable membrane that projects random phase patterns at a megahertz rate. Crosstalk-free images of the retina obtained with FD-FF-OCT revealed the choroidal structure and demonstrated contrast improvement in most of the retinal layers. We expect further image enhancement with computational aberration correction and improvements to the optical system.