Extraction of ultra-high frequency retinal motions with a line scanning quasi-confocal ophthalmoscope

Ultra-high frequency motions of the retina severely affect the stabilization of images and result in significant imaging distortions. In this paper, a high speed line scanning quasi-confocal ophthalmoscope (LSO) capable of 160 frames per second was devised for generating stable, undistorted retinal images. The high frame rate resulted in minimal intra-frame motions, and a strip-based cross correlation algorithm with sub-pixel resolution was applied to extract retinal motions. Three retinal motion components at rates of up to 1600 Hz were clearly distinguished and extracted accurately for the first time using ophthalmic imaging methods. This was especially apparent for the fastest tremor and microsaccade movements, which cannot be estimated with previously reported ophthalmic imaging instruments. Furthermore, these results were consistent with retinal motion characteristics obtained with optical lever methods, validating this technique. The LSO system therefore has great potential for extracting retinal motions, and other tracking systems may be combined with it to correct retinal motions in ophthalmic imaging modalities.


Introduction
The human retina carries out involuntary, extremely small and rapid movements, even when the eye fixates on a stationary object [1,2]. These movements typically produce gaze instability with an amplitude of several arc seconds to several arc minutes and a frequency of 10-100 Hz [3]. Moreover, these small movements introduce significant intra-frame and inter-frame distortions, and can warp images during eye fundus imaging examinations with high resolution systems such as optical coherence tomography (OCT) [4][5][6][7] and confocal scanning laser ophthalmoscope (CSLO) [8][9][10]. Therefore, image artifacts introduced by these movements severely impact accurate visualization of the retina, and hinder mathematical measurements of retinal microstructures. For example, retinal motions lower the quality of images, making it difficult to quantify measurements obtained by high resolution retinal imaging systems such as retinal thickness mapping (retinal topography), retinal cell density mapping and so on [11].
A number of methods have been proposed to minimize the impact of retinal movements. Riggs first developed a tight-fitting contact lens placed directly on the sclera to measure retinal movements optically [12]. Judge, on the other hand, used magnetic scleral search coils to record torsional motions [13]. These systems extracted and characterized different eye motions including high frequency tremors, rapid microsaccades, and slow drifts. However, these tracking methods were designed to measure global eye movements (by attachment of an apparatus to the sclera), which do not directly correlate with local retinal movements. For example, microsaccade movements produce shifts in the optical elements of the eye (called 'lens wobble'). These shifts cause retinal image deflections; however, the optical tracking systems measure only global eye movements, and their measurements therefore include shifts of the eye lens and retinal distortions. For this reason, changes in retinal movements cannot be accurately extracted from measurements of global movements [14]. An optimal system for measuring retinal movements must therefore be based on ophthalmic imaging, so that any changes to the optics of the eye during retinal movements do not affect the measurement. Consequently, the analysis of the retinal imaging data will include only the local retinal movements that occur during the imaging session [15]. To address these needs, O'Connor applied a frame-to-frame cross-correlation to determine inter-frame retinal movements in CSLO images [16]. Roorda proposed a patch-based cross-correlation method to estimate retinal movements from CSLO data, which required only a sequence of frames of scan data and estimated intra-frame motions [17]. Recently, Paterson presented a multi-scale B-spline representation of a deformation field to map sequences of CSLO images in order to detect retinal movements [18].
However, the scan rate restricts the frame rate of these ophthalmic imaging methods to a maximum of approximately 20-40 Hz, which is comparable to the time scale of retinal movements [19]. Since scanning consumes tens of milliseconds per frame, retinal movements that occur within several lines of the image are generally neglected, and a patch of several lines is used to calculate retinal movements in those methods [20]. Unfortunately, this assumption is not always valid for a strip: because the frame rate is not high enough to overcome intra-frame retinal motions, the estimation can fail. This is especially apparent in the case of ultra-high frequency tremors or rapid microsaccades. Additionally, inter-frame motions cannot be estimated from images obtained at a 30 Hz frame rate. Finally, these methods cannot distinguish between the three principal eye motions.
In this study, we present a high speed line scanning quasi-confocal ophthalmoscope (LSO) to image the retina instead of a CSLO. The LSO uses a line focus beam to illuminate the retina; while a linear array sensor (LAS) is used to image it [21]. Hence, the retinal scanning plane is performed in only one direction, improving the frame rate significantly.
Once retinal images are available at a high frame rate, one can detect retinal motions from the motion-induced image distortions. When the frame rate of our system is increased, intra-frame distortions are eliminated, and inter-frame distortions are also greatly reduced. Accordingly, we apply a strip-based cross correlation algorithm to detect ultra-high frequency motions. The cross correlation approach is a computationally efficient alternative to other motion estimation methods, allowing motion to be estimated in real time at a pace that matches high frame rates. In addition, it is not sensitive to image details, so image features such as blood vessels can be chosen to extract retinal motions.

Experimental configuration
The unfolded optical layout for the LSO is shown in figure 1(a). The illumination light was collimated through a convex lens (L1) and spread in one dimension with a cylindrical lens (CL). The line beam was focused onto the retina with a scanning lens (L2, L3), and scanned with a galvanometer-driven mirror (G). The CL and the galvanometer in figure 1(a) were used as a line beam scanner, which ultimately converted the point light source into a focused line beam. The detailed ray-traces of the line scanner are shown in figures 1(b) and (c). The back-scattered light from the retina passed through the same lens, was descanned by the galvanometer, and was then directed through the beam splitter (BS) and focused onto the LAS with a detector objective lens (L4, L5). A slit aperture (SA) conjugated to the retinal plane was placed close to the LAS to reject the majority of the back-scattered light from the adjacent voxels along the scanned line. Details of the LSO are described in [21].
A laser diode (LD) with a central wavelength of 780 nm and an internal objective collimator (50 mm focal length, 6 mm input beam diameter) was used as the light source. A galvanometer-driven scanning mirror (G, 6230H, Cambridge Corporation) was placed near the pupil conjugate and was used to scan the retina. A LAS (linear array CCD, AVIVA M2 CL 1014, ATMEL) was employed to record the intensity of the back-scattered light for each position of the line beam. Note that one pixel corresponds to 4.78 μm of planar distance across the retina. The whole length of the LAS is 1024 elements, but the curve shape of the retina produces some dark field on the margin, and the middle pixels of 900 elements are useful to collect light. The image acquisition speed was set according to the adjustable line acquisition rate of the line array CCD and the number of scan lines. The closed-loop galvanometer electronic board was driven by a function generator board (NI PCI-6221) with a saw-tooth signal to produce retinal images; meanwhile a gate-signal synchronous with the saw-tooth signal produced from the function generator was sent to a frame grabber (NI PCI-5122) to collect the line-CCD readout signals to form retinal images on a computer.
The field of view (FOV) for the LSO system was 9° on the retina, which allowed images of the macula, the optic nerve head (ONH), and other targets to be obtained faster and more efficiently, producing less distortion over a smaller (<15°) field. As the line acquisition rate of the line-CCD was 53 kHz, the image acquisition speed can reach up to 160 frames per second (fps) for 900 × 330 pixels, or 2700 × 960 μm on the retinal scale. Additionally, the retinal conjugate was magnified by 5.4 on the linear array (also the confocal SA). According to the optical design, the estimated optical resolution was ∼8 μm, which was limited by aberrations of the human eye. However, nearly diffraction-limited performance was still achieved across the 9° FOV with a 3 mm pupil size.
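The relation between the line rate and the frame rate quoted above can be checked with a short calculation. This is a minimal sketch that assumes every acquired line contributes to the image, that is, it ignores any galvanometer fly-back time (an assumption, not stated in the text); the function name is illustrative.

```python
LINE_RATE_HZ = 53_000    # line acquisition rate of the line CCD (from the text)
LINES_PER_FRAME = 330    # scan lines per frame (from the text)

def frame_rate(line_rate_hz: float, lines_per_frame: int) -> float:
    """Upper bound on the frame rate, ignoring fly-back time (assumption)."""
    return line_rate_hz / lines_per_frame

fps = frame_rate(LINE_RATE_HZ, LINES_PER_FRAME)
print(f"{fps:.1f} frames per second")  # prints "160.6 frames per second"
```

Under this assumption the bound is slightly above the 160 fps reported, which would leave a small margin for scanner turnaround.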

Preliminary human subject test plan
The LSO system was tested on human volunteers aged 22-34 with healthy eyes. In order to minimize the characteristics of retinal movements discussed above, the chosen subjects were typically highly practiced observers with extensive knowledge on controlling eye movement behavior. Therefore, it can be assumed that they exercised great efforts to maintain a steady gaze. In addition, a chin rest and forehead rest were used to stabilize head movements, and the subjects were asked to fixate on a bright green target with the fellow eye. Figure 2 shows a photograph of the system during retinal imaging.
Prior to examination, informed consent was obtained from all of the subjects. A typical session included video sequences >5 s in duration for measurement of retinal motions for one condition. The consent also covered the power of the light beam. The incident power at the cornea for the extended imaging beam was 500 μW. This exposure level is considered safe according to ANSI standards (ANSI Z136.  for several hours of continuous intra-beam viewing. For our experiments, the line illumination would allow for even higher safe exposure levels, since the power was roughly 50 times below the ANSI maximum permissible exposure levels for the human eye. We noted in the consent that if the galvanometer scanner should fail, the subject could be exposed to the stationary laser line formed by the fixed CL. However, the LSO system can never focus to a point, only to an extended line beam; for this reason the LSO is inherently much safer than spot scanning instruments.

Motion retrieval algorithms
The frame rate was up to 160 fps, exceeding the frequency of retinal motions; therefore, there were no intra-frame distortions. However, according to the Nyquist criterion, the imaging speed was still not fast enough to extract the highest frequency motions, such as tremors. In order to extract ultra-high frequency retinal motions from the images scanned by the LSO, each frame of an LSO video was broken up into a set number of strips which were parallel to the line scanner. For each video, one frame was used as a fixed reference frame (usually taken to be the first frame unless otherwise noted), and a template of image features within each strip was then cross correlated to the reference frame. In order to increase the cross-correlation calculation rate and improve the algorithm precision, the template window was smaller than the size of the strip. The template window within each strip moved in the successive images to find the region that was most similar to the selected template. The similarity between images was determined by the cross-correlation algorithm, which is defined in its normalized form as:

$$\mathrm{corr}(I, I')_{k,l} = \frac{\sum_{m,n}\left[I'(m,n)-\bar{I}'\right]\left[I(m+k,n+l)-\bar{I}\right]}{\sqrt{\sum_{m,n}\left[I'(m,n)-\bar{I}'\right]^{2}\,\sum_{m,n}\left[I(m+k,n+l)-\bar{I}\right]^{2}}} \qquad (1)$$

where $I$ and $I'$ are the two serial images, and $\bar{I}'$ and $\bar{I}$ are the means of the template and the moving window, respectively. $k$ and $l$ denote the position of the moving window. By finding the indices $k$ and $l$ which maximize the cross-correlation $\mathrm{corr}(I, I')_{k,l}$, the $(k, l)$ displacements of the new frame with respect to the reference frame were determined to be a measure of the relative motions of the retina within the specific strip area. Using a conic interpolation of the correlation values around this maximum, the retinal motions in the x and y directions can be obtained with sub-pixel precision.
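The template matching and sub-pixel refinement described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the similarity measure is taken to be the zero-mean normalized cross-correlation, the sub-pixel step is approximated by independent three-point parabolic fits along each axis (one simple form of conic interpolation), and all function names and the search radius are hypothetical.

```python
import numpy as np

def ncc(template, window):
    """Zero-mean normalized cross-correlation of two equally sized patches."""
    t = template - template.mean()
    w = window - window.mean()
    denom = np.sqrt((t * t).sum() * (w * w).sum())
    return float((t * w).sum() / denom) if denom > 0 else 0.0

def correlation_surface(reference, template, top, left, search):
    """NCC of `template` against windows of `reference` offset by (k, l),
    with k, l in [-search, search], around the nominal position (top, left)."""
    h, w = template.shape
    size = 2 * search + 1
    surface = np.full((size, size), -1.0)  # -1 marks windows that leave the image
    for i in range(size):
        for j in range(size):
            r, c = top + i - search, left + j - search
            if 0 <= r and 0 <= c and r + h <= reference.shape[0] and c + w <= reference.shape[1]:
                surface[i, j] = ncc(template, reference[r:r + h, c:c + w])
    return surface

def parabolic_offset(c_minus, c_zero, c_plus):
    """Vertex position of the parabola through samples at -1, 0 and +1."""
    denom = c_minus - 2.0 * c_zero + c_plus
    return 0.0 if denom == 0 else 0.5 * (c_minus - c_plus) / denom

def estimate_shift(reference, template, top, left, search=5):
    """Return the integer peak (k, l) and a sub-pixel refined (k, l)."""
    surface = correlation_surface(reference, template, top, left, search)
    i, j = (int(v) for v in np.unravel_index(np.argmax(surface), surface.shape))
    dk = dl = 0.0
    if 0 < i < surface.shape[0] - 1:  # refine only when both neighbours exist
        dk = parabolic_offset(surface[i - 1, j], surface[i, j], surface[i + 1, j])
    if 0 < j < surface.shape[1] - 1:
        dl = parabolic_offset(surface[i, j - 1], surface[i, j], surface[i, j + 1])
    peak = (i - search, j - search)
    return peak, (peak[0] + dk, peak[1] + dl)

# Synthetic check: a random "reference" and a copy shifted by a known amount.
rng = np.random.default_rng(0)
reference = rng.normal(size=(64, 64))
frame = np.roll(reference, shift=(2, -3), axis=(0, 1))
template = frame[20:40, 20:40]  # 20 x 20 pixel template window
peak, refined = estimate_shift(reference, template, top=20, left=20)
```

In this synthetic check the template content sits 2 rows higher and 3 columns further right in the reference, so the integer peak is (-2, 3). The sign convention (template position relative to the reference) is a choice made for this sketch.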

Results
The video clip in figure 3 shows a 1000 frame sequence (6.25 s) of LSO data from which retinal motions were extracted. All of the video sequences in this paper were raw images that were not treated by digital image processing techniques and were processed at their original acquisition speed. By default, the first frame of the movie was used as the reference, but another frame was chosen when the first frame contained no fixed reference feature, such as a blood vessel bifurcation. With this protocol, the subject was tested non-mydriatically (3 mm natural pupil), and the lateral resolution was found to be slightly worse than that of the CSLO. This is likely related to the intrinsic limitation of the line-scanning method. The lateral resolution was measured by obtaining the line spread function from the edge response in figure 3, and is shown in figure 4. The full width at half maximum of the line spread function was approximately 8.7 μm, slightly worse than the estimated resolution. This difference was attributed to optical manufacturing and installation. However, the lateral resolution was sufficient to resolve the smallest tremors. Additionally, sacrificing a small amount of resolution improved the frame rate, and was therefore justified.
For the 160 fps movie in figure 3, the number of strips for each frame was set to ten, so (k, l) displacements were reported 10 times per frame, for an effective sampling rate of 1600 Hz. The cross correlations were computed within a template window of 20 × 20 pixels from ten overlapping strips per frame, each of which was 33 pixels high. Videos recorded at 160 fps, a rate exceeding the maximal frequency of retinal motions, were therefore free of intra-frame distortions. This allowed ultra-high frequency motions, like tremors, to be extracted by using the strip-based cross correlation at a rate of 1600 Hz.
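The strip layout and the resulting motion sampling rate can be illustrated with a few lines of arithmetic. This sketch assumes equal, contiguous strips and uniform scanning with no dead time between frames (simplifications; the text describes the strips as overlapping), and the function names are illustrative.

```python
FRAME_RATE_HZ = 160      # frames per second (from the text)
STRIPS_PER_FRAME = 10    # strips per frame (from the text)
LINES_PER_FRAME = 330    # scan lines per frame (from the text)

def strip_bounds(lines_per_frame, strips_per_frame):
    """(first_line, last_line_exclusive) of each strip, assuming equal contiguous strips."""
    height = lines_per_frame // strips_per_frame
    return [(s * height, (s + 1) * height) for s in range(strips_per_frame)]

def motion_sample_rate(frame_rate_hz, strips_per_frame):
    """One displacement estimate per strip gives frame_rate * strips samples per second."""
    return frame_rate_hz * strips_per_frame

def strip_times(n_frames, frame_rate_hz, strips_per_frame):
    """Nominal start time in seconds of every strip, assuming uniform scanning."""
    rate = motion_sample_rate(frame_rate_hz, strips_per_frame)
    return [i / rate for i in range(n_frames * strips_per_frame)]

bounds = strip_bounds(LINES_PER_FRAME, STRIPS_PER_FRAME)
rate_hz = motion_sample_rate(FRAME_RATE_HZ, STRIPS_PER_FRAME)
times = strip_times(1000, FRAME_RATE_HZ, STRIPS_PER_FRAME)
print(bounds[0], bounds[-1], rate_hz, len(times))  # (0, 33) (297, 330) 1600 10000
```

For the 1000 frame sequence this gives 10000 displacement samples spaced 0.625 ms apart, spanning the 6.25 s recording.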
In this video one observes that the predominant motion is a high frequency, low amplitude jitter, usually called a tremor. Over time, the tremor movements accumulate into somewhat erratic left-right and slow upward and downward drift motions. One can also observe another fast, lurching, large-amplitude retinal motion, known as a microsaccade, during the epochs between drifts.
To show the motions of the retina clearly, the cross-correlation algorithm tracked positions across the 1000 frames, which are shown in the video clip tracking_160 fps.avi. The plot is shown in figure 5. Note that the tracked position distance has been projected onto the planar distance of the retina. In this tracked position plot, all three kinds of retinal motions can be extracted clearly, and are labeled in figure 5. However, there are many overlapping tracked positions in figure 5, which makes the identification of retinal movements unclear. Therefore, the horizontal and vertical motions are extracted separately and shown in figure 6, and the vertical ordinate in figures 6(a) and (b) has been converted to the retinal scale as well.
For quite some time, a variety of techniques have been used to record and describe fixational eye movements in humans, and reports generally agree that there are three main types of movement: tremor, drift and microsaccade [22]. Tremors are the smallest of all retinal motions, with a maximal amplitude of several microns and a frequency of 30-100 Hz. Figure 5 shows the track of retinal movements in this video, with the tremor motion constituting an overwhelming majority of fixational retinal movements. By expanding the horizontal and vertical traces in figures 6(c) and (d) to show more detail, the high frequency tremors can be distinguished more clearly.
A drift is an irregular and relatively slow motion of the retina, and occurs simultaneously with tremor motions. In figure 5 high frequency tremors are superimposed on slow drifts, and in figures 6(a) and (b) drifts are represented as curved lines. One can clearly see the leftward drifts (where many tremor movements accumulate to create a curved line) in figure 6(a), and in figure 6(b) the downward drifts are easily identified at the beginning of the plot.
Involuntary microsaccades usually arise during drifts when the retinal image of the fixation target becomes too far removed from the center of the visual field. They carry the retinal image across a range of several dozen to several hundred microns, and are 8-30 ms in duration. Microsaccade movements appear as large-amplitude, fast movements between drifts. Consequently, one possible role of microsaccades is to correct displacements in the retinal position produced by drift motions. In figures 6(a) and (b) a noticeable microsaccade movement carries the retinal image upward within the frame.
Given the estimated motions from figures 5 and 6, we distinguished three kinds of retinal movements: tremor, drift and microsaccade. The characteristics of these retinal motions are summarized in table 1. The tremor is the smallest of all retinal movements, with a maximal amplitude of several microns, and is difficult to record accurately with an ophthalmic imaging system because the tremor frequency is usually higher than the recording speed [23]. Other work on fast-motion ophthalmic imaging used a cross correlation based on 32 strips at a 30 Hz frame rate to extract and correct 960 Hz intra-frame motions [24]. However, this imaging method was not fast enough to measure ultra-high frequency tremors from intra-frame distortions, and the estimated tremor movements arose from the superposition of several tremor motions. In our work, up to 1600 Hz motion estimation is achieved by obtaining images at a 160 fps frame rate and dividing the image into 10 strips. This allows high frequency motions due to tremors and microsaccades to be accurately measured. In table 1 the mean amplitude of tremors was determined to be about 5.3 μm, with a range of 3.1 μm-14.6 μm.
Microsaccades are the most noticeable movements. They twitch across a range of several dozen to several hundred microns, with a duration of several dozen milliseconds, making the region of interest jump to another corner of the frame [23]. In this video, one can see five microsaccade movements, which occurred in the middle of the recording, as shown in the tracked positions plotted in figure 6. In table 1 we list the times of each of these five microsaccade movements; the mean amplitude was approximately 47.9 μm and the duration ranged from 6 ms to 25 ms, in accordance with the data measured by optical lever methods [23,25]. Amplitudes measured in this work ranged from 30.3 μm to 82 μm, a considerably smaller range than those recorded in previous studies [26,27]. Microsaccade movements have been studied extensively using global eye tracking methods; however, there are many discrepancies and disagreements over the interpretation of results from different laboratories. For ophthalmic imaging methods, the strip-based cross correlation readily measures microsaccade motions; however, a frame rate of 30 Hz is too slow to record this kind of motion. Hence microsaccade motions were usually mixed with tremors, making it difficult to distinguish between the two movements. Here, with a frame rate of 160 Hz, the LSO system can record and distinguish between different kinds of retinal motions.
Drifts occur simultaneously with tremors and are slow motions accumulated from many tremors during the epochs between microsaccades [28]. Drifts are random motions of many tremors attempting to maintain visual fixation in the absence of microsaccades, or at times when compensation by microsaccades is relatively poor. Therefore, drifts are the most distinguishable motions, and both 20-40 fps ophthalmic imaging systems and our high speed LSO system have the ability to record them. The difference is that 20-40 fps systems cannot pick out individual drifts when there are no significant microsaccade motions. Specifically, drift motions always occur between microsaccade motions. Since 20-40 fps systems cannot distinguish microsaccade motions clearly, their criteria for delimiting drifts are unclear. Although drifts may be extracted from 20-40 fps movies, they can be hard to delimit, particularly when a microsaccade motion with a small amplitude occurs. Because a 20-40 fps system is unable to extract the small amplitude microsaccade, two drifts are mistakenly linked and considered a single drift. In figure 6 we can distinguish several drift movements with durations of 62.5 ms-1 s and a mean amplitude of approximately 57.9 μm. Every drift movement continues until a microsaccade movement appears, and the amplitudes of the drifts range from 17.6 μm to 103.8 μm.
Existing ophthalmic imaging systems with a frame rate of 20-40 Hz are able to extract intra-frame motions through a strip-based cross correlation or other methods. However, inter-frame motions are difficult to distinguish clearly owing to the low frame rate. As a consequence, those existing systems are unable to extract the three principal kinds of retinal motions. In this paper, an LSO system running at 160 fps is used to extract inter-frame motions, and a strip-based cross correlation is used to measure intra-frame motions. The three kinds of retinal motions are therefore successfully extracted for the first time through an ophthalmic imaging system. Ultra-high frequency motions are summarized in table 1, and the results show distinct differences between the three retinal motion components, with their significant characteristics presented clearly.
With a high speed LSO, we applied the strip-based cross correlation algorithm with sub-pixel resolution to extract retinal motions. Additionally, retinal motions were accurately estimated at a sampling rate of 1600 Hz. The cross-correlation coefficient presented in equation (1) defines the precision of the retinal motions detected, and is shown in figure 7. A cross-correlation coefficient equal to 1 indicates an ideal match between the images in the sequence, and the actual value in figure 7 is 0.9943 ± 6.1457 × 10⁻⁴ (mean ± 2 standard errors of the mean (SEM)). This suggests that our motion extraction scheme is very accurate. Using all of the cross-correlated retinal motions, successive frames of the video in figure 3 were registered to produce an average image, illustrated in figure 8. All one thousand frames were successfully co-added, and there is almost no image blur caused by retinal motions. In figure 3 it is almost impossible to identify small vessels and details close to the ONH, whereas in figure 8 the dynamic range is much better and small structures are clearly represented (especially in the left and right portions of the image). To examine the precision of the retinal motions extracted by the strip-based cross correlation algorithm, the averaged image was cross-correlated with the first frame of this video. The value of the cross-correlation coefficient is 0.9930. Given that all of the frames are aligned to remove distortions, the value of the cross-correlation coefficient is decreased slightly compared to a single frame cross-correlation. However, it is still close to the ideal value, and indicates that retinal motion extraction by strip-based cross-correlation is effective and accurate.
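The registration and co-adding step can be illustrated with a simplified sketch. It assumes a single integer (dy, dx) shift per frame and uses a circular shift to undo it; the paper works with per-strip, sub-pixel displacements, so this is a coarse stand-in, and the function name is hypothetical.

```python
import numpy as np

def register_and_average(frames, shifts):
    """Undo each frame's integer (dy, dx) shift and average the aligned frames.

    np.roll wraps pixels around the border, which is acceptable for this
    synthetic example but would need cropping or padding on real images.
    """
    acc = np.zeros_like(frames[0], dtype=float)
    for frame, (dy, dx) in zip(frames, shifts):
        acc += np.roll(frame, shift=(-dy, -dx), axis=(0, 1))
    return acc / len(frames)

# Synthetic check: shifted copies of one image average back to that image.
rng = np.random.default_rng(1)
truth = rng.random((32, 32))
shifts = [(0, 0), (1, -2), (-3, 1), (2, 2)]
frames = [np.roll(truth, shift=s, axis=(0, 1)) for s in shifts]
average = register_and_average(frames, shifts)
blurred = np.mean(frames, axis=0)  # naive average without registration
```

Here `average` reproduces `truth`, while `blurred` does not, which mirrors the contrast between the registered average image and unregistered frames.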

Discussion and conclusions
The human retina continually moves even when the observed object appears to be fixed in the FOV. Some of the earliest recordings of eye movements during fixation were made with the optical lever-contact lens technique. Recordings with this technique identified three principal types of eye movements: (1) tremors, occurring at frequencies of 30-100 Hz with a corresponding amplitude of several dozen arc seconds; (2) microsaccades, occurring on a time scale of dozens of milliseconds, with a mean amplitude of dozens of arc minutes; (3) drifts, occurring on a time scale of a few seconds, characteristically slow with large amplitudes [29]. However, measurements with the optical lever approach capture global eye movements, and do not perfectly correlate with local retinal movements, even though retinal motions are their major component. For example, the rotation of the eye axis produces shifts in the optical elements of the eye (cornea, crystalline lens); these shifts are measured together with the local retinal motions, so the local retinal motions cannot be accurately extracted from measurements of global eye motions.
As techniques have improved, eye trackers have been designed and applied to address rotational motions of the anterior eye [30] and residual eye movements, including local retinal motions. These retinal motions prevent the visual scene from fading and enhance vision quality, but conversely lead to severe distortion or warping of ophthalmic images. This severely degrades high-fidelity visualization of the retina, and hinders quantitative measurement of retinal microstructure for clinical research and patient care in ophthalmology [11]. For example, the backscatter from the retina is very weak, and a much longer exposure time is needed to improve the signal-to-noise ratio of static images. Since those retinal motions are much faster than the imaging speed, large amplitude motions can lead to severe distortion and warping of retinal images. This is especially apparent in fundus fluorescence and auto-fluorescence imaging, where these motions can cause trailing smears of the fluorescent substance in the retina, intensifying image blur and causing quantitative measurements to fail.
It is critical to account for small retinal movements. The optimal measurement of retinal movements must therefore be based on ophthalmic imaging, where the analysis of the retinal image captures only the local retinal movements during the imaging session. Most ophthalmic imaging methods estimate retinal motions based on correlation between successive frames. A reference frame is chosen as a template, and successive frames in the video are compared with it to detect relative motions. However, at a frame rate of 20-40 Hz the retina moves during the recording of the reference frame. As a result, only motions on the time scale of the video frame rate are estimated; the estimated motions are drifts, and ultra-high frequency retinal motions such as tremors and microsaccades are not extracted or resolved accurately. Some improved methods, such as a patch-based cross-correlation algorithm [24], have attempted to overcome the lack of imaging speed required to extract retinal movements. However, these techniques extract the intra-frame motions of the reference frame, and thus compensate for the reference frame bias. Additionally, the three types of retinal movements are not distinguished; only low frequency drift movements are estimated. Therefore, the most efficient approach to estimating ultra-high frequency retinal movements is to increase the imaging speed. The reference frame must be recorded on a time scale equivalent to that of tremor movements to ensure that no intra-frame motions occur in the reference frame. Consequently, the motions estimated relative to the reference frame are true motions, and ultra-high frequency movements like tremors and microsaccades can be resolved and extracted precisely.
In this article, retinal images of 900 × 330 pixels were obtained at a frame rate of 160 Hz by using a high speed LSO system, and were processed with the strip-based cross correlation algorithm presented in section 2.3. Consequently, (1) video was acquired at a rate close to the frequency of the fastest tremor movements, eliminating intra-frame motions and bias in the reference frame; (2) given that retinal motions have frequencies of 10-100 Hz, the various retinal motion components were estimated at 1600 Hz by obtaining images at a frame rate of 160 fps and dividing the image into 10 strips.
Compared to optical lever-contact lens techniques, the LSO system for measuring retinal motions sacrificed speed performance; nevertheless, the temporal resolution of the imaging was high enough to extract several retinal motions. An additional advantage is the improvement in optical resolution, which makes the extraction of local retinal motions more accurate. Compared with other ophthalmic imaging methods, the optical resolution of the LSO system is inferior to that of the SLO, especially for ophthalmic instruments integrated with adaptive optics (e.g. AO-CSLO, AO-OCT). However, the LSO system has sufficient optical resolution to resolve retinal motions, even tremors of small amplitudes. The imaging speed of the LSO is higher than that of other ophthalmic imaging instruments, and the LSO system has the temporal resolution required to measure and describe retinal motions and to extract ultra-high frequency motion components.
The results of this study validate this technique for measuring the tremor, drift and microsaccade components of retinal motion, and are consistent with optical lever methods. Given that no practical retinal imaging instrument in existence can measure tremor, drift and microsaccade movements, the goal of this research was to work toward precise extraction of these retinal motion components, so that (1) the local motions of the retina are detected on the basis of ophthalmic imaging, and the individual motion components can be resolved clearly; (2) given the precise motion extraction, retinal image distortions can potentially be corrected completely, providing high fidelity visualization of the retina as stabilized videos; (3) with this high speed LSO system (at a frame rate of 160 Hz) measuring retinal motions, an estimated closed-loop bandwidth of 8 Hz can be achieved using other tracking units to correct retinal motions. Considering that the fluctuation frequency of human eye aberrations is 2 Hz, adaptive optics instruments have been successfully applied to correct eye aberrations in fundus imaging devices. Consequently, combining the high speed LSO system with a tracking apparatus and integrating adaptive optics instruments into various ophthalmic imaging modalities will yield retinal images free of eye aberrations and retinal motion blur.