Closed-loop optical stabilization and digital image registration in adaptive optics scanning light ophthalmoscopy

Eye motion is a major impediment to the efficient acquisition of high resolution retinal images with the adaptive optics (AO) scanning light ophthalmoscope (AOSLO). Here we demonstrate a solution to this problem by implementing both optical stabilization and digital image registration in an AOSLO. We replaced the slow scanning mirror with a two-axis tip/tilt mirror for the dual functions of slow scanning and optical stabilization. Closed-loop optical stabilization reduced the amplitude of eye-movement-related image motion by a factor of 10–15. The residual RMS error after optical stabilization alone was on the order of the size of foveal cones: ~1.66–2.56 μm or ~0.34–0.53 arcmin with typical fixational eye motion for normal observers. The full implementation, with real-time digital image registration, corrected the residual eye motion after optical stabilization with an accuracy of ~0.20–0.25 μm or ~0.04–0.05 arcmin RMS, which to our knowledge is more accurate than any method previously reported.


Introduction
The adaptive optics scanning light ophthalmoscope (AOSLO) has become an important tool for the study of the human retina in both normal and diseased eyes [1][2][3][4]. The human eye is constantly in motion; even during careful fixation, normal, involuntary, microscopic eye movements [5,6] cause the scanned field of the AOSLO to move continuously across the retina in a pattern that corresponds exactly to the eye motion. Fixational eye motion causes unique distortions in each AOSLO frame because each image is acquired over time. In the normal eye, these movements tend to be rather small in amplitude. However, in patients with retinal disease or poor vision, fixational eye movements can be amplified [7] and introduce distortions that are a major hindrance to efficient AOSLO imaging, in some cases precluding imaging altogether. Unfortunately, these patients are potentially some of the most interesting to study using this technology. It is therefore desirable, particularly for clinical imaging, to minimize or eliminate this motion altogether.
The velocity of the fast scanning mirror is the primary limitation to achieving very high frame rates that would effectively eliminate image distortion within individual frames [8,9]. The frame rate of current research AOSLOs is limited by the speed of appropriately-sized, commercially-available fast scanning mirrors, which achieve a maximum frequency of ~16 kHz. All clinical and most experimental uses of these instruments require that the AOSLO image be rectified (i.e., 'desinusoided') to remove the sinusoidal distortion caused by the resonant scanner, registered (to facilitate averaging) and averaged (to increase SNR) to generate an image for qualitative or quantitative analysis. Image registration works by recovering the eye motion and nullifying it, and is required to generate high SNR images from AO image sequences. Several methods for recovering eye motion from scanned imaging systems have been described in various reports [8][9][10][11][12][13][14][15].
Photographic techniques were first used to accurately measure eye movements over 100 years ago [16]. Over the last century, many different methods have been developed to accurately measure eye motion, including scleral search coils and many different types of external eye imaging systems [5]; many of these methods have been coupled with stimulus delivery systems to stabilize or manipulate the motion of the retinal image. One of the earliest attempts to precisely measure the two dimensional motion of the retinal image [17] employed the 'optical-lever' method to indirectly measure it by measuring the motion of the globe; this technique measures the light reflected from a plane mirror attached to the eye with a tightly fitting contact lens [18]. The optical lever method was used to both measure eye movements and deliver stabilized stimuli to the retina; this method has achieved very precise optical stabilization, with an error of 0.2-0.38 arcmin, or less than the diameter of a foveal cone (~0.5 arcmin) [19,20]. Despite its precision, the invasive nature of this method and its limitations for stimulus delivery caused it to be largely abandoned after dual-Purkinje image (DPI) eye trackers were developed [21][22][23]. As the name suggests, DPI eye trackers use the Purkinje images (i.e., images of a light source reflected from the cornea and lens) to non-invasively measure eye position. This is another example of indirect measurement of retinal image motion, as it uses a surrogate (in this case the motion of the Purkinje images) for the motion of the retina. Modern DPI eye trackers can measure eye motion and manipulate visual stimuli with a precision of ~1 arcmin [23]. Despite their precision, each of these methods can only indirectly infer the motion of the retinal image; here we report on recent advances to measure and stabilize eye motion by directly tracking the motion of the retina itself.
Ott and colleagues showed that a scanning laser ophthalmoscope (SLO) could be used to calculate eye motion by measuring the motion of the retinal image [10][11][12]24]. The first full implementation of offline recovery of eye movements from an AOSLO for the purpose of image registration was described by Stevenson and Roorda [9]. This report also compared the AOSLO as an eye tracker to measurements simultaneously obtained from a DPI. The method was expanded and improved by Vogel and colleagues [8] and evolved into online digital stabilization for the purpose of stabilized stimulus delivery [25], and was used to guide the placement of a stimulus onto targeted retinal locations, ultimately achieving the capability to stimulate single cones in vivo in the normal human eye [26][27][28]. To improve the flexibility of stabilized stimulus delivery, the electronics for imaging and light source modulation were migrated to field-programmable gate array (FPGA) technology, enabling high-speed real-time eye tracking [29]. FPGA technology facilitated the implementation of open-loop SLO optical tracking [30] and achieved successful application to OCT [31,32].
During the same period that these advancements were occurring in the Roorda laboratory, the Burns laboratory and Physical Sciences Inc. developed a closed-loop optical eye tracking system for an AOSLO with an integrated wide field of view (FOV) line scan imaging system [14,33,34]. This device used a tracking beam reflectometer to measure displacements of the optic disc in real-time and provide stabilization signals to tip and tilt mirrors. However, this system required significant tuning of parameters and settings for each eye to achieve stable tracking and robust re-locking, and it needed an additional imaging subsystem [33]. This device achieved 10-15 μm tracking accuracy for the wide FOV scanning system [33], but residual AOSLO motion was not compensated for with real-time image registration.
To combine and improve upon the achievements of our colleagues at UC Berkeley, IU, and PSI, we have implemented a robust closed-loop optical stabilization system with digital registration in one of our AOSLO systems and report its performance here in normal eyes.

Optical Stabilization
To implement optical stabilization in one of our existing AOSLO systems (described in detail elsewhere [35]), we modified the system to use a 2-axis tip/tilt mirror (S-334.2SL, Physik Instrumente, Karlsruhe, Germany) for both slow scanning and optical stabilization. The tip/tilt mirror (TTM) replaced the slow galvanometric scanner (labeled as vertical scanner in the optical diagram of the system shown in Fig. 5 of [35]). The TTM provides the capability to steer the beam at the pupil plane. In our optical system, each mirror axis has ±3 degrees of independent optical deflection. When the mirror is mounted at 0° or 90°, the two axes $\vec{r}_1$ and $\vec{r}_2$ are aligned at +45° and -45°, as shown in Fig. 1. For convenience, we define
$$\vec{r} = a\,\vec{r}_1 + b\,\vec{r}_2,$$
where $\vec{r}_1$ and $\vec{r}_2$ are motions in the individual directions of the mirror, $\vec{r}$ is the combined motion, and $(a, b)$ are the amplitudes in the two axes. In our optical system, the slow scanner scans the retina in the vertical direction and the fast scanner scans in the horizontal direction. The slow scanner is driven with a periodic ramp signal. The vertical scan is generated on the TTM by applying the inverse ramp signal to each axis; this produces motion in the vertical direction. Each signal is reduced by a factor of $\sqrt{2}/2$ to generate the desired amplitude. The frequency of this signal, which sets the frame rate of the system, is ~22 Hz. Ideally the retrace would be instantaneous; in practice this is not possible, and the retrace time is ~2.3 ms. The combination of $\vec{r}_1$ and $\vec{r}_2$ gives the full range of the TTM (illustrated by the shaded diamond in Fig. 1); the size of the 1.5° × 1.5° AOSLO imaging field is shown as the dark square. Theoretically, as long as the imaging field does not move outside of this range, the mirror can stabilize the motion by dynamically updating the position of the imaging beam. However, due to constraints from implementing this within our existing optical system, the AOSLO beam is vignetted by mechanical parts when the tracking mirror steers the AOSLO imaging field out of the area enclosed by the dashed rectangle in Fig. 1. Therefore, the full stabilization range of the TTM cannot be utilized simply by replacing the slow scanner in the existing optical system. A redesign of the optical system is necessary to take advantage of the full range of the TTM.
The tracking algorithm demonstrated here is image-based and relies on matching subsequent data to an acquired 'reference frame'; the user manually chooses a reference frame and the algorithm registers subsequent 'target frames' to the reference frame. This algorithm has been briefly mentioned in previous reports [29,30,32] and is described in detail here, in the appendix. A flow chart of the stabilization system is illustrated in Fig. 2, and consists of the following procedures:
1. Light reflected/scattered from the retina is converted to an analog voltage by the light detection system (in this case a photomultiplier tube (PMT)).
2. Images are digitized with an analog-to-digital converter (A/D).
An FPGA device (ML506, Xilinx Inc., San Jose, CA) is employed to control data acquisition [29] and to program the DAC for driving the TTM. The A/D chip is integrated on the FPGA for image acquisition, and the D/A conversion is performed with a dedicated 125 MSPS 14-bit DAC (DAC2904EVM, Texas Instruments Inc., Dallas, TX). The tracking algorithm runs on a consumer level GPU (GTX560, NVIDIA Corporation, Santa Clara, CA).
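The axis geometry described above can be illustrated with a short Python sketch that projects a desired beam displacement in image coordinates onto two mirror axes oriented at +45° and -45°. The function name `to_mirror_axes` and the sign conventions are assumptions made for illustration; this is not the authors' drive code.

```python
import math

# Unit vectors of the two mirror axes in image (x, y) coordinates.
# Assumption: axis 1 points at +45 deg and axis 2 at -45 deg, as in the text.
R1 = (math.cos(math.radians(45)), math.sin(math.radians(45)))
R2 = (math.cos(math.radians(-45)), math.sin(math.radians(-45)))


def to_mirror_axes(x, y):
    """Return amplitudes (a, b) such that a*R1 + b*R2 equals (x, y)."""
    # The two axes are orthonormal, so each amplitude is a plain projection.
    a = x * R1[0] + y * R1[1]
    b = x * R2[0] + y * R2[1]
    return a, b


# A purely vertical scan position of 1 unit splits into opposite-sign
# signals of magnitude sqrt(2)/2 on the two axes.
a, b = to_mirror_axes(0.0, 1.0)
print(a, b)
```

Under these assumptions a vertical ramp maps to two ramps of opposite sign, each scaled by √2/2, which matches the description of the slow-scan drive signals.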

Strip-level data acquisition and eye motion tracking
In any real-time control system, it is important to reduce system latencies; these can be electronic or mechanical. The mechanical latencies are usually fixed and determined by the specifications of the components acquired from the manufacturers (e.g., the mechanical response time of the TTM). Here we focus on the latencies that we can control, the electronic latencies, and present solutions to reduce them.
Traditionally, in a video rate imaging system, data is sent to the host PC frame by frame. Due to the scanned acquisition method and small field of view, distortions are introduced into each AOSLO frame because the eye often moves faster than the frame rate; these distortions are amplified as the FOV becomes smaller. With an image-based algorithm such as cross-correlation, these 'within-frame' distortions, which encode the motion of the eye during the frame, can be recovered by dividing a whole frame into multiple strips (each consisting of several scan lines) and calculating the motion of each individual strip separately. A single line is acquired extremely rapidly with respect to fixational eye motion and can often be considered to be effectively undistorted (aside from the sinusoidal distortion induced by the fast scanner). The amount of distortion encoded into each strip of data is directly proportional to the velocity of the eye during strip acquisition. Sheehy and colleagues [30] have reported on some aspects of the stabilization algorithm performance; however, the relationship between eye velocity and algorithm accuracy needs further study. Images can be registered by calculating and applying the motion from individual strips. The motion calculated from each strip is sent to the TTM to steer the imaging beam back to the location of that strip on the reference frame. Ideally, real-time images from the eye will not shift and will be 'frozen' completely when stabilization is engaged; however, this would require zero latencies and a perfect motion tracking algorithm. In reality, some residual eye motion will still be seen after optical stabilization is activated due to tracking algorithm errors and the mechanical and electronic latencies. As previously stated, the mechanical latencies are out of our control; our goal here was to reduce the electronic latencies as much as possible.
The source and duration of each of the electronic latencies are listed in Table 1. We applied a 2-D smoothing filter with a 3 × 3 kernel in preprocessing ($T_3$) to make the cross-correlation algorithm more robust to noise. In some very low contrast images, a 2-D Sobel filter was applied after the smoothing filter, in the form of $|S_x k| + |S_y k|$, where $k$ is the image strip and $S_x$ and $S_y$ denote the Sobel operators in the two directions. The edge artifacts from the convolution are set to 0 after filtering. We then threshold the filtered images and set to 0 those pixels whose gray level is less than 25% of the maximum gray level in the filtered image. In this way, we produce a sparse matrix for use with the tracking algorithm. $T_4$ and $T_5$ are the latencies associated with the main components of the tracking algorithm and are described in detail in the appendix. Of the six electronic latencies listed in Table 1, $T_2$ and $T_6$ are on the order of tens of microseconds each; these are difficult to reduce further and negligible compared to the other four.
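The preprocessing steps described above (smoothing, optional Sobel filtering, zeroing of edge artifacts, and thresholding at 25% of the maximum) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' GPU implementation: the uniform 3 × 3 box kernel is an assumption, since the actual smoothing kernel is not given here, and the helper names (`convolve3x3`, `zero_border`, `preprocess_strip`) are hypothetical.

```python
# Assumption: a uniform 3x3 box kernel stands in for the smoothing kernel.
BOX = [[1 / 9] * 3 for _ in range(3)]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def convolve3x3(img, kernel):
    """Apply a 3x3 kernel to a 2-D list; the one-pixel border is left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(kernel[j][i] * img[y + j - 1][x + i - 1]
                            for j in range(3) for i in range(3))
    return out


def zero_border(img, width):
    """Set a border of the given width to 0 to discard filter edge artifacts."""
    h, w = len(img), len(img[0])
    return [[img[y][x] if width <= y < h - width and width <= x < w - width
             else 0.0 for x in range(w)] for y in range(h)]


def preprocess_strip(strip, use_sobel=False, fraction=0.25):
    """Smooth, optionally edge-filter, then zero pixels below 25% of the peak."""
    out = convolve3x3(strip, BOX)
    if use_sobel:
        gx = convolve3x3(out, SOBEL_X)
        gy = convolve3x3(out, SOBEL_Y)
        out = [[abs(a) + abs(b) for a, b in zip(rx, ry)]
               for rx, ry in zip(gx, gy)]
        # The second filter sees the zeroed border as a false edge, so clear it.
        out = zero_border(out, 2)
    peak = max(max(row) for row in out)
    return [[v if v >= fraction * peak else 0.0 for v in row] for row in out]


# Toy strip: a single bright pixel on a dark background.
strip = [[0.0] * 7 for _ in range(5)]
strip[2][3] = 90.0
sparse = preprocess_strip(strip)
```

The result is mostly zeros with a small block of nonzero values around the bright feature, which is the kind of sparse input the text describes passing to the tracking algorithm.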
To reduce the data acquisition latency ($T_1$), we implemented strip-level data acquisition. Each frame was divided into multiple strips and the FPGA sent each strip to the host PC as soon as the analog signal was digitized. The host PC then activated the tracking algorithm on the GPU immediately after the new strip was received. If the images were sent to the host PC frame by frame, the averaged $T_1$ would be at least half the time required to acquire a frame (i.e., ~23 ms for a 22 fps system). This case is equivalent to frame-level stabilization, since the TTM is not updated until the end of a frame, when the motion of all strips is calculated together. The best scenario is to use the motion of the bottom strip of the current frame to drive the TTM to stabilize the image of the next frame. This bottom strip has one strip of sampling latency (denoted t) to the first strip of the next frame, and one whole frame (denoted T) plus one strip of latency (t) to the last strip of the next frame. The averaged result for $T_1$ will be T/2 + t. Clearly, simple frame-level stabilization carries a sampling latency of up to a whole frame. After adding $T_2$, $T_3$, $T_4$, $T_5$, and $T_6$ for all strips of a single frame, the total electronic latency would be tens of milliseconds, far too long to realize real-time stabilization. Ideally, buffering line-by-line, or even pixel-by-pixel, could further reduce the latencies from data acquisition, but FFT-based cross-correlation needs significantly more data than a pixel or a line for a robust result. Due to high noise and low contrast images, particularly in diseased eyes, we found in our particular system that tracking efficiency, defined as the ratio of successfully stabilized strips to the total number of strips after tracking, was ~50% greater for a 32-line strip than for a 16-line strip. This number will, of course, likely vary depending on the imaging system.
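To make the comparison concrete, the following Python sketch evaluates the average acquisition latency T/2 + t for frame-level stabilization and compares it with the single-strip latency of strip-level acquisition. The frame rate is taken from the text; the ~1.2 ms strip time is the value reported for a 16-line strip in this system. The function and constant names are illustrative only.

```python
FRAME_RATE_HZ = 22.0   # frame rate quoted in the text
STRIP_TIME_MS = 1.2    # time to acquire one 16-line strip in this system

frame_time_ms = 1000.0 / FRAME_RATE_HZ


def mean_frame_level_latency(frame_ms, strip_ms):
    """Average T1 when the bottom strip of one frame drives the whole next
    frame: latency runs from t (first strip) to T + t (last strip)."""
    return frame_ms / 2.0 + strip_ms


frame_level_ms = mean_frame_level_latency(frame_time_ms, STRIP_TIME_MS)
strip_level_ms = STRIP_TIME_MS

print(f"frame-level T1 ~ {frame_level_ms:.1f} ms")
print(f"strip-level T1 ~ {strip_level_ms:.1f} ms")
print(f"ratio ~ {frame_level_ms / strip_level_ms:.0f}x")
```

With these numbers the frame-level acquisition latency is more than an order of magnitude larger than the strip-level one, which is the motivation for sending data strip by strip.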
The transfer of image data from the device to the host PC is implemented by Bus-Master DMA technology. Strip-level data acquisition and buffering are balanced by two factors: 1) the capability of the host PC to handle hardware interrupts and 2) the minimum amount of data required for robust cross-correlation. Our benchmarking shows that at the rate of 1000 interrupts/second, the PC interrupt handler uses only ~3-5% of its CPU resources (e.g., on Intel i3, i5, and i7 CPUs); at the rate of 10,000 interrupts/second, it uses ~50-70% of its CPU resources, which causes serious reliability issues for smooth scheduling of the other PC threads, such as preprocessing and eye motion detection. In our system, the data acquisition strip height was set to 16 lines.
Figure 3 illustrates the timing of strip-level data acquisition and eye motion detection with the six electronic latencies. In Fig. 3, each frame is divided into multiple strips, with strip indices k, k + 1, k + 2, k + 3, … and N lines per strip. To calculate eye motion at location L (red solid circle in Fig. 3), the algorithm uses 2N lines, i.e., two strips: one from the existing data (k) and the just acquired strip (k + 1). Therefore, the algorithm obtains the data after strip k + 1 is completely received. In our case, the time required to collect strip k + 1 with N = 16 lines is ~1.2 ms, therefore $T_1$ = 1.2 ms. After strip k + 1 is acquired, data is buffered ($T_2$), and the algorithm proceeds with preprocessing ($T_3$), large amplitude motion and blink detection ($T_4$), small amplitude eye motion calculation ($T_5$) and mirror motion encoding ($T_6$). The computations for $T_3$, $T_4$, and $T_5$ are offloaded to the GPU; each step takes ~0.2-0.25 ms. It should be noted that all of these computations can conveniently be migrated to the CPU when future processors become powerful enough. Taking into account the event-driven operating system (Microsoft Windows 7, Microsoft Corporation, Redmond, WA), the total computational and buffering latency is $T_c = T_2 + T_3 + T_4 + T_5 + T_6$ (~0.7-0.8 ms). From Fig. 3, it can be seen that to run in real-time, $T_c$ must be less than $T_1$, as all computation must be completed before the algorithm receives the next strip of data (i.e., strip k + 2), which is required to calculate the eye motion one strip below location L (yellow circle M in Fig. 3). With the current 16 lines per strip, the total electronic latency ($T_1 + T_2 + T_3 + T_4 + T_5 + T_6$) is ~1.9-2.0 ms. Thus the TTM will receive commands ~1.9-2.0 ms after the eye moves. The mechanical latency of the TTM (i.e., the time required to reach the desired position after the drive voltage has been updated) is variable and depends upon the drive voltage increment, which is dependent upon the motion of the eye. Typically, during eye drift, the voltage increment is on the order of 15-20 mV and the TTM has a mechanical latency of ~2 ms. Therefore, the mirror will steer the beam back to its original (reference) location ~4 ms after the eye moves. We chose this relatively slow TTM because of its high mechanical stability.
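The latency budget in this paragraph can be checked with a few lines of Python. The numbers are those quoted in the text; the function name `latency_budget` and the dictionary layout are illustrative choices.

```python
T1_MS = 1.2                # strip acquisition time (16 lines), from the text
TC_RANGE_MS = (0.7, 0.8)   # buffering + computation (T2..T6), from the text
TTM_MECH_MS = 2.0          # typical mirror response during drift, from the text


def latency_budget(t1_ms, tc_ms, mech_ms):
    """Summarize the timing of one strip update."""
    electronic = t1_ms + tc_ms
    return {
        # Computation must finish before the next strip arrives.
        "real_time": tc_ms < t1_ms,
        "electronic_ms": electronic,
        "total_ms": electronic + mech_ms,
    }


for tc in TC_RANGE_MS:
    print(tc, latency_budget(T1_MS, tc, TTM_MECH_MS))
```

Both ends of the quoted range satisfy the real-time condition, give an electronic latency of about 1.9-2.0 ms, and give a total delay of roughly 4 ms once the mirror response is included.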

Closed-loop optical stabilization
Optical stabilization is implemented using closed-loop control with
$$\vec{R}(t+1) = \vec{R}(t) + \vec{g}\,\Delta\vec{R}(t), \tag{4}$$
where $t$ and $t+1$ denote the time sequence, $\vec{g} = (g_x, g_y)$ are control gains in the horizontal and vertical directions (applied component-wise), $\Delta\vec{R}(t) = (\Delta x_t, \Delta y_t)$ is the residual image motion in each direction, calculated from the tracking algorithm, $\vec{R}(t)$ is the current position of the stabilization mirror, and $\vec{R}(t+1)$ is the new position of the stabilization mirror. The units of $\vec{R}(t+1)$, $\vec{R}(t)$, and $\Delta\vec{R}(t)$ are the same, in pixels or arcseconds; $\vec{g}$ is dimensionless. As mentioned previously, the two axes of the stabilization mirror point to 45° and 135°, as shown in Fig. 1, thus $\vec{R}(t+1)$ needs to be rotated 45° before it is applied. As previously stated, the stabilization mirror is used simultaneously for slow scanning, so the net signals applied on the two axes of the stabilization mirror are
$$\Theta\,\vec{R}(t+1) + \vec{S}, \tag{5}$$
where $\Theta$ is the operator of 45° rotation and $\vec{S}$ is the slow scanning ramp signals. Due to the relatively slow mechanical response of the stabilization mirror (~2 ms) and the fast eye motion update rate from the tracking algorithm (~1.2 ms), we set the gain $\vec{g} = (g_x, g_y)$ to ~0.1-0.15 to achieve stability. This low gain also helps the system to smoothly re-lock eye motion after large amplitude motion or a blink (see Appendix), which usually occurs every few seconds.
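The behavior of an integrating control loop of this kind, in which the new mirror position equals the current position plus the gain times the measured residual, can be explored with a one-axis Python simulation. This is a simplified sketch: the mirror is assumed to reach its commanded position instantly, so the ~2 ms mechanical latency is not modeled, and the 1 Hz, 12.6 arcmin (0.21°) sinusoid is borrowed from the model eye test as an example input. The function names are hypothetical.

```python
import math


def rms(values):
    """Root-mean-square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def simulate(gain, freq_hz=1.0, amp_arcmin=12.6, dt_s=0.0012, duration_s=2.0):
    """Track a sinusoidal motion with R(t+1) = R(t) + gain * residual(t).

    Returns (raw RMS, residual RMS) in arcmin. One update is made per
    strip, i.e. every ~1.2 ms.
    """
    mirror = 0.0
    raw, residual = [], []
    for n in range(int(duration_s / dt_s)):
        eye = amp_arcmin * math.sin(2 * math.pi * freq_hz * n * dt_s)
        error = eye - mirror        # residual image motion seen by the tracker
        mirror += gain * error      # closed-loop mirror update
        raw.append(eye)
        residual.append(error)
    return rms(raw), rms(residual)


raw_rms, res_rms = simulate(gain=0.15)
print(f"raw RMS {raw_rms:.2f} arcmin, residual RMS {res_rms:.2f} arcmin")
```

Because latency is ignored, the reduction in this idealized loop is larger than the values measured on the real system; the sketch is only meant to show how a low gain still removes most slow motion when the update rate is high.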

Digital image registration
Theoretically, once optical stabilization is activated and the mirror dynamically compensates for the eye motion, it can be seen from Eq. (4) that the tracking algorithm calculates residual image motion only. Residual image motion is significantly smaller than the raw motion before optical stabilization. Digital image registration uses this small residual motion signal to create a registered image sequence. The computational accuracy of the digital registration is ±0.5 pixels. Digital registration is executed during the retrace period of the slow scanner (i.e., when it is moving backward after a full frame has been acquired but before the next frame begins). No additional cross-correlation is required for digital registration because the motion of any strip from the current frame is calculated before it is used to drive the stabilization mirror. The motions from all strips are used directly for digital registration at the end of this frame.
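As an illustration of strip-wise registration with whole-pixel accuracy, the Python sketch below shifts each strip of a frame back by its measured motion. It is a minimal sketch under stated assumptions: the sign convention (a strip measured at offset (dx, dy) is moved by (-dx, -dy)), the handling of uncovered pixels, and the name `register_frame` are all illustrative and are not taken from the authors' implementation.

```python
def register_frame(frame, strip_motion, lines_per_strip):
    """Shift each strip back by its measured (dx, dy), rounded to whole pixels.

    frame: 2-D list of pixel rows.
    strip_motion: one (dx, dy) residual motion per strip, in pixels.
    Pixels shifted outside the frame are dropped; uncovered pixels stay 0.
    """
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for s, (dx, dy) in enumerate(strip_motion):
        dx, dy = round(dx), round(dy)   # whole-pixel registration
        first = s * lines_per_strip
        last = min(first + lines_per_strip, h)
        for y in range(first, last):
            for x in range(w):
                ty, tx = y - dy, x - dx
                if 0 <= ty < h and 0 <= tx < w:
                    out[ty][tx] = frame[y][x]
    return out


# Toy example: the second strip (rows 2-3) was displaced 2 pixels to the right.
reference = [[10 * y + x + 1 for x in range(6)] for y in range(4)]
moved = [row[:] for row in reference]
for y in (2, 3):
    moved[y] = [0, 0] + reference[y][:4]

registered = register_frame(moved, [(0, 0), (2, 0)], lines_per_strip=2)
```

In the toy example the displaced strip is returned to its reference position, apart from the two columns that were shifted out of view.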

Evaluation of system performance
To evaluate system performance, we tested both optical stabilization alone and optical stabilization combined with digital image registration on a model eye and several normal human eyes. The model eye consisted of an achromatic lens and a piece of paper. Motion was induced in the model eye image by removing the final flat mirror in the AOSLO system and placing a galvanometric scanner at a point conjugate with the exit pupil of the optical system before directing the beam into the model eye. The model eye was used to test the performance with sinusoidal motion at 1 and 2 Hz in the direction of the fast scan, with amplitudes of 0.21° and 0.17° and peak velocities of ~79 arcmin/sec and ~128 arcmin/sec, respectively. These peak velocities are faster than those reported in the literature for all fixational eye motion except for microsaccades [5]. To test system performance in the living eye, 2-3 image sequences (20 or 30 seconds in duration) were acquired at each of several locations in the central macula.
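The quoted peak velocities follow from the frequency and amplitude of the sinusoidal motion. The short Python check below assumes that the stated amplitudes are the sinusoid amplitude A in x(t) = A sin(2πft), not the peak-to-peak excursion; the function name is illustrative.

```python
import math


def peak_velocity_arcmin_per_s(freq_hz, amplitude_deg):
    """Peak velocity of x(t) = A * sin(2*pi*f*t), with A converted to arcmin."""
    return 2 * math.pi * freq_hz * amplitude_deg * 60.0


v_1hz = peak_velocity_arcmin_per_s(1.0, 0.21)
v_2hz = peak_velocity_arcmin_per_s(2.0, 0.17)
print(round(v_1hz, 1), round(v_2hz, 1))
```

Under this assumption the two conditions give values close to the ~79 and ~128 arcmin/sec stated in the text.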
The FOV was ~1.5° × 1.5° (~434 × 434 µm, on average); images were 576 × 576 pixels. Each image sequence was acquired in three stages: 1) no tracking (i.e., normal eye motion), 2) optical stabilization only, and 3) both optical stabilization and digital registration. The frame when each epoch began was recorded digitally for later analysis. Reference frames were selected manually. For three participants (NOR011a, NOR025a, NOR037a), 15 retinal locations were imaged (shown in Fig. 4(a)). NOR047a was imaged at 21 retinal locations (shown in Fig. 4(b)), while NOR046a was imaged at 15 random locations within the central macula. For all participants but NOR046a, retinal locations were targeted using fundus guided fixation target control software [36]. This software (written in MATLAB (MathWorks, Natick, MA) using elements of the Psychophysics toolbox extensions [37][38][39]) controlled the position of a fixation target and mapped target displacements to the estimated location of the AOSLO imaging field on the retina. The estimated imaging location was displayed in the software GUI as a square overlaid on a wide field fundus image of the participant's eye, which was acquired from a separate imaging system prior to AOSLO imaging. The fixation target was a small white circle (~30 arcmin in diameter) displayed on an LCD monitor and viewed off of a laser window placed in front of the eye. The laser window transmitted the AOSLO light but reflected a portion of the fixation target light into the eye. The fixation target for NOR046a was an array of blue LEDs.
Fig. 4. Human imaging locations. Subjects NOR011a, NOR025a, and NOR037a were imaged using the pattern of locations shown in (a), while subject NOR047a was imaged with the pattern shown in (b). The gray circle denotes the foveal center imaging location, while the gray squares denote eccentric imaging locations.
To evaluate system performance, we calculated the RMS separately for each condition. RMS was calculated using Eq. (6),
$$\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left|\vec{r}_i - \bar{r}\right|^{2}}, \tag{6}$$
where $N = F \times S$, $F$ is the number of frames, $S$ is the number of strips in a single frame, $\vec{r}_i$ are the locations of individual strips, and $\bar{r}$ is the mean location of the strips. The RMS values we report here were calculated for all frames successfully tracked with the small amplitude motion component of the algorithm (see Appendix); thus they exclude all motion measurements greater than ½ the strip height, or 16 pixels.
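An RMS of strip positions about their mean, as defined above, can be computed as in the following Python sketch. The treatment of position as a 2-D vector and the 1/N normalization are assumptions; the text does not state whether the RMS was computed per axis or with a different normalization. The function name is illustrative.

```python
import math


def rms_motion(positions):
    """RMS distance of (x, y) strip positions from their mean position.

    positions: list of (x, y) tuples, one per strip over all frames,
    so len(positions) equals F * S.
    """
    n = len(positions)
    mean_x = sum(p[0] for p in positions) / n
    mean_y = sum(p[1] for p in positions) / n
    squared = sum((x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in positions)
    return math.sqrt(squared / n)


# Four strips, each one unit away from the mean position at the origin.
example = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
print(rms_motion(example))
```

In practice the input would be restricted to strips that were successfully tracked, since measurements larger than half the strip height are excluded from the reported values.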

Participants
Five participants with normal vision were recruited from the faculty and staff of the University of Rochester and the local community. Two of the authors participated; they were experienced observers and were aware of the purpose of the experiment (EAR & KN; NOR046a & NOR047a, respectively). The other three subjects were naïve as to the purposes of the experiments, but each had participated in AOSLO imaging previously. Participants ranged in age from 25 to 65 years. All participants gave written informed consent after the nature of the experiments and any possible risks were explained both verbally and in writing. All experiments were approved by the Research Subjects Review Board of the University of Rochester and adhered to the Tenets of the Declaration of Helsinki.

Results
Media 1 illustrates the performance of optical stabilization for each condition of model eye motion. Residual motion after optical stabilization was ~1/12 of the original motion at 1 Hz and ~1/7 at 2 Hz; thus ~92% and ~85% of the motion was compensated for with optical stabilization in the two model eye conditions, respectively. Media 2 shows an image sequence from a human eye. The first 4 seconds demonstrate typical normal human fixational eye motion in AOSLO. After 4 seconds had elapsed, optical stabilization was activated. As can be seen from the image sequence, there was still a small amount of residual motion. Shortly after 10 seconds had elapsed, digital image registration was activated to eliminate residual motion; the remaining frames are nearly 'frozen' completely. It should be noted that after optical stabilization was activated, the two non-naïve participants (NOR046a & NOR047a) reported that in many of the trials the imaging field appeared to fade away. This is the well-known phenomenon of Troxler fading [40], which occurs when an image is stabilized on the retina [41]. For these experienced observers, microsaccades often occurred shortly after they noticed the Troxler fading, as they made reflexive movements to reestablish the raster image. Troxler fading was usually noticed several seconds after optical tracking was activated.
The motion trace calculated from Media 2 is shown in Fig. 5. Eye motion without optical stabilization was 5.4 arcmin (~26.9 µm) RMS; after optical stabilization it was 0.48 arcmin (~2.4 µm) RMS; after both optical stabilization and digital registration it was 0.034 arcmin (~0.17 µm) RMS. It should be reiterated that blinks and motion outside the range of the tracking algorithm (i.e., frame out) are not counted in the RMS values reported here, as accurate measurements were outside the capability of this method. In Fig. 5, for example, after optical tracking is turned on, the position values for the three spikes denoted by the asterisks are not counted. This is implemented by adding a second round of cross-correlation, where the stabilized images are correlated with the reference directly, with a correlation threshold of 0.85. This step, defined as 'error-proofing', rejects all spurious motions from the optically stabilized and digitally registered images. We found that the method also performed quite well on low contrast images; this is illustrated in Media 3.
We acquired 184 image sequences from the participants. Optical stabilization failed completely in only one trial (location: 3°, -6°) in one participant (NOR037a); in this case, the eye moved too fast at large amplitude (i.e., microsaccades were too frequent) for the operator to manually select a good reference frame. Table 2 lists the performance of optical stabilization for the remaining 183 trials. Residual RMS ranged from 0.34 to 0.53 arcmin (~1.65-2.50 µm). Tracking efficiency, defined as the ratio of successfully stabilized frames to the total number of frames after tracking, was 85%, on average, and ranged from 76 to 92%, depending upon the observer. Tracking efficiency was correlated with the occurrence rate of blinks and microsaccades. The residual RMS after digital registration from all five subjects ranged from 0.04 to 0.05 arcmin (~0.20-0.25 μm).

Discussion
Our method is compared to previous methods in Table 3. To our knowledge, the combination of optical stabilization and digital registration reported here is more accurate than any other method reported in the literature. The performance of optical stabilization alone is comparable only to the optical lever technique [19,20] and is nearly 10 times better than the optical tracking performance previously reported by Hammer and colleagues in AOSLO [34].
The combined performance from optical stabilization and digital registration is ~3-4 times better than digital tracking alone, as reported by two of the authors in an earlier paper [29]. Moreover, the success rate of tracking (183/184) and tracking efficiency (85%, on average) are significantly higher than those reported previously for digital stabilization alone [29]. Interestingly, tracking efficiency (reported in Table 2) appeared to be related to imaging duration: it gradually decreased as imaging duration increased. As noted above, tracking efficiency depends on the number of frames that can be tracked; we are unable to track blinks and large amplitude motion, such as microsaccades. The decrease may be related to increased fatigue, as microsaccade frequency appears to decrease after participants take a short break. This phenomenon was consistent across all five subjects and warrants further investigation.
There are several changes that can be implemented to the current system to improve performance, including: increasing the accuracy of digital registration by employing sub-pixel cross-correlation, implementing real-time 'error-proofing' and incorporating automatic reference frame selection. One possible solution for sub-pixel cross-correlation is to implement either the approach of Guizar-Sicairos and colleagues [42] or that of Mulligan [43]. The 'error-proofing' step is currently implemented offline, but it will be implemented in real-time by placing this computation during the retrace period of the slow scanner. Algorithmic automatic reference frame selection may be able to solve the problem for the one failed trial in which the algorithm could not be locked manually. Future work is needed to determine appropriate image metrics to use for reference frame selection. In addition, a method is needed for removing intraframe distortion from the reference frame, as these distortions will be encoded into the registered images. A solution to this problem may be possible by using multiple frames to synthesize a reference image that is free of intraframe distortions.
It should be noted that torsional eye movements (rotations about the line of sight) are not corrected; this is perhaps the greatest limitation of this approach. We have calculated the image rotation for the 1.5° × 1.5° AOSLO imaging field at the gaze locations used here, and found that it was <0.05° for the 20-second video sequences we acquired. After Stevenson and Roorda [9], we calculated image rotation by dividing each long horizontal strip equally into two strips and measuring the displacement of these image patches on consecutive frames. Cross-correlation was used to calculate the translation of the two strips on the left and right side of each image, defined as (x L , y L ) and (x R , y R ). The difference (y L - y R ) was ~0.1-0.2 pixels, on average; the distance between left and right strips was 256 pixels. Therefore, the maximum torsion in a single 20-second video was: 0.2/256 × 180/π = 0.045° (~2.7 arcmin). We occasionally saw larger torsion (e.g., ≥1°). In these cases the tracking algorithm simply fails, requiring either 1) a new reference frame, or 2) a delay until the eye rotates back sufficiently so that the algorithm may re-lock.
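The torsion estimate above is a small-angle calculation, and it can be checked with a few lines of Python. This is only an illustrative sketch: the function name `torsion_degrees` is hypothetical, and it assumes the rotation is small enough that the angle in radians equals the vertical displacement difference divided by the strip separation.

```python
import math

def torsion_degrees(dy_pixels, separation_pixels):
    """Image rotation (degrees) implied by a vertical displacement difference
    between two patches a known horizontal distance apart (small-angle approximation)."""
    return math.degrees(dy_pixels / separation_pixels)

# Values quoted in the text: (y_L - y_R) up to ~0.2 pixels, patches 256 pixels apart.
rotation_deg = torsion_degrees(0.2, 256)
rotation_arcmin = rotation_deg * 60.0
print(f"{rotation_deg:.3f} deg, {rotation_arcmin:.1f} arcmin")
```

With the quoted values this reproduces the ~0.045° (~2.7 arcmin) figure given in the text.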
Further system improvement can be attained with the use of faster mirrors. The mechanical performance of the TTM we used decreases with increased input frequency, so stabilization performance decreases as the frequency of the motion increases. The Nyquist frequency of the tracking algorithm is equal to ½ of the strip rate, which in this implementation is 426 Hz. However, there are potential aliasing problems from the implementation of the imaging system and the no-data gap during the retrace interval, which have not been fully characterized and merit further study.
Additionally, it is worthwhile to implement a faster frame rate, e.g., by increasing the speed of the resonant scanner. It should be noted that we could increase the frame rate without increasing the resonant scanner speed, but this would reduce the number of lines per frame, increasing the frequency of 'frame-out' errors. Increasing the resonant scanner rate could substantially reduce the within-frame distortions that are a major problem for clinical imaging, particularly longitudinal studies of eye disease. We are exploring the possibility of using different mirror technologies, such as polygonal or MEMS-based scanning mirrors, in future systems. A polygonal mirror would obviate the need for image rectification as it would produce a linear image, but we have not determined whether the mechanical stability will be sufficient for our purposes. MEMS-based mirrors are available that scan at high rates, but the small size of these mirrors poses challenges for optical design and implementation.
From Fig. 5 it can be seen that despite the improved performance demonstrated here, this tracking system still has several disadvantages: 1) it does not take full advantage of the tracking range of the tip/tilt mirror, 2) it has difficulty stabilizing fast, relatively large amplitude motion (i.e., microsaccades), 3) it still suffers from 'frame-out' (when the motion is greater than can be covered by the current reference frame or strip; see Appendix for details on this issue), and 4) it has difficulty resetting the position of the tracking mirror after microsaccades, blinks, and/or frame-out.

Conclusions
1) Optical stabilization can be accomplished in an AOSLO by replacing the existing 1-D slow scanner with a 2-axis tip/tilt mirror.
2) Optical stabilization was successful in all but one of 184 trials. On average, 85% of all frames were successfully stabilized.
4) Digital registration of residual RMS eye motion after optical stabilization was accurate to ~0.04-0.05 arcmin (~0.20-0.25 µm).
5) Tracking efficiency decreased as imaging duration increased, likely reflecting an increase in the microsaccade and blink rate with fatigue.

FFT cross-correlation
To implement image-based eye motion calculation, we employed the widely used FFT-based cross-correlation. FFT-based cross-correlation was chosen for motion tracking because it has been proven to be robust and successful [34][35][36], and can be implemented for fast execution on specialized processors such as GPUs, Digital Signal Processors (DSPs), or FPGAs. For convenience of coding and maintenance, we offloaded all computation-intensive eye motion calculations to the GPU and utilized NVIDIA Compute Unified Device Architecture (CUDA) technologies such as CFFT, shared memory and textures to speed up data processing.
For the motion tracking algorithm to run efficiently in real-time, an appropriate balance must be found between computational speed and robustness. Our goal is to make the algorithm sufficiently fast to run in real-time while still generating accurate measurements of retinal image motion. The computational cost of FFT cross-correlation can be analyzed by examining each step of a single calculation, shown in Eqs. (7)-(11). To minimize computational cost for cross-correlation, we reduce both the number of FFT operations per iteration and the size of the image. Each time we perform a new cross-correlation, we only perform two FFT operations instead of three. This is accomplished by storing in memory all FFTs that are required for future computations, such as all of the FFTs from the reference image (which are computed and stored only once, just before the tracking algorithm is activated). In addition, we use the smallest image size (N) possible for each stage of the algorithm (as described below).
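The two-FFT scheme can be sketched as follows. This is a minimal NumPy illustration on the CPU, not the GPU implementation used in the system; the function names `reference_spectrum` and `shift_from_cached_reference`, the mean subtraction, and the integer-pixel peak search are assumptions made for the example.

```python
import numpy as np

def reference_spectrum(ref):
    """Computed once when the reference is selected: conjugate FFT of the reference."""
    return np.conj(np.fft.fft2(ref - ref.mean()))

def shift_from_cached_reference(ref_spec_conj, target):
    """Estimate (dy, dx) of target relative to the reference using two FFT operations:
    one forward FFT of the target and one inverse FFT of the product."""
    spectrum = np.fft.fft2(target - target.mean()) * ref_spec_conj
    xcorr = np.fft.ifft2(spectrum).real
    py, px = np.unravel_index(np.argmax(xcorr), xcorr.shape)
    h, w = xcorr.shape
    # Peaks past the midpoint correspond to negative (wrapped) shifts.
    dy = py - h if py > h // 2 else py
    dx = px - w if px > w // 2 else px
    return int(dy), int(dx)

# Synthetic check: a random texture displaced by a known circular shift.
rng = np.random.default_rng(0)
reference = rng.random((64, 64))
target = np.roll(reference, (3, -5), axis=(0, 1))
cached = reference_spectrum(reference)
estimated = shift_from_cached_reference(cached, target)
```

Because the reference spectrum is cached, each new target costs one forward and one inverse transform, which is the saving described above.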

Small amplitude motion tracking
Figure 6 shows three frames from an AOSLO image sequence illustrating typical small amplitude eye motion for a healthy normal observer. To obtain motion measurements at a frequency greater than the frame rate, which is required for real-time optical stabilization and digital image registration, we divide each frame into multiple strips along the orientation of the slow AOSLO scan. In the example shown in Fig. 6, the fast scan is horizontal and the slow scan is vertical. Individual strips are denoted as the red rectangles: k, k+1, k+2, and k+3. When the motion between frames is small, such as between frames F r and F n in Fig. 6, cross-correlation between two strips with the same index (e.g., strips k+1) returns the translation of that strip. However, when the amplitude of the motion is large, there are cases when there is no overlap between any pair of strips with the same index from the reference frame F r and frame F n+1 . In practice, we have found that the tracking algorithm requires ~32 lines (i.e., two data acquisition strips of 16 lines each) for a robust cross-correlation result for our typical AOSLO images. It should be noted that the minimum image size for robust cross-correlation is highly application dependent; we have found that this height is sufficient for images of the photoreceptor mosaic from multiple AOSLO instruments. More work is needed to determine the absolute minimum size as well as the appropriate size for images of other retinal layers.
Due to the nature of the scanning system and the strip level data acquisition scheme, the tracking algorithm is more susceptible to 'frame-out' when motion is orthogonal to the fast scan axis. Frame-out occurs when an acquired strip falls outside of the reference frame. Each strip is much smaller in the slow direction (i.e., 32 pixels high vs. 512 pixels wide); therefore a much smaller amount of motion will bring a strip outside the range of the corresponding comparison strip on the reference frame.
Fig. 6. Small amplitude motion is calculated by comparing strips of data between consecutive frames. This works well when the motion between frames is small (such as between frame F r and F n ). However, this fails when the between-frame motion is large (such as between frame F r and frame F n+1 ).
To increase the probability of having sufficient overlap when using a small strip size, we first calculate the full frame motion between the previously acquired frame and the reference frame. This is illustrated in Fig. 7, where F r is the reference frame, F n is the frame whose eye motion has just been detected, and frame F n+1 is the target frame for strip motion calculation. The frame motion (X n,c , Y n,c ) between frames F r and F n is computed after all of frame F n has been acquired but before the first strip of frame F n+1 is received. In the AOSLO system, this computation is conveniently placed during the retrace period of the slow scanner.
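The effect of applying the frame offset before strip-level cross-correlation can be illustrated with a short NumPy sketch. It is a simplified model under stated assumptions, not the system's implementation: the helper names `xcorr_shift` and `strip_motion` are hypothetical, the reference is re-aligned with a wrap-around `np.roll` rather than a true crop, and shifts are integer pixels ordered as (rows, columns).

```python
import numpy as np

def xcorr_shift(ref, tgt):
    """Integer (dy, dx) such that tgt is approximately ref shifted by (dy, dx)."""
    spec = np.fft.fft2(tgt - tgt.mean()) * np.conj(np.fft.fft2(ref - ref.mean()))
    xc = np.fft.ifft2(spec).real
    py, px = np.unravel_index(np.argmax(xc), xc.shape)
    h, w = xc.shape
    return (int(py - h if py > h // 2 else py), int(px - w if px > w // 2 else px))

def strip_motion(ref, tgt, k, strip_h, coarse):
    """Motion of strip k of the target frame, using a coarse frame offset
    (from the previous frame) to pick the overlapping part of the reference."""
    cy, cx = coarse
    ref_aligned = np.roll(ref, (cy, cx), axis=(0, 1))  # wrap-around is a simplification
    rows = slice(k * strip_h, (k + 1) * strip_h)
    ry, rx = xcorr_shift(ref_aligned[rows], tgt[rows])  # small residual motion
    return cy + ry, cx + rx

# Synthetic frames: the target is displaced by more than one strip height.
rng = np.random.default_rng(1)
ref_frame = rng.random((256, 128))
tgt_frame = np.roll(ref_frame, (40, -7), axis=(0, 1))
motion = strip_motion(ref_frame, tgt_frame, k=3, strip_h=32, coarse=(38, -6))
```

In this example the 40-row displacement exceeds the 32-row strip height, so same-index strips would not overlap at all; the coarse offset leaves only a residual of a few pixels for the strip-level correlation to resolve.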
Fig. 7. The frame offset (X n,c , Y n,c ) is applied before calculating strip motion to increase the probability that strips on the current frame will be compared with the appropriate overlapping strips on the reference frame (F r ).

Fig. 8. The computational cost of the frame offset (X n,c , Y n,c ) calculation is reduced by using only the central portion of the frame (denoted by the shaded region).
Computational cost is reduced for calculating the frame motion (X n,c , Y n,c ) by using only the central portion of the frame for cross-correlation. We typically use a portion of the frame that is twice the height of a single strip, as illustrated by the shaded region of the reference frame (F r ) shown in Fig. 8. As mentioned previously, the cross-correlation calculation is also reduced to only 2 FFT calculations here, as the FFT for this sub-region of the reference frame was calculated when it was selected as the reference frame. To calculate the frame motion of the current target frame (F n ), the algorithm crops a patch of the same size from frame F n , but with the offset between the previous frame and the reference frame applied (X n-1,c , Y n-1,c ). Frame motion is measured differently for large amplitude motion detection, as described in section 6.3, below.
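A sketch of the central-patch frame offset calculation is given below. It is illustrative only: `central_frame_offset` and `xcorr_shift` are hypothetical names, the previous offset is undone with a wrap-around `np.roll` instead of an offset crop, and the reference patch FFT is recomputed here rather than cached as described in the text.

```python
import numpy as np

def xcorr_shift(ref, tgt):
    """Integer (dy, dx) such that tgt is approximately ref shifted by (dy, dx)."""
    spec = np.fft.fft2(tgt - tgt.mean()) * np.conj(np.fft.fft2(ref - ref.mean()))
    xc = np.fft.ifft2(spec).real
    py, px = np.unravel_index(np.argmax(xc), xc.shape)
    h, w = xc.shape
    return (int(py - h if py > h // 2 else py), int(px - w if px > w // 2 else px))

def central_frame_offset(ref, frame, prev_offset, patch_h):
    """Frame offset relative to the reference, using only a central band of
    patch_h rows and starting from the previous frame's offset."""
    top = (ref.shape[0] - patch_h) // 2
    band = slice(top, top + patch_h)
    py, px = prev_offset
    # Undo the previous offset so that only the frame-to-frame residual remains.
    realigned = np.roll(frame, (-py, -px), axis=(0, 1))
    ry, rx = xcorr_shift(ref[band], realigned[band])
    return py + ry, px + rx

rng = np.random.default_rng(2)
ref_frame = rng.random((256, 128))
frame_n = np.roll(ref_frame, (40, -7), axis=(0, 1))
offset_n = central_frame_offset(ref_frame, frame_n, prev_offset=(37, -5), patch_h=64)
```

Here the 64-row band corresponds to twice a 32-row strip, as in the text; the smaller transform size is what reduces the cost relative to a full-frame correlation.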

Large amplitude motion and blink detection
Large amplitude motion and blinks are detected using the same FFT-based cross-correlation algorithm. However, in this case we calculate the motion (dX n , dY n ) between consecutive frames and use strips from the same image location (i.e., strip k is always compared to strip k, etc.), as illustrated in Fig. 9. Large amplitude motion is considered to be detected when the relative motion is greater than a user-specified threshold, and a blink is considered to be detected when the correlation coefficient drops below a user-specified threshold. The user-defined thresholds may vary from subject to subject, and from system to system, but for the purposes of this study we used thresholds of: motion greater than 30 pixels and correlation coefficients less than 0.2-0.3. We use this low correlation coefficient threshold for two reasons: 1) in many clinical imaging situations, particularly with diseased eyes, images have very low contrast and high noise and do not produce correlation coefficients greater than 0.5-0.6, even between two consecutive frames, and 2) as discussed before, we use sparse matrices instead of full matrices for cross-correlation. The image size used for this stage is typically the same as that used for the frame motion calculation described previously (i.e., twice the small amplitude motion strip size, or 64 pixels in this case). These values can be adjusted to tolerate more or less error as required for the particular experiment or application. To reduce computational cost in our real-time system, the algorithm only looks at the first four pairs of strips from the two consecutive frames. These four pairs of strips usually cover about the first half of each frame (e.g., here we use 4 strips of 64 lines each, so 256/576 or ~44% of the frame). As stated in section 2.2 above, the PC threads for detecting large and small amplitude motion run simultaneously. When any strip pair reports large amplitude motion or a blink, the algorithm immediately stops calculating small amplitude motion for the rest of the strips from the current target frame and starts the 're-locking' procedure (outlined in section 6.4, below). It should be noted that this approach may exclude some 'good' data strips from the current target frame, but it was implemented in this way so as to free up sufficient processing power for the computationally costly re-locking procedure.
Fig. 9. Large amplitude motion and blink detection computes motion between consecutive frames using strips from the same frame position (denoted by the darker shading).
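The two threshold tests can be sketched in NumPy as follows. This is a hedged illustration, not the system's code: the names `peak_shift_and_coeff` and `classify_strip_pair` are hypothetical, the correlation coefficient is assumed to be the cross-correlation peak normalized by the strip norms, the motion magnitude is assumed to be the Euclidean length of the shift, and the default of 0.25 is simply a value inside the 0.2-0.3 range quoted above.

```python
import numpy as np

def peak_shift_and_coeff(prev_strip, curr_strip):
    """Return ((dy, dx), coeff): integer shift and normalized correlation peak."""
    a = prev_strip - prev_strip.mean()
    b = curr_strip - curr_strip.mean()
    xc = np.fft.ifft2(np.fft.fft2(b) * np.conj(np.fft.fft2(a))).real
    py, px = np.unravel_index(np.argmax(xc), xc.shape)
    h, w = xc.shape
    dy = py - h if py > h // 2 else py
    dx = px - w if px > w // 2 else px
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    coeff = xc[py, px] / denom if denom > 0 else 0.0
    return (int(dy), int(dx)), float(coeff)

def classify_strip_pair(prev_strip, curr_strip, motion_thresh=30.0, corr_thresh=0.25):
    """Label a same-index strip pair from consecutive frames."""
    (dy, dx), coeff = peak_shift_and_coeff(prev_strip, curr_strip)
    if coeff < corr_thresh:
        return "blink"          # correlation lost
    if np.hypot(dx, dy) > motion_thresh:
        return "large_motion"   # e.g., a microsaccade
    return "ok"

rng = np.random.default_rng(3)
strip = rng.random((64, 128))
small = classify_strip_pair(strip, np.roll(strip, 2, axis=1))
large = classify_strip_pair(strip, np.roll(strip, 40, axis=1))
lost = classify_strip_pair(strip, rng.random((64, 128)))
```

The correlation test is applied first so that a spurious peak location from an uncorrelated strip pair is not mistaken for a real displacement.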

Re-locking after large amplitude motion and blinks
In order to re-lock eye position after large amplitude motion or a blink, the algorithm increases the cross-correlation image size to the entire frame. This allows it to cover the largest eye motion possible with FFT-based cross-correlation, thus increasing the probability that it will re-lock, but it comes at a huge computational cost due to the increased image size.
To reduce the image size in this implementation, we downsample the image to half its size, either by sampling alternate pixels or by binning 2×2 pixels to 1 pixel. This reduces computational complexity, but also reduces computational accuracy to 2 pixels. Downsampling could be 3×3 or more depending upon the particular application, but this will further reduce accuracy. The algorithm will continue to cross-correlate downsampled full frames until the cross-correlation coefficient rises back above a user-specified threshold. When this happens, the algorithm returns a frame motion (X n , Y n ) between the current frame, F n , and the reference frame, F r . However, this alone is insufficient to consider the tracking algorithm to be re-locked. The algorithm will then use (X n , Y n ) as an offset to calculate the motion (X n+1 , Y n+1 ) between the reference frame and the next frame (F n+1 ). Simultaneously, the algorithm re-enters the large amplitude and blink detection stage of processing to calculate the motion (dX n+1 , dY n+1 ) of the central patch between consecutive frames F n and F n+1 . The algorithm then computes the difference between ((X n+1 , Y n+1 ) - (X n , Y n )) and (dX n+1 , dY n+1 ). If this value is less than a user-defined threshold (typically 50% of the small amplitude motion strip height or 32 pixels, in this case) then the algorithm has successfully re-locked. Otherwise it will continue calculating full frame cross-correlations until it re-locks, is stopped, or a new reference frame is selected. After re-locking, the frame motion (X n,c , Y n,c ) is used to coarsely orient the frames so that small amplitude, fine motion calculations can resume.
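The 2×2 binning step and the re-lock consistency test can be sketched as follows. The function names `bin2x2` and `is_relocked` are hypothetical, and treating "the difference" as the Euclidean length of the discrepancy vector is an assumption; the tolerance of 32 pixels is the value quoted in the text.

```python
import math
import numpy as np

def bin2x2(img):
    """Downsample by averaging non-overlapping 2x2 blocks (halves each dimension)."""
    h, w = img.shape
    img = img[: h - h % 2, : w - w % 2]  # drop a trailing odd row or column
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def is_relocked(pos_n, pos_n1, d_n1, tol):
    """Compare the change in reference-relative position between frames n and n+1
    with the independently measured frame-to-frame motion d_n1."""
    ex = (pos_n1[0] - pos_n[0]) - d_n1[0]
    ey = (pos_n1[1] - pos_n[1]) - d_n1[1]
    return math.hypot(ex, ey) < tol

binned = bin2x2(np.arange(16, dtype=float).reshape(4, 4))
# Positions measured on binned frames would be multiplied by 2 before this test,
# which is why the accuracy of this stage is limited to 2 pixels.
locked = is_relocked(pos_n=(40, -8), pos_n1=(44, -6), d_n1=(3, 2), tol=32)
```

The consistency test guards against accepting a spurious full-frame correlation peak: two independent measurements of the same frame-to-frame motion must agree before fine tracking resumes.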

Fig. 1. Two axes of the PI TTM (dark shaded square) showing its full range of motion (lightly shaded diamond). The dashed rectangle encloses the area of the stabilization range that can be utilized in the optical system. The small dark square shows the size of a 1.5° × 1.5° AOSLO imaging field. Scale bar is one degree.

Fig. 3. Strip-level data acquisition, buffering, and eye motion detection. The durations of the longest latencies (T 1 , T 3 , T 4 & T 5 ) are denoted by the brackets; arrows denote the end of each latency. Note that T 2 and T 6 are extremely short; their durations are denoted by the thickness of the labeled arrows.

Fig. 5. Eye motion trace computed from the image sequence shown in Media 2, showing the vertical (y) component of eye motion before (red) and after optical stabilization alone (blue) and optical stabilization combined with digital registration (green). Inset shows zoomed-in trace for the region denoted by the dashed rectangle. Asterisks denote spurious motion measurements during blinks or large amplitude motion (see Appendix for details).