Automatic detection of cone photoreceptors in split detector adaptive optics scanning light ophthalmoscope images

Quantitative analysis of the cone photoreceptor mosaic in the living retina is potentially useful for early diagnosis and prognosis of many ocular diseases. Non-confocal split detector based adaptive optics scanning light ophthalmoscope (AOSLO) imaging reveals the cone photoreceptor inner segment mosaics often not visualized on confocal AOSLO imaging. Despite recent advances in automated cone segmentation algorithms for confocal AOSLO imagery, quantitative analysis of split detector AOSLO images is currently a time-consuming manual process. In this paper, we present the fully automatic adaptive filtering and local detection (AFLD) method for detecting cones in split detector AOSLO images. We validated our algorithm on 80 images from 10 subjects, showing an overall mean Dice’s coefficient of 0.95 (standard deviation 0.03), when comparing our AFLD algorithm to an expert grader. This is comparable to the inter-observer Dice’s coefficient of 0.94 (standard deviation 0.04). To the best of our knowledge, this is the first validated, fully-automated segmentation method which has been applied to split detector AOSLO images.


Introduction
A multitude of retinal diseases result in the death or degeneration of photoreceptor cells. As such, the ability to visualize and quantify changes in the photoreceptor mosaic could be useful for the diagnosis and prognosis of these diseases, or for studying treatment efficacy. The use of adaptive optics (AO) to correct for the monochromatic aberrations of the eye has given multiple ophthalmic imaging systems the ability to visualize photoreceptors in vivo [1][2][3][4][5][6][7][8], with the AO scanning light ophthalmoscope (AOSLO) [2] being the most common of these technologies. AO ophthalmoscopes have been utilized to quantify and characterize various aspects of the normal photoreceptor mosaic [1,[9][10][11][12][13][14][15], and also in the analysis of mosaics affected by various retinal diseases [16][17][18][19][20][21][22]. The resolution advantage of confocal AOSLO enables the visualization of the smallest photoreceptors in the retina -rods and foveal cones [12]. However, confocal AOSLO photoreceptor imaging, which is thought to rely on intact outer segment structure to propagate the waveguided signal, occasionally provides ambiguity in identifying retinal structures [23]. For example, in the confocal AOSLO image of the perifoveal retina shown in Fig. 1(a), it is challenging to clearly distinguish cones from rods. To address this issue, and inspired by earlier works on enhanced visualization of retinal vasculature using multiple-scattered light [24], non-confocal split detector AOSLO has been developed [25]. Split detector AOSLO reveals the cone inner segment mosaic, even in conditions where the waveguided signal in the accompanying confocal image is diminished [25]. Moreover, as shown in Fig. 1(b), split detector AOSLO images allow unambiguous identification of cones in the normal perifoveal retina. Fig. 1. Comparison of peripheral rod and cone visualization in confocal and split detector AOSLO imaging in a normal subject (a) Confocal AOSLO image at 10° from the fovea shows rod and cone structures, but it is challenging to confidently differentiate a cluster of rods from a single cone with a complex waveguide signal. (b) Split detector image from the same location better visualizes cones but is incapable of visualizing rods. Circle indicates a cone, which is clearly visible in split detector AOSLO but appears as a collection of rod-like structures in confocal AOSLO. Scale bar: 20 μm.
Identification of individual photoreceptors is often a required step in quantitative analysis of the photoreceptor mosaic. Since manual grading is subjective (with low repeatability for split detector and confocal AOSLO images [26]), and is often too costly and time consuming for clinical applications, multiple automated algorithms have been developed for detecting cones in AO images [27][28][29][30][31][32][33][34][35][36][37]. However, given the distinctive appearance of cones in split detector AOSLO images, one cannot simply transfer an existing algorithm from other AO modalities (e.g. confocal AOSLO, AO flood illumination, or AO optical coherence tomography) to split detector AOSLO without modifications. Thus, quantitative analysis of photoreceptor mosaics visualized on split detector AOSLO currently requires manual grading.
To address this problem, we present a fully-automated algorithm for the detection of cones in non-confocal split detector based AOSLO images. We validate this algorithm against the gold standard of manual marking at a variety of locations within the human retina.

Methods
We denote our proposed automated cone segmentation algorithm the adaptive filtering and local detection (AFLD) method. This two-step algorithm first estimates the radius of Yellott's ring [38,39], which is an annular structure that appears in the Fourier transform of cone photoreceptor images, with a radius that coarsely corresponds to the modal cone spatial frequency [40]. The estimated frequency is then used to create a Fourier domain adaptive filter to remove information non-pertinent to the cones. In the second step, a priori information about the appearance of cones in split detector AOSLO images is used to aid in detection of cones. The schematic in Fig. 2 outlines the core steps in our algorithm, which are discussed in the following subsections. In what follows in this section, we first describe the image acquisition and pre-processing procedures in section 2.1. The two steps of the AFLD algorithm are explained in sections 2.2 and 2.3, respectively. Finally, section 2.4 describes the validation process used for evaluating the performance of the algorithm.

Image acquisition and pre-processing
This research followed the tenets of the Declaration of Helsinki, and was approved by the institutional review boards at the Medical College of Wisconsin (MCW) and Duke University. Image sets from 14 subjects with normal vision were obtained from the Advanced Ocular Imaging Program image bank (MCW, Milwaukee, Wisconsin). In addition, images from 2 subjects with congenital achromatopsia and 2 subjects with oculocutaneous albinism were obtained. All images were acquired using a previously described split detector AOSLO [7,25], with a 1.0° field of view.
Axial length measurements were obtained from all subjects using an IOL Master (Carl Zeiss Meditec, Dublin, CA). To convert from image pixels to retinal distance (μm), images of a Ronchi ruling with a known spacing were used to determine the conversion between image pixels and degrees. An adjusted axial length method [41] was then used to approximate the retinal magnification factor (in µm/degree) and calculate the µm/pixel of each image.
We extracted eight split detector images of the photoreceptor mosaic from each subject's data set within a single randomly-chosen meridian (superior, temporal, inferior, nasal) at multiple eccentricities (range: 500-2,800 µm). An ROI containing approximately 100 photoreceptors was extracted from each image, and intensity values were normalized to stretch between 0 and 255.

Fourier domain adaptive filtering
Cone mosaics can be well approximated by the band pass component of their Fourier domain representation [38,39] (Fig. 3). Therefore, removing low and high frequency components in the non-confocal split detector based AOSLO images of cones would reduce noise and improve image contrast. Thus, in the first step of the AFLD algorithm, we estimated the modal spatial frequency of the cones within the split detector AOSLO images in order to set the cutoff frequencies of a preprocessing band pass filter. This is similar to the method proposed in [42], where the modal frequency is estimated to directly calculate cone density in confocal AOSLO images. Here, we devised an alternate method for estimating the modal frequencies to account for the differences in the Fourier spectra between split detector and confocal AOSLO.
In this step, we transformed the image into the frequency domain using the discrete Fourier transform. Next, we applied a log 10 transformation to the absolute value of the Fourier transformed image, resulting in a frequency domain image with a roughly circular band corresponding to the spatial frequency of the cones in the original image ( Fig. 3(b)). We then applied a 7 × 7 pixel uniform averaging filter to this image to reduce noise.
In the next step, we estimated the 1-dimensional modal spatial frequency of the cones, corresponding to the radius of Yellott's ring within the Fourier image. We averaged nine equally spaced slices through the center of the frequency space to get a 1-dimensional curve (black line in Fig. 3(c)) with a prominent peak corresponding to the modal cone spatial frequency. Note that due to the split detector orientation, the bulk of the spectral power for the cone component of the image is near the horizontal axis. Thus, the selected nine slices were spaced angularly from 20 to 20 degrees around the horizontal axis. As can be seen in Fig.  3(c), the resulting curve consists of a peak corresponding to cone information, along with a gradually decreasing distribution. We found that removing this underlying distribution prior to estimating the modal spatial frequency improved the reliability of locating the peak corresponding to the cone information and the final accuracy of the algorithm, even though removing the distribution leads to shifting the peak to slightly higher frequencies. We estimated the underlying distribution using the least-squares fit of a sum of two exponentials to the data (red curve in Fig. 3(c)), and then subtracted the fit from the curve (Fig. 3(d)). Next, we used MATLAB's findpeaks function to find the peak corresponding to the modal cone spatial frequency. We chose the first prominent peak within the frequency range of 0.04 to 0.16 pixels 1 . This range corresponds on average to cone densities between 5400 and 88000 cones/mm 2 , which was chosen to encapsulate the range of cone densities seen in healthy eyes at the eccentricities examined [43]. To find the final estimate of the modal cone spatial frequency ( C f ), we found the center of mass of the upper fourth of the peak (Fig. 3(e)). We used the estimate of the cone spatial frequency to create a binary annular band pass filter (Fig. 3(f) 0.04 C f  and 0.025 pixels 1 , respectively. The hard lower bound was used to ensure adequate low frequency information was removed. We then multiplied this filter by the Fourier transformed cone photoreceptor image. We inverse Fourier transformed the resulting filtered image to the space domain ( Fig. 3(g)), which has much of the non-cone low and high frequency fluctuations (Fig. 3(h)) removed. We used no additional windowing beyond what was described for the forward and inverse Fourier transforms.

Cone detection
The second step in the AFLD method is to find the location of cones within the filtered image. In split detector images, individual cones appear as pairs of horizontally separated dark and bright regions (Fig. 4(a)). The relative shift of these two regions is constant throughout an image and depends on the orientation of the split detection aperture (all images in this study had dark regions to the left of light regions). Thus, to improve detection accuracy, we exploited this a priori information on the paired local minima and maxima manifestation of cones in split detector AOSLO images. First, we found the location and intensity values of all local minima and maxima in the filtered image ( Fig. 4(b)). Next, we paired the minima and maxima together (Fig. 4(c)) following the constraints that 1) maxima must be to the right of minima, 2) the points must be within 1 1.5 C f  weighted distance of each other, 3) for each maximum only the pair with the smallest weighted distance may be used, and 4) each minimum is only included in one pair.
The weighted distance is defined as , which has higher weight on the vertical component in order to prioritize horizontal matches. We used the inverse of the modal frequency to limit the search regions as it is related to the size and spacing of cones. To find these opposing extrema pairs, we used the convex hull function [44] to find the closest (using w d ) minima for each maxima under the given constraints. To make sure the matches are one-to-one, we checked each paired minima to see if it is paired with multiple maxima, and if so, we kept only the pair with the smallest distance between points. In order to satisfy the above constraints, some maxima may have no corresponding minimum pair. We recorded these unpaired maxima to be analyzed as well.
The final step is to use the matched extrema pairs to determine the locations of cones. We calculated the average intensity magnitude for each opposing extrema pair as   /2 p maxima minima I I I  . We then used thresholding to determine whether an extrema pair corresponded to a cone in the image. The threshold T varied for different modal cone frequencies and for each image was defined as: where  is the standard deviation of the band pass filtered image. We accepted all extrema pairs with p IT  as corresponding to cones, and the location of each cone was defined as the average of the minimum and maximum's location (Fig. 4(d)). We chose the threshold values in Eq. (1) empirically through training based on the observation that there was tendency to overestimate the number of cones at low modal frequencies and underestimate the number of cones at high frequencies. As such, the threshold values are generally higher at higher eccentricities and lower at lower eccentricities from the fovea. These thresholds were set based on a training data set from subjects which were not included in the testing set. Additionally, we evaluated the maxima that were not paired in the same way using maxima I instead of p I , with the threshold for the same frequency bounds set to 2.1σ, 1.75σ, or 1.4σ, respectively. Naturally, cones found using the maxima alone have their location set to be at the maximum's position. Only three percent of cones found across the validation data set were from the maxima alone.

Measures of performance and validation
Subjects with normal vision were divided into groups for training and validation. Subjects from the training and validation data sets did not overlap. In order to form the training group, one subject from each of the four meridians was randomly selected. All 8 images from each subject were used, totaling 32 images for algorithm training. All parameters used in implementing the algorithm were set based on this training. An expert grader performed the manual evaluations used in this process. We then used the 80 images from the remaining 10 subjects for performance evaluation of the algorithm.
We computed standard measures of performance for the AFLD algorithm with respect to the gold-standard of manual marking. Here, two expert manual graders independently evaluated the validation set; the first was the expert used in training the algorithm, and a second grader was also engaged. Thus, for each eye there were data from the AFLD algorithm and each of the graders. Two approaches were taken. First, we focused on the sensitivity and precision of the AFLD method. We analyzed the spatial distributions within each image of the cones identified by AFLD and by the manual assessments. We identified the overlaps in these cones (identified by both methods) and the sources (AFLD or manual) of the non-overlapping cones. This was based on a nearest neighbor analysis. This enabled determination of sensitivity (true positive rate) and precision (1-false discovery rate). Then, we considered measures of the total numbers of cones detected per image, expressed as the cone density, and contrasted them across the AFLD and manual results from both graders on a per image basis for the validation data set.
For the sensitivity/precision analysis, we focused on identifying cones in each image that both ALFD and the manual grading detected, those that AFLD failed to detect and those which it falsely detected. We found one-to-one pairs between the automatic and manual locations with the following constraints: 1) there are no restrictions on the orientation with respect to each other, 2) the two locations must be within 0.75 med d of each other (where med d is the median cone spacing from manual marking for the image), 3) for each manual marking only the pair with the smallest distance is used, and 4) each automatic marking is only included in one pair. We estimated med d for an image by finding the distance for each manually marked cone to its nearest neighbor in pixels and then taking the median of these measurements. To remove border artifacts, we did not consider the cones located within 7 pixels of the edge of the image. The border value was chosen to correspond to half the average value of med d across our data set. We denote the number of cones detected by both AFLD and manual as N True Positive , by manual with no corresponding automatic as N False Negative , and by automatic with no corresponding manual as N False Positive . The numbers of cones from both the automatic and manual markings are then expressed as: In order to compare results of the automatic and manual approaches, we calculated the true positive rate, false discovery rate, and Dice's coefficient [45,46] as: Dice's coefficient is a common metric for describing the overall similarity between two sets of observations, and is affected by both the true and false positives. Additionally, we calculated the same metrics comparing the two graders to one another. We used paired twosided Wilcoxon signed rank tests in order to analyze significances of differences for each of these three summary metrics. In order to compare the total numbers of cones found in images without considering their spatial locations, we calculated the cone density, defined as the ratio of the number of cones marked in an image to the total area of the image. This was compared across AFLD and both graders: grader #1 vs. AFLD; grader #2 vs. AFLD; grader #1 and grader #2 average (per image) vs. AFLD; and grader #1 vs grader #2. Two complementary approaches were taken. First, linear regression and correlation analyses were conducted. Then, Bland-Altman analyses were performed [47]. We coded the fully automated AFLD algorithm in MATLAB (The MathWorks, Natick, MA). The algorithm was run on a desktop computer with an i7-5930K CPU at 3.5 GHz and 64 GB of RAM. Parallelization across 6 cores with hyper threading was used. The mean run time for the AFLD algorithm was 0.03 seconds per image across the validation data set (average image size of 206.5 by 206.5 pixels). This included time for reading the images and saving results.

Cone detection in healthy eyes
Across the validation data set, the AFLD algorithm found over 10500 cones in total. Figure 5 is a representative segmentation result in three images from different subjects and acquired from different eccentricities. Note the differences in cone spacing, and image quality. Figure 6 illustrates results for the AFLD and first manual marking for three images. In the set of marked images, matched pairs between AFLD and manual are shown in green for automatic and yellow for manual (true positives). Cones missed by AFLD are shown in cyan (false negatives), and locations marked by AFLD but not by the manual grader are shown in red (false positives).
Tables 1 and 2 summarize results of the sensitivity/precision analysis of the AFLD algorithm. Quantitatively, the true positive rates and Dice's coefficients are relatively high. The false discovery rates are relatively low (albeit with notably greater variability across all the images for all methods). Table 1 summarizes the performance of the AFLD algorithm and grader #2, while using grader #1 as the gold standard. No significant differences were found between AFLD and the second manual grader for the true positive rates (p = 0.96), false discovery rates (p = 0.18), and Dice's coefficients (p = 0.06). Table 2 shows analogous contrasts, now using grader #2 as the gold standard and evaluating performance of grader #1 as well as AFLD. Here, there were statistically significant differences in the true positive rates (p = 0.048), false discovery rates (p<0.001), and Dice's coefficients (p<0.001), even though their absolute differences per image were relatively small.   Figure 7 gives a set of scatter plots of results for the cone density per image. These plots contrast manual grader #1 vs. AFLD ( Fig. 7(a)), manual grader #2 vs. AFLD ( Fig. 7(b)), the average per image of the two graders vs. AFLD (Fig. 7(c)), and manual grader #1 vs. manual grader #2 (Fig. 7(d)). All plots could be modeled as linear (p<0.001), with relatively small confidence intervals about the regression lines. The slopes of all regression lines in Fig. 7 were significantly different from unity (p<0.001), and the intercepts of all lines were significantly different from zero (p<0.001).  Figure 8 gives Bland-Altman plots for the same contrasts as Fig. 7. The solid line is the average difference per method and the dotted lines are 95% confidence limits. Notably, for cone densities below about 2.25 × 10 4 cones/mm 2 , differences are within the confidence limit intervals. Above that density, there is more scatter between the two manual graders as well as in the comparisons with AFLD. This is consistent with the scatter in Fig. 7.

Preliminary cone detection in pathologic eyes
Section 3.1 described our detailed quantitative analysis of AFLD performance for detecting cones on split detector AOSLO images of normal eyes, which is the main goal of this paper. As a demonstrative example, we tested our algorithm on four subjects with diseases that affects photoreceptors. Figure 9 below shows the segmentation results for two subjects with albinism ( Fig. 9(a)-9(b)) and two subjects with achromatopsia ( Fig. 9(c)-9(d)). In these experiments, we implemented the AFLD algorithm as is without any modification. The extension and quantitative validation of our algorithm for diseased eyes are out of the scope of this preliminary paper, and will be fully addressed in our upcoming publications.

Discussion
We developed a fully automated algorithm for localizing cones in non-confocal split detector based AOSLO images of healthy eyes. This utilizes an adaptive filter along with a priori information about the imaging modality to aid in detection. The algorithm was validated (without a need for manual adjustment of parameters, which were estimated from a separate training data set) against the current gold standard of manual segmentation. Our fast algorithm performed with a high degree of sensitivity, precision, and overall similarity as defined by the Dice's coefficient, when compared to manual grading on a large data set of images differing greatly in appearance and cone density.
There were slight differences in overall similarity (Dice's coefficient) of the AFLD algorithm and the two manual graders (Tables 1, 2). When comparing the automated marking and manual marking of the second grader to the marking of the first grader, the results of AFLD algorithm were closer to the gold standard first grader's marking (without statistical significance). Alternatively, when comparing the automated marking and manual marking of the first grader to the marking of the second grader, the results of first grader were closer to the gold standard second grader's marking (with statistical significance). This might be in part due to the fact that the algorithmic parameters were determined based on the training set marking from the first grader (the more experienced of the two). That is, our algorithm tends to mimic grader #1's marking; thus it is reasonable that the automatic results would more closely match the first grader's marking for the validation set. However, it should be emphasized that regardless of statistical significance, the Dice's coefficient difference between the automated and manual methods in each of the above two cases was on the order of 0.01, a negligible amount in practice.
The AFLD method also shows good performance in determining summary measures per image of cone density. The linear regression and correlation analyses of the cone density scatter plots of Fig. 7 demonstrate the high degree of linear correspondence between the automatic and manual methods. Correlation is higher for AFLD vs. grader #1, consistent with the discussion above. Notably, the scatter plots show less (albeit still high) correlation between the two graders. Given the large sample size (n = 80), it is not surprising that, in a formal statistical sense, the values for the slopes and intercepts of the regression lines were found to be statistically different from one and zero, respectively. However, the magnitude of these differences was sufficiently small to demonstrate the fidelity of the automatic approach. The differences between the automated and manual results are also small, as seen in the Bland-Altman plots of Fig. 8. Again, these differences are greater when AFLD is compared to grader #2 than grader #1. And again, as for the scatter plots, the differences for grader #1 vs. grader #2 are also relatively larger. The discrepancies tend to occur at higher cone densities, where the AFLD method tends to underestimate the manually determined number of cones. The smaller cones at high eccentricities are more difficult to detect even by manual graders, as illustrated in Fig. 7(d) and Fig. 8(d).
Interpretation and quantification of retinal anatomy and pathology, as based on a single ophthalmic image modality, are at times unreliable. Of course, it is not surprising that a higher resolution system such as confocal AOSLO can visualize pathology not identifiable on clinical imaging systems such as optical coherence tomography (OCT) [22]. Moreover, as shown in Fig. 1, perifoveal cones and rods can be often better identified on split detector AOSLO and confocal AOSLO, respectively. Thus, optimal quantification of rod and cone photoreceptor structures requires analysis of both confocal and split detector AOSLO images from the same subject. On another front, OCT's superior axial resolution with respect to AOSLO provides important complementary 3D information about the retinal structures. As such, a recent study recommends employing a multi-imaging modality approach (including OCT and AOSLO imaging systems) to provide additional evidence needed for confident