Optical modelling of accommodative light field display system and prediction of human eye responses

The spatio-angular resolution of a light field (LF) display is a crucial factor for delivering adequate spatial image quality and eliciting an accommodation response. Previous studies have modelled retinal image formation with an LF display and evaluated whether accommodation would be evoked correctly. The models were mostly based on ray-tracing and a schematic eye model, which pose computational complexity and inaccurately represent the human eye population's behaviour. We propose an efficient wave-optics-based framework to model the human eye and a general LF display. With the model, we simulated the retinal point spread function (PSF) of a point rendered by an LF display at various depths to characterise the retinal image quality. Additionally, accommodation responses to rendered LF images were estimated by computing the visual Strehl ratio based on the optical transfer function (VSOTF) from the PSFs. We assumed an ideal LF display that had an infinite spatial resolution and was free from optical aberrations in the simulation. We tested images rendered at 0--4 dioptres of depths having angular resolutions of up to 4x4 viewpoints within a pupil. The simulation predicted small and constant accommodation errors, which contradict the findings of previous studies. An evaluation of the optical resolution of the rendered retinal image suggested a trade-off between the maximum resolution achievable and the depth range of a rendered image where in-focus resolution is kept high. The proposed framework can be used to evaluate the upper bound of the optical performance of an LF display for realistically aberrated eyes, which may help to find an optimal spatio-angular resolution required to render a high quality 3D scene.


Introduction
Accommodation is the mechanism of adjusting the refractive power of the crystalline lens in the eye. A blur pattern in a retinal image drives accommodation so that the retinal image of the fixated object is 'sharp' or 'best focused'. Accommodation is neurally coupled to binocular vergence, the function that rotates the two eyes in opposite directions simultaneously to fuse a visual target at a particular depth into a single image; therefore, an accommodation response is evoked even by a stimulus that elicits only a vergence response, and vice versa [1][2][3]. In a natural visual environment, this neural link supports quick and robust responses to visual targets so that a viewer can maintain a stable perception of 3D scenes. When observing a stereoscopic image rendered by a conventional 3D display, however, the two signals that drive accommodation may conflict. This is because the blur pattern on the retina makes the eye always focus at the optical distance to the visual target, including the surface of a conventional 3D display, while vergence may vary widely depending on the rendered image depths and where the viewer fixates. This is known as the vergence-accommodation conflict, which causes visual discomfort and fatigue and may also hinder visual performance in some tasks [4,5]. Therefore, developing 3D displays that can provide focus information has become an important research topic.

Fig. 1. A) [...] In the upper diagram, the eye accommodates at the display surface (indicated as the CDP; see Section 2.1 for its definition), whereas the eye accommodates closer than the display surface in the lower diagram. B) An LF display that renders a point at a depth closer than the display with four rays in a pupil (only two are shown), PSFs, and simulated retinal images. In the upper diagram, the eye accommodates at the display surface. In the lower diagram, the eye accommodates at the rendered depth.
Light field (LF) displays modulate the position and direction of light rays, aiming to support all visual depth cues including binocular disparity, motion parallax, and focus information. This is attempted mainly by rendering a dense set of directional rays with a sufficiently high spatial and angular resolution. Fig. 1 illustrates the basic mechanism of providing focus information by an LF display. When the viewer observes a conventional display (Fig. 1A), a focused and sharp retinal image is obtained only when the eye focuses on the display surface; otherwise the retinal image is blurred due to defocus. In other words, the focus information provided by the conventional display drives accommodation only at the display depth. An LF display, on the other hand, may provide more accurate focus information by rendering images with multiple rays that hit the eye's pupil, as shown in Fig. 1B. The rays' directions are controlled so that the rays intersect at the rendered point, imitating natural rays that would emanate from the point. When the eye focuses on the display surface, the individual images formed on the retina by the rays are optically focused; however, their superposition results in a separated or duplicated image that is expected to be recognised as a defocus blur by the visual system. On the other hand, when the eye focuses on the rendered point, the individual images by the rays are optically defocused but overlap perfectly. If the visual system recognises the latter as more 'focused' than the former, the correct focus information about the rendered point is provided and accommodation should be driven to the rendered depth. However, the number of rays needed to drive accommodation correctly is not self-evident.
The idea of making accommodative LF displays was first proposed by Kajiki et al. in 1995 and validated by ray-tracing a simple optical model [6]. Since then, several researchers have implemented the idea in various prototype displays and evaluated their performance in experiments, either by capturing a rendered image with a camera or by measuring actual human viewers' accommodation responses [7][8][9][10][11][12]. All these studies reported that accommodation responses were elicited by observing the rendered images, but the responses were not as close to the rendered depths as the responses to real objects. One reason for this may have been the low viewpoint density these prototypes achieved, which comprised at most three horizontally aligned viewpoints in the pupil.
As an alternative to experiments with display prototypes, numerous theoretical studies have aimed at analytically evaluating the factors that affect the optical performance of accommodative LF displays. Huang and Hua proposed a framework to model an LF display and a human eye to simulate retinal images [13]. They analytically described the formation of the retinal image and simulated it by ray-tracing a geometrical optical model that comprised the display part and the eye part. Optical resolution on the retina and the predicted accommodation responses were then evaluated by comparing simulated PSFs and modulation transfer functions (MTFs). They predicted accommodation responses considering an integral imaging display that creates 4-16 viewpoints in a 3-mm-diameter pupil. The predicted responses appeared shifted towards the central depth plane (CDP: to be rigorously defined in Section 2.1) by approximately 0.1-0.2 dioptres. Higher viewpoint densities resulted in smaller predicted shifts. The model was improved further by taking into account the positional sampling and finite pixel resolution in the CDP [14], by generalising it for computational multi-layer LF displays [15], and by unifying the framework to serve both integral imaging displays and computational multi-layer displays [16]. Qin et al. proposed a model aimed at simulating retinal images rendered by a microlens-array-based near-eye LF display, taking into account off-axis ray propagation and using a rigorous calculation of diffraction [17]. Retinal images were simulated by ray-tracing a geometrical optical model that comprised pixels, microlenses, and the Arizona eye model; they later reported predicted accommodation responses to images rendered by that type of LF display [18,19]. Their model had a fixed viewpoint density of 45 viewpoints over the 4-mm-diameter pupil and a varying CDP depth. The predicted accommodation responses did not generally match the rendered image depths, and significant shifts of the predicted accommodation responses toward the CDP depth were observed.
The methods reviewed above pose certain problems and issues that prevent their direct and systematic use for characterising LF displays and predicting the accommodation response to rendered images. First, these methods are based on ray-tracing, which is computationally inefficient and not suitable for generalisations. Second, the Arizona eye model has been commonly used as the required geometrical optical model of the human eye. This single model replicates chromatic effects and the 'average' aberration of human eyes, but it does not represent the peculiarities of the population of human eyes. A handy customised eye model for this purpose is not available [20]. Third, previous studies have not properly modelled and evaluated the accommodation responses for the case of polychromatic light, simulating instead either monochromatic retinal images or coarse spectral sampling. However, polychromatic effects are potentially important, since the human eye suffers from longitudinal chromatic aberration (LCA) greatly and the visual system makes use of it to drive accommodation [21,22].
The prediction of accommodation responses in previous studies may have also been less accurate because of the prediction measures used. These have been based either on the Strehl ratio or on values picked from through-focus MTFs at specific spatial frequencies; the latter does not give a single prediction of the accommodation error but different accommodation errors at different frequencies.
On the other hand, the Strehl ratio, while a useful metric of optical performance for slightly aberrated optics, has demonstrated relatively poor performance as an accommodation response predictor [23].
The primary purpose of the current study was to develop a novel framework to model an LF display and a human eye that does not require ray-tracing and can represent the population of aberrated human eyes. Secondarily, the study was aimed at simulating the retinal images rendered by an LF display so that (1) the accommodation responses to the images would be predicted and (2) optical resolution of the focused retinal images would be evaluated.

Modelling and evaluation of LF displays
LF displays are essentially aimed at reproducing the spatio-angular intensity distribution of light in a 3D scene. An LF propagating in the half space is usually represented by a four-dimensional function $L(s, t, u, v)$, where two parallel $(s, t)$ and $(u, v)$ planes parametrise the intensity variations along the spatial and angular dimensions of the LF, respectively [24]. Different LF display technologies set these two planes differently. In multi-perspective projection-based LF displays, the two planes are associated with the plane where the projectors are located and the screen plane where the rays recombine [25]. In integral imaging and super multiview displays, a light source plane and a direction-dependent light modulation plane form a pair of planes. In the current study, a microlens-array-based integral imaging display was modelled as a representative LF display technique, in line with other related studies, which have considered the same display principle [13,14,16,17]. Other LF display techniques can also be investigated by matching their optical structure and light-modulation principle with the corresponding LF parametrisation [13,16].

Fig. 2 illustrates the retinal image formation process for an integral-imaging-based LF display. The process begins in the rendering plane, which represents a physical source of light. For multiview and integral imaging displays, the rendering plane corresponds to the LCD or OLED screen behind the microlenses. The modulation plane in front of the rendering plane controls the directions of the emitted light with an optical structure such as a lenslet or a parallax barrier. Light emanating from a point source in the rendering plane forms a beam after being spatially limited and typically refracted in the modulation plane. The central depth plane (CDP) is the optical conjugate of the rendering plane that is imaged by the modulation plane. Light rays that emanate from the rendering plane and are modulated in the modulation plane intersect in the plane that contains the rendered 3D image, thereby rendering a 3D image point and imitating the ideal rays that would emanate from that point. Observers are therefore expected to accommodate at this point (see Fig. 1). Each beam of light from a point source in the rendering plane then reaches the eye pupil plane, which is a hypothetical plane that models the optical modulation in the eye, and forms a subaperture there. The beam may be cropped by the pupil and is refracted, forming the image of the point source on the retina, namely the elemental PSF. The final retinal image of the rendered point is the incoherent superposition of the elemental PSFs formed by all rays that enter the eye through the pupil. The number of rays that hit the pupil, which is identical to the number of subapertures, is the parameter of greatest interest in evaluating an accommodative LF display.

Retinal image formation
The position, shape, and size of the subapertures are determined by the setup of the rendering plane and the modulation plane. Let $z$ and $d$ be the depth of a rendered point and the depth of the modulation plane, respectively (Fig. 2A). The positional interval of subapertures $[\Delta x, \Delta y]$ in Fig. 2B is geometrically linked to the sampling interval in the modulation plane $[\Delta u, \Delta v]$ by assuming infinitely high resolution on the rendering plane (the $(s, t)$ plane), which is equivalent to imposing no positional constraint in determining the locations of the beam emitters. The shape and size of each subaperture depend on the modulation structure of the modulation plane. Assuming the origin of the pupil plane is at the centre of the pupil, the amplitude function on the $i$-th subaperture is given by

$$A_{elem}(x, y; i) = \mathrm{Fr}_{d}\left[ M_i(u, v)\, \mathrm{Fr}_{d_0}\left[ \delta(s - s_i, t - t_i) \right] \right], \quad (2)$$

where $\mathrm{Fr}_{d}[f(\cdot, \cdot)]$ denotes the Fresnel diffraction integral [26] as a convolution between an input field $f(\cdot, \cdot)$ and the convolution kernel for propagation distance $d$, $M_i(u, v)$ represents the modulation function of the modulation structure for the corresponding beam, $\delta(s - s_i, t - t_i)$ denotes the Dirac delta function at $[s_i, t_i]$ modelling the point source in the rendering plane, and $d_0$ and $d$ refer to the distance from the rendering plane to the modulation plane and the distance from the modulation plane to the eye pupil plane, respectively. The modulation function $M_i(u, v)$ is a simple aperture function or a slit function for displays that modulate light with pinholes or a parallax barrier. For an integral imaging display that controls the directions of light with lenslets, the modulation function includes the phase modulation term corresponding to a microlens, which is a chirp function.
In the current study, a hypothetically ideal modulation in the modulation plane was assumed. That is to say, the effects of diffraction and aberration due to the optical modulation structure were neglected, and a point source in the rendering plane was assumed to be imaged in the CDP as an ideal point, so that the light sources on the rendering plane can equivalently be treated as ideal point sources in the CDP. This assumption separates the effects of a display's optical structure from the modelled retinal image formation process. Such a model gives idealised simulation results; these nevertheless apply to all LF displays that share the same critical parameters, such as ray density and CDP depth, and provide the theoretical maximum of the LF displays' performance. In our simulations, the subaperture function $A_{elem}$ was determined geometrically, as shown schematically in Fig. 2A, instead of being derived rigorously from Eq. 2. We first defined three viewpoint densities for the simulation, namely 2×2, 3×3, and 4×4 viewpoints in a 3-mm-diameter pupil. To simulate retinal images for a viewpoint density selected from these, the size and positions of the subapertures were determined from the viewpoint density as illustrated in Fig. 2B. Specifically, the subapertures were set to circular functions that were tangential to each other on a rectangular grid in the eye pupil plane filling the pupil; thus, the diameter of the subapertures was equal to the subaperture intervals $\Delta x$ and $\Delta y$.
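The geometric subaperture layout described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function name `subaperture_grid` is hypothetical, and it assumes that the n × n grid exactly spans the pupil diameter, so that the subaperture interval equals the pupil diameter divided by n.

```python
def subaperture_grid(n_side, pupil_diameter_mm=3.0):
    """Return (diameter_mm, centres_mm) for n_side x n_side circular subapertures.

    Assumption: the grid exactly spans the pupil diameter, so the interval
    between neighbouring centres equals the subaperture diameter (tangent circles).
    """
    pitch = pupil_diameter_mm / n_side          # interval = subaperture diameter
    half = (n_side - 1) / 2.0                   # centres the grid on the pupil origin
    centres = [((ix - half) * pitch, (iy - half) * pitch)
               for iy in range(n_side)
               for ix in range(n_side)]
    return pitch, centres


# The three viewpoint densities considered in the text
for n in (2, 3, 4):
    diameter, centres = subaperture_grid(n)
    print(n, "x", n, "viewpoints: diameter", diameter, "mm,", len(centres), "subapertures")
```

Under this assumption the corner subapertures extend beyond the circular pupil and would be cropped by the pupil function when the retinal image is computed.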
Under the assumption of a point source at the CDP and the paraxial approximation, an elemental PSF due to a given beam is obtained by Fourier-transforming the corresponding generalised pupil function, which is nonzero within the corresponding subaperture, as illustrated in Fig. 3A [26]. Specifically, the elemental PSF for the $i$-th subaperture at a wavelength $\lambda$ for an eye with a nominal accommodation distance $z_{acc}$, namely $\mathrm{ePSF}(\xi, \eta; \lambda, i, z_{acc})$, is described as

$$\mathrm{ePSF}(\xi, \eta; \lambda, i, z_{acc}) = \left| \mathcal{F}\left[ P_{sub}(x, y; \lambda, i, z_{acc}) \right]\left( \frac{\xi}{\lambda d_{eye}}, \frac{\eta}{\lambda d_{eye}} \right) \right|^2, \quad (3)$$

where $\mathcal{F}[\cdot]$ denotes the Fourier transformation, $P_{sub}(x, y; \lambda, i, z_{acc})$ is the generalised pupil function of the $i$-th subaperture defined over the whole eye pupil, $(\xi, \eta)$ are the retinal coordinates, and $d_{eye}$ is the optical distance from the eye pupil plane to the retina. The generalised pupil function is defined as

$$P_{sub}(x, y; \lambda, i, z_{acc}) = A_{sub}(x, y; \lambda, i) \exp\left[ \mathrm{j} \frac{2\pi}{\lambda} W(x, y; \lambda, z_{acc}) \right], \quad (4)$$

where $A_{sub}(x, y; \lambda, i)$ is the aperture function that describes the amplitude modulation and $W(x, y; \lambda, z_{acc})$ is the wavefront aberration function that represents the phase modulation. The generalised pupil function, the aperture function, and the wavefront aberration function are all defined not only in the subaperture but in the whole eye pupil plane. The aperture function $A_{sub}(x, y; \lambda, i)$ incorporates the amplitude variation of the incident light beam within the corresponding subaperture $A_{elem}(x, y; i)$ and a Gaussian apodisation filter in the whole eye pupil plane $A_{SCE}(x, y; \lambda) = 10^{-\rho(\lambda)(x^2 + y^2)}$, which models the effect that light passing near the centre of the pupil stimulates cones much more strongly than light passing near the edge of the pupil, referred to as the Stiles-Crawford effect [27,28]. The peak of the effectiveness of the entering light is assumed to be at the centre of the pupil (namely the origin in the $x, y$-plane), and the wavelength-dependent parameter $\rho(\lambda)$ denotes the peakedness of the effect [29]. The resulting amplitude modulation can then be written as

$$A_{sub}(x, y; \lambda, i) = A_{elem}(x, y; i)\, A_{SCE}(x, y; \lambda)\, A_{pupil}(x, y), \quad (5)$$

where $A_{pupil}(x, y)$ represents the binary circular pupil function $A_{pupil}(x, y) = \mathrm{circ}\left( \sqrt{x^2 + y^2} / (a/2) \right)$ for a pupil with diameter $a$.
The wavefront aberration function $W(x, y; \lambda, z_{acc})$ incorporates all possible wavefront aberrations compared to the diffraction-limited imaging condition, including not only the fixed aberrations of the eye but also accommodation-dependent defocus and spherical aberration terms. The composition of the wavefront aberration function is further elaborated in Section 2.1.1.
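The route from a generalised pupil function to an elemental PSF can be sketched as follows. This is a simplified illustration under stated assumptions, not the simulation used in the study: it uses only the Python standard library with a naive discrete Fourier transform (an FFT routine would normally replace it), the wavefront aberration is reduced to a single defocus term given in waves at the pupil edge, the Stiles-Crawford apodisation is taken in the form 10^(-rho r^2), and all function and parameter names are hypothetical.

```python
import cmath
import math


def dft2_power(field):
    """Squared magnitude of the 2D DFT of a square complex grid (naive, separable)."""
    n = len(field)
    tw = [[cmath.exp(-2j * math.pi * k * m / n) for m in range(n)] for k in range(n)]
    # Transform along the second index, then along the first index
    rows = [[sum(row[m] * tw[k][m] for m in range(n)) for k in range(n)] for row in field]
    return [[abs(sum(rows[m][l] * tw[k][m] for m in range(n))) ** 2 for l in range(n)]
            for k in range(n)]


def elemental_psf(n, pupil_radius_mm, sub_centre_mm, sub_radius_mm,
                  defocus_waves, sce_rho=0.0):
    """Elemental PSF (arbitrary units) for one circular subaperture.

    The pupil plane is sampled on an n x n grid spanning four pupil radii,
    which zero-pads the pupil. The phase is a pure defocus term (assumption).
    """
    step = 4.0 * pupil_radius_mm / n
    field = []
    for iy in range(n):
        row = []
        for ix in range(n):
            x = (ix - n // 2) * step
            y = (iy - n // 2) * step
            r2 = x * x + y * y
            in_pupil = r2 <= pupil_radius_mm ** 2
            in_sub = ((x - sub_centre_mm[0]) ** 2 + (y - sub_centre_mm[1]) ** 2
                      <= sub_radius_mm ** 2)
            if in_pupil and in_sub:
                amplitude = 10.0 ** (-sce_rho * r2)               # Stiles-Crawford apodisation
                waves = defocus_waves * r2 / pupil_radius_mm ** 2  # defocus phase in waves
                row.append(amplitude * cmath.exp(2j * math.pi * waves))
            else:
                row.append(0j)
        field.append(row)
    return dft2_power(field)


# One subaperture of a 2x2 layout in a 3-mm pupil, in focus and defocused
in_focus = elemental_psf(32, 1.5, (0.75, 0.75), 0.75, defocus_waves=0.0)
defocused = elemental_psf(32, 1.5, (0.75, 0.75), 0.75, defocus_waves=1.0)
print(max(map(max, defocused)) / max(map(max, in_focus)))  # peak ratio below 1
```

The defocused subaperture still produces a fairly compact PSF because only a small part of the quadratic wavefront falls inside it; most of the defocus appears as a local tilt, which displaces the elemental PSF laterally.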
The polychromatic PSF $\mathrm{ePSF}_{poly}(\xi, \eta; i, z_{acc})$ for each subaperture is found by superposing the elemental PSFs at different wavelengths as

$$\mathrm{ePSF}_{poly}(\xi, \eta; i, z_{acc}) = \sum_{\lambda} V(\lambda)\, \mathrm{ePSF}(\xi, \eta; \lambda, i, z_{acc}). \quad (6)$$

The monochromatic PSFs are weighted by a luminosity function $V(\lambda)$ to reflect the spectral visual effectiveness [30]. In the current study, the CIE physiologically relevant 2-degree luminosity function for photopic vision was used as $V(\lambda)$ [31], principally because only foveal vision was under consideration. An elemental PSF calculated in this way represents the PSF of the on-axis point source (see Fig. 3A). To obtain the retinal PSF of a rendered point, the elemental PSFs are laterally shifted on the retina, reflecting the rendered image depth $z$, the CDP depth $z_{CDP}$, and $d_{eye}$ (see Fig. 3B). The amount of shift $\Delta\mathbf{s}_i$ is determined as

$$\Delta\mathbf{s}_i = d_{eye} \left( \frac{1}{z} - \frac{1}{z_{CDP}} \right) [x_i, y_i], \quad (7)$$

where $[x_i, y_i]$ is the position of the $i$-th subaperture's centre. Finally, the laterally shifted polychromatic elemental PSFs are superposed, and the retinal PSF of the rendered point under a nominal accommodation distance $z_{acc}$ is obtained as

$$\mathrm{PSF}_{poly}(\xi, \eta; z_{acc}) = \sum_{i} \mathrm{ePSF}_{poly}\left( [\xi, \eta] - \Delta\mathbf{s}_i;\, i, z_{acc} \right). \quad (8)$$
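The lateral shift of an elemental PSF can be illustrated with a short paraxial calculation. The sketch below is an assumption-laden illustration, not the formula confirmed by the source: it places the eye's nodal point in the pupil plane, measures depths as positive distances in metres from the pupil, and derives the shift from the ray that passes through the subaperture centre and the rendered point. The sign convention, the function name `lateral_shift`, and the example value of `d_eye_m` are all hypothetical.

```python
def lateral_shift(x_i_m, y_i_m, z_m, z_cdp_m, d_eye_m):
    """Retinal shift (metres) of the elemental PSF for a subaperture centred at (x_i, y_i).

    Assumed paraxial geometry: the point source in the CDP lies on the line through
    the subaperture centre and the rendered point, and its retinal image position is
    scaled by d_eye / z_cdp with image inversion.
    """
    dioptric_offset = 1.0 / z_m - 1.0 / z_cdp_m   # rendered depth relative to the CDP, in D
    return (d_eye_m * dioptric_offset * x_i_m,
            d_eye_m * dioptric_offset * y_i_m)


# Example: CDP at 2 D (0.5 m), point rendered at 4 D (0.25 m),
# subaperture centre 0.75 mm off-axis, d_eye of 16.7 mm (illustrative value)
print(lateral_shift(0.75e-3, 0.0, 0.25, 0.5, 16.7e-3))
```

In this sketch the shift vanishes when the point is rendered in the CDP and grows linearly with both the dioptric offset and the eccentricity of the subaperture, which is why elemental PSFs separate on the retina when the eye stays focused at the CDP.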

Eye wavefront aberrations
The wavefront aberration function $W(x, y; \lambda, z_{acc})$ in Eq. 4 includes statistics-based monochromatic aberrations of an unaccommodated eye, the LCA as a wavelength-dependent defocus, the accommodation-related defocus, and the accommodation-dependent additional spherical aberration. A wavefront aberration function $W(x, y)$ can be described by a Zernike expansion, that is, a weighted sum of Zernike polynomials, as

$$W(x, y) = \sum_{n, m} c_n^m Z_n^m(x, y), \quad (9)$$

where $Z_n^m(x, y)$ is a Zernike polynomial, $c_n^m$ is the corresponding Zernike coefficient, and $n$ and $m$ ($n, m \in \mathbb{Z}$; $n \geq |m| \geq 0$) denote the highest order of the polynomial's radial component and the azimuthal frequency of the polynomial's sinusoidal component, respectively [32]. With a Zernike expansion, any wavefront aberration function is represented by a set of Zernike coefficients.
A wavefront aberration function includes the defocus of the optical system, which is zero if the system satisfies the imaging condition. In other words, a non-zero defocus means a signed longitudinal shift of the nominal focal point from the CDP depth. Given that the nominal accommodation distance $z_{acc}$ is provided in metres, the defocus in dioptres $D_{acc}(z_{acc})$ is found by the following equation:

$$D_{acc}(z_{acc}) = \frac{1}{z_{acc}} - \frac{1}{z_{CDP}}, \quad (10)$$

where $z_{CDP}$ is the CDP depth in metres. A dioptric defocus $D_{acc}(z_{acc})$ contributes to the Zernike defocus term $Z_2^0(x, y)$ of the wavefront aberration function. The corresponding coefficient $c_{acc}(z_{acc})$ is found through the following equation:

$$c = \frac{a^2 D}{16\sqrt{3}}, \quad (11)$$

where $a$ is the pupil diameter in millimetres and $D$ is the defocus in dioptres [33,34]. Note that $c_{acc}(z_{acc})$ is in micrometres.
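As a small worked example of the conversion from dioptric defocus to a Zernike defocus coefficient, the sketch below uses the standard relation between a spherical defocus error and the coefficient of the Zernike defocus mode for a pupil of a given diameter. The function names are hypothetical and the sign convention (positive when the eye accommodates nearer than the CDP) is an assumption.

```python
import math


def defocus_dioptres(z_acc_m, z_cdp_m):
    """Dioptric offset of the nominal accommodation distance from the CDP (sign assumed)."""
    return 1.0 / z_acc_m - 1.0 / z_cdp_m


def zernike_defocus_um(defocus_d, pupil_diameter_mm):
    """Zernike defocus coefficient in micrometres for a defocus in dioptres.

    Dioptres (1/m) times mm^2 yields micrometres, so no extra unit factor is needed.
    """
    return defocus_d * pupil_diameter_mm ** 2 / (16.0 * math.sqrt(3.0))


# Eye accommodating at 3 D while the CDP is at 2 D, with a 3-mm pupil
d = defocus_dioptres(1.0 / 3.0, 0.5)
print(d, zernike_defocus_um(d, 3.0))
```

For a 3-mm pupil, one dioptre of defocus corresponds to roughly a third of a micrometre of Zernike defocus, which gives a sense of scale for the accommodation steps of 0.2 D used in a through-focus analysis.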
Accommodation primarily changes the shape of the crystalline lens, resulting in an increase in its refractive power and also a systematic decrease in the primary spherical aberration, while the other types of aberration statistically show little change with accommodation [22,35,36,37]. Cheng et al. measured changes of aberrations with accommodation and reported that the change in spherical aberration was proportional to the change in accommodation defined by the Zernike defocus [36]. Specifically, the reported slope of the change in the Zernike coefficient for primary spherical aberration [...]

Human eyes are considerably aberrated, and this greatly affects the quality of the retinal image. Therefore, modelling the human eye with simple ideal optics must over-estimate the retinal resolution. Furthermore, idealising the human eye in the model may also cause an incorrect prediction of the accommodation responses. This is because the empirical accommodation response as a function of the stimulus's optical distance is widely known to be S-shaped, while the response function of the idealised (diffraction-limited) eye model should simply be the identity line [22,38,39]. This phenomenon, namely the lead and lag of accommodation, is considered to be due to ocular aberrations and pupil constriction [40][41][42]. Similarly, simulating a retinal image for one eye that has the average aberration of the human eye population may also give misleading results. This is because the aberrations of the human eye population tend to be distributed around zero, and thus the hypothetical 'average' eye is nearly free of aberrations [43]. In other words, the 'average' eye does not represent the human eye population.
To address this issue, we generated multiple instances of an aberrated eye that follow the statistics of measured human eye aberration. Simulated retinal images in these instances, which truly represent the human eye population, were evaluated. To generate corresponding sets of aberration coefficients, we used the statistical model of the aberration of the healthy human eye population by Thibos et al. [44]. In the model, the coefficients for individual eyes are represented as multivariate Gaussian variables with the measured mean and variance of each coefficient and the covariances between all possible pairs of the coefficients. In this study, coefficients representing ten virtual aberrated eyes were generated from the given vector of the mean values of the coefficients and the variance-covariance matrix [45]. Fig. 4 shows the resulting Zernike coefficients of the generated virtual eyes and the mean and ±2 standard deviation of each coefficient described in the statistical model. The mean coefficient values are close to zero for most of the Zernike modes, while the coefficient values are much larger for the generated eyes in absolute value.
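Drawing virtual eyes from a multivariate Gaussian model can be sketched as below. The example is illustrative only: it uses a hand-written Cholesky factorisation with the standard library so that it stays self-contained, and the two-coefficient mean vector and covariance matrix are placeholder numbers invented for the demonstration, not values from the statistical model of Thibos et al.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular factor low such that low * low^T equals cov."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def sample_eyes(mean, cov, n_eyes, seed=0):
    """Draw n_eyes coefficient vectors from N(mean, cov) via mean + low * g."""
    rng = random.Random(seed)
    low = cholesky(cov)
    eyes = []
    for _ in range(n_eyes):
        g = [rng.gauss(0.0, 1.0) for _ in mean]          # independent standard normals
        eyes.append([m + sum(low[i][k] * g[k] for k in range(i + 1))
                     for i, m in enumerate(mean)])
    return eyes


# Placeholder statistics for two Zernike coefficients (micrometres); NOT measured data
mean = [0.0, 0.05]
cov = [[0.04, 0.01],
       [0.01, 0.02]]
for eye in sample_eyes(mean, cov, n_eyes=10):
    print(eye)
```

Sampling with the full covariance matrix, rather than drawing each coefficient independently, preserves the correlations between Zernike modes that the statistical model describes.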
The chromatic aberrations, especially the LCA, also significantly affect the retinal image. The LCA causes a wavelength-dependent defocus of light: specifically, light of shorter wavelengths is imaged at points closer to the lens, and vice versa. The defocus due to the LCA is always present in natural viewing, and the visual system utilises it to drive accommodation [21,22]. In contrast to monochromatic aberrations in the human eye population, the profile of the LCA is mostly common across individuals [46,47]. The LCA is measured in dioptres and modelled at a wavelength $\lambda$ by

$$D_{LCA}(\lambda) = p \left( \frac{1}{\lambda_{ref} - q} - \frac{1}{\lambda - q} \right), \quad (12)$$

where $p = 633.26$ and $q = 214.10$ [46]. The parameter $\lambda_{ref}$ is a reference wavelength in nanometres, at which the retinal image is assumed to be nominally focused. The contribution of the LCA to the defocus term $Z_2^0(x, y)$ can be found by converting $D_{LCA}(\lambda)$ to micrometres using Eq. 11.
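The wavelength dependence of the LCA can be tabulated with a few lines of code. The sketch assumes a hyperbolic chromatic-eye form with the two constants quoted in the text, wavelengths in nanometres, and a sign convention in which shorter wavelengths give negative values; the functional form, the sign, and the name `lca_dioptres` are assumptions made for illustration.

```python
def lca_dioptres(wavelength_nm, ref_nm=550.0, p=633.26, q=214.10):
    """Chromatic defocus in dioptres relative to the reference wavelength.

    Assumed form: p * (1 / (ref - q) - 1 / (wavelength - q)), wavelengths in nm.
    """
    return p * (1.0 / (ref_nm - q) - 1.0 / (wavelength_nm - q))


# Chromatic defocus across the sampled visible spectrum (400 to 700 nm, 50-nm steps here)
for wl in range(400, 701, 50):
    print(wl, round(lca_dioptres(wl), 3))
```

With these constants the chromatic defocus spans roughly two dioptres between 400 nm and 700 nm, so it is comparable in size to the accommodation-related defocus considered in the simulation.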

Prediction of accommodation response
The metrics for the prediction of accommodation responses incorporate not only the retinal image itself but also the subsequent neural factors, reflecting the fact that a retinal defocus blur is detected, processed, and analysed in the visual processing pathway from the retina to the cortex to decide the accommodation response. The visual Strehl ratio is an objective metric that involves the neural transfer function and is known to be one of the most reliable predictors of subjective focus and visual acuity [23,33,48,49,50,51]. While the Strehl ratio is defined as the ratio of a PSF's maximum to that of the diffraction-limited PSF, the visual Strehl ratio is computed in the frequency domain (namely from the optical transfer function or the modulation transfer function), where it is weighted by a neural factor. In our setting, the visual Strehl ratio computed from the optical transfer function (VSOTF) is given by

$$\mathrm{VSOTF}(z_{acc}) = \frac{\iint \mathrm{NCSF}(f_x, f_y)\, \mathrm{OTF}_{poly}(f_x, f_y; z_{acc})\, df_x\, df_y}{\iint \mathrm{NCSF}(f_x, f_y)\, \mathrm{OTF}_{DL}(f_x, f_y)\, df_x\, df_y}, \quad (13)$$

where $\mathrm{OTF}_{poly}$ is the given OTF, $\mathrm{OTF}_{DL}$ is the OTF of the diffraction-limited optics, and NCSF is the neural contrast sensitivity function [33]. Specifically, $\mathrm{OTF}_{poly}(f_x, f_y; z_{acc}) = \mathcal{F}\left[ \mathrm{PSF}_{poly}(\xi, \eta; z_{acc}) \right]$, and

$$\mathrm{OTF}_{DL}(f_x, f_y) = \mathcal{F}\left[ \mathrm{PSF}_{polyDL}(\xi, \eta) \right], \quad (14)$$

where $\mathrm{PSF}_{polyDL}(\xi, \eta)$ is defined as the special retinal PSF for the eye whose monochromatic aberration coefficients are all zero and which is in focus at the CDP ($z_{acc} = z_{CDP}$), while the accommodation-related spherical aberration and the wavelength-dependent defocus due to the LCA are present for the sake of consistency in the calculation of polychromatic PSFs. The NCSF is conceptually the neuro-retinal portion of the contrast sensitivity function (CSF), assuming that the CSF is the product of the eye's MTF and the NCSF, as shown in Fig. 5B [52]. The NCSF is obtained either by measuring the contrast sensitivity to interference fringes that are directly formed on the retina [53] or by dividing the CSF by the MTF [52]. We used the NCSF model derived by Watson and Ahumada, as shown in Fig. 5A [54,55]. The NCSF model incorporates the oblique effect, which is the phenomenon that a typical human observer is less sensitive to oblique gratings than to horizontal or vertical gratings.
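On a sampled frequency grid, the VSOTF reduces to a ratio of two NCSF-weighted sums. The sketch below is a minimal discrete version under stated assumptions: the OTFs and the NCSF are supplied as equally sized nested lists on the same frequency grid, only the real part of the OTF is accumulated (the imaginary part cancels for a real PSF and an NCSF that is symmetric about the origin), and the function name and sample values are hypothetical.

```python
def vsotf(otf_poly, otf_dl, ncsf):
    """Discrete visual Strehl ratio: NCSF-weighted OTF sum over its diffraction-limited counterpart."""
    numerator = 0.0
    denominator = 0.0
    for row_p, row_d, row_w in zip(otf_poly, otf_dl, ncsf):
        for o_p, o_d, w in zip(row_p, row_d, row_w):
            numerator += w * complex(o_p).real      # test OTF weighted by neural sensitivity
            denominator += w * complex(o_d).real    # diffraction-limited reference
    return numerator / denominator


# Toy 2x2 frequency grid with made-up values, for illustration only
otf_dl = [[1.0, 0.5], [0.5, 0.25]]
otf_blurred = [[1.0, 0.2 + 0.1j], [0.2 - 0.1j, 0.05]]
weights = [[1.0, 2.0], [2.0, 1.0]]
print(vsotf(otf_blurred, otf_dl, weights))
```

Because the denominator is fixed by the diffraction-limited reference, comparing VSOTF values across accommodation states amounts to comparing the weighted numerators, which is what a through-focus analysis exploits.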

Simulation results
In this section, (1) predictions of the accommodation response to LF images and (2) the optical resolution of the images are reported for a wide range of depths rendered by an integral-imaging-based LF display with various viewpoint densities. Through-focus analysis was employed to predict the accommodation response, and the optical resolutions on the retina were compared across conditions, each at the best-focused state found by the through-focus analysis.
The CDP depth of the LF display model was fixed at 2 dioptres. The three viewpoint densities, i.e. 2×2, 3×3, and 4×4 viewpoints within a 3-mm-diameter eye pupil, were tested. The rendered image depth was one of nine depths in front of and behind the CDP, namely −2.0D, −1.5D, ..., 0D, ..., or +2.0D relative to the CDP depth. The interval of the nominal accommodation distances for the through-focus analysis was set to 0.2D. The spectrum of visible light was sampled from 400 nm to 700 nm in 10-nm increments. We set the reference wavelength at 550 nm, at which the luminosity function approximately reaches its peak [56]. The whole simulation was implemented by customising the ISETbio toolbox [57] on MATLAB.

Fig. 6 shows an example of the through-focus analyses we conducted. In this case, the analysis is shown for a single open pupil of an aberrated eye; thus, the rendered image depth was the CDP depth only. PSFs were first computed for several accommodation states with varying nominal accommodation distance $z_{acc}$ in front of and behind the image depth and the CDP depth. The dioptric interval of the nominal accommodation distances was 0.2D. The frequency-domain representations of the PSFs, namely the OTFs, were then computed, and finally a VSOTF value was obtained for each OTF. The nominal accommodation distance at which the VSOTF takes its peak value is the predicted accommodation response. Therefore, in Fig. 6, an accommodation response is predicted at +0.2D relative to the image depth. This analysis was repeated for all combinations of the eyes, viewpoint densities, and rendered image depths.
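The final step of the through-focus analysis, picking the accommodation state with the highest VSOTF, can be sketched as follows. The function name is hypothetical and the VSOTF values in the example are made-up numbers chosen only to show the mechanics, not results from the study.

```python
def predicted_response(acc_offsets_d, vsotf_values):
    """Return the accommodation offset (dioptres) at which the VSOTF peaks."""
    if len(acc_offsets_d) != len(vsotf_values):
        raise ValueError("one VSOTF value is needed per accommodation state")
    best = max(range(len(vsotf_values)), key=vsotf_values.__getitem__)
    return acc_offsets_d[best]


# Through-focus samples at 0.2-D intervals around the image depth (illustrative values)
offsets = [-0.4, -0.2, 0.0, 0.2, 0.4]
values = [0.10, 0.20, 0.30, 0.35, 0.20]
print(predicted_response(offsets, values))  # offset of the peak sample
```

Since the peak is taken over discrete samples, the prediction is quantised to the sampling interval; a finer interval or interpolation around the peak would be needed to resolve smaller accommodation errors.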

Natural view
Firstly, for reference, we conducted the simulation for the case of natural view, where no subaperture was set and thus the 'rendered' image depth was only the CDP depth. Fig. 7 shows the through-focus VSOTFs calculated for the aberrated eyes and the average eye; their peak positions are shown as vertical lines, which indicates the predicted accommodation depths for the eyes. The aberrated eyes had generally different through-focus response profiles from each other (the grey curves), but none of the ten eyes was predicted to focus at a depth apart from the image depth (= the CDP depth) by more than 0.2D. To see the average response of the aberrated eyes, the average position of the peaks for the aberrated eyes is shown with the vertical red line. The average response was close to 0D with a slight negative shift in dioptres. It means the eyes were, on average, predicted to focus at a depth close to the CDP when observing a point on the CDP naturally. The through-focus VSOTF for the average eye (the blue curve) was much greater than that for the aberrated eyes around its peak and almost reaching the value of 1, suggesting an eye with the average aberration is very close to the diffraction-limited eye when it is in focus. The predicted accommodation response for the average eye was, however, essentially the same as the average response of the aberrated eyes.

Rendered 3D image
Next, we simulated retinal PSFs of images rendered at a wide range of depths relative to the CDP, and accommodation responses to the rendered image were predicted by the through-focus analysis of VSOTF. The simulation was conducted for three configurations where 2×2, 3×3, or 4×4 viewpoints were in the pupil of 3 mm diameter. Fig. 8 shows the through-focus plots of VSOTF for each image depth and view density. The mean predicted responses of the aberrated eyes were close to the rendered image depth for all viewpoint densities and image depths. In other words, the accommodation error, i.e. the dioptric difference between the image depth and the predicted accommodation distance, was predicted to be small regardless of the rendered image depth and the viewpoint density. The accommodation errors tended to be small but non-zero and constantly negative. This means that the aberrated eyes were predicted to focus at depths slightly farther than the image depth regardless of the rendered depth of the image.
The tendency of the negative errors may be due to the effect of the spherical aberration in the eye. That is to say, focusing at slightly farther than the nominally in-focus depth may improve the retinal image quality, or increase VSOTF in this case, under presence of the spherical aberration [40][41][42]. The constant accommodation errors across the image depths contradict the previously reported simulation results in which the accommodation error was reported to increase as the depth difference between the rendered image and the CDP grows.
The accommodation responses predicted for the average eye (the blue vertical lines in Fig. 8) were very close to the average of the predicted accommodation responses for the aberrated eyes (the red vertical lines in the same figure). This similarity suggests that a through-focus analysis of VSOTF on the average eye may be sufficient to predict accommodation responses, although the average eye is very close to a diffraction-limited system and thus poorly represents aberrated real eyes.
In summary, the accommodation responses were predicted to be close to the rendered image depths, and the predicted accommodation errors were roughly constant across the image depths. In addition, increasing the viewpoint density did not affect the accommodation errors, at least in the configurations we tested. A straightforward interpretation is that even the lowest viewpoint density we tested, namely 2×2 viewpoints in the 3-mm diameter pupil, is sufficient to elicit an accommodation response as correctly as in the natural view. These observations generally contradict the findings that have previously been reported [13,16,18,19]. Huang and Hua predicted smaller accommodation errors in configurations with higher viewpoint densities [13], and Qin et al. reported that systematic accommodation errors towards the CDP depth were predicted from through-focus analyses of the Strehl ratio, even though a large number of viewpoints (45 viewpoints over a 4-mm pupil) was assumed in their simulation [18,19]. In the current study, however, the predicted accommodation responses did not show any clear difference between the viewpoint densities, and the predicted accommodation errors were small and constant across the rendered image depths.
Figure caption (fragment): Data are plotted in the same manner as Fig. 7. The green vertical lines indicate the CDP depth. Note that some parts of the through-focus ranges are missing in some panels on the top because we assumed the eyes would not accommodate hyperopically, namely that their absolute accommodation distance should not be 'farther than infinity'.
Two factors may explain the difference. One is the effect of display diffraction and aberration, which were ignored in the simulation in the current study but included in the simulations in all previous studies. Including them in the model must always make the elemental PSFs more blurred, since a point source at the CDP is an idealised simplification of the image of a point source on the rendering plane. Nevertheless, this does not directly imply that including the effect of display diffraction and aberration in the simulation would make the results closer to those in the previous studies. The other is the method used to predict accommodation responses. In the current study, VSOTF was used as the predictor metric for accommodation response, while the previous studies used through-focus Strehl ratios and values picked from through-focus MTFs.
We also analysed through-focus Strehl ratios for the viewpoint densities and rendered image depths (Fig. 9). The accommodation responses predicted from through-focus Strehl ratios were essentially identical to those from the through-focus VSOTF, except in the cases with 2×2 viewpoints in the pupil. That is to say, the through-focus Strehl ratio predicted that the accommodation responses would be close to the image depths rather than the CDP depth for the viewpoint densities of 3×3 or 4×4 viewpoints in a 3-mm pupil. For the viewpoint density of 2×2 viewpoints in the pupil, however, the results differed markedly from those for the other viewpoint densities. Specifically, the predicted accommodation depths were close to the image depths only if the rendered images were in a depth range of ±1.0D relative to the CDP; otherwise, the predicted accommodation responses were close to the CDP rather than the image depths. The tendency of small negative accommodation errors was observed in the through-focus Strehl ratio as well as in the through-focus VSOTF, except in the cases where the image depths were more than 1.0D away from the CDP with 2×2 viewpoints in the pupil. The predicted accommodation responses for the average eye were very similar to those for the aberrated eyes, as was also observed in the through-focus analysis of VSOTF. Fig. 10 shows an example case in which the predictions from the through-focus VSOTF and Strehl ratio differed markedly; VSOTF predicted the accommodation response around the image depth, while the Strehl ratio predicted the response at the CDP depth. When the eye accommodated at the image depth, the PSF spread widely and thus the simulated retinal image was blurred.
On the other hand, when the eye accommodated at the CDP, the PSF consisted of four separate and clear peaks; hence the retinal image appeared as a superposition of four clear images with positional differences, which was hardly recognisable overall. Since only the peak amplitude of a PSF defines the Strehl ratio, its highest value appears to have been obtained when the eye accommodated at the CDP. There is much evidence to support the higher validity of VSOTF over the Strehl ratio as a predictor of accommodation response, but all of these studies, to the best of our knowledge, tested it for eyes with a natural pupil. Hence, the validity of VSOTF is not fully established for predicting the accommodation response from 'irregular' PSFs that are rendered by an LF display. However, this also applies to other metrics such as the Strehl ratio. Considering that the Strehl ratio is usually used only for optical systems with little aberration, the Strehl ratio may not be a suitable metric to predict the accommodation response from PSFs rendered by an LF display.
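The behaviour of the Strehl ratio for the two PSF shapes discussed for Fig. 10 can be illustrated with a toy calculation. The sketch below is not the wave-optics model of this study: it uses Gaussian spots as stand-ins for the elemental and reference PSFs, and all sizes, offsets, and function names are hypothetical. It only shows that a peak-based ratio rates four sharp but displaced spots higher than a single broad spot carrying the same energy.

```python
import numpy as np

def gaussian_psf(n, sigma, centre):
    """Unit-energy Gaussian spot on an n x n grid (toy stand-in for a PSF)."""
    y, x = np.mgrid[0:n, 0:n]
    g = np.exp(-((x - centre[0]) ** 2 + (y - centre[1]) ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

n, c = 129, 64
# Assumed reference: a narrow spot standing in for the diffraction-limited PSF.
reference = gaussian_psf(n, 1.5, (c, c))

# Eye focused at the CDP: four sharp elemental PSFs, laterally displaced.
offsets = [(-12, -12), (12, -12), (-12, 12), (12, 12)]
split = sum(gaussian_psf(n, 1.5, (c + dx, c + dy)) for dx, dy in offsets) / 4.0

# Eye focused at the image depth: elemental PSFs overlap but are defocused,
# modelled here as one broad spot with the same total energy.
merged = gaussian_psf(n, 6.0, (c, c))

# Strehl ratio: peak of the PSF relative to the peak of the reference PSF.
strehl_split = split.max() / reference.max()
strehl_merged = merged.max() / reference.max()
print(f"four separate peaks: {strehl_split:.3f}, single broad spot: {strehl_merged:.3f}")
```

In this toy case the four-peak PSF scores about four times higher than the broad spot, although the corresponding image would be a superposition of four displaced copies.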

Optical resolution of rendered image
The two-dimensional MTFs for the in-focus accommodation states predicted by the VSOTF were averaged across meridians, providing radial MTFs [43,58], so that the MTFs were visualised and compared with each other more simply. Fig. 11 shows the obtained radial MTFs, which describe the optical resolution on the retina for each rendered image depth and viewpoint density. The mean MTF for the aberrated eyes and the MTF for the average eye are separately plotted. The MTFs for the average eye were better than the mean MTFs for the aberrated eyes in almost all cases. This indicates that the simulation on the eye model with the average aberration overestimates retinal optical resolution.
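The meridional averaging can be sketched as follows, assuming the PSF is sampled on a square grid with its centre at index `n // 2`. The binning by rounded integer radius, the function name `radial_mtf`, and the elliptical Gaussian test PSF are illustrative assumptions; the exact averaging procedure used in the study is not specified here.

```python
import numpy as np

def radial_mtf(psf):
    """Average the 2D MTF over all meridians, binned by integer radial frequency index."""
    psf = np.asarray(psf, dtype=float)
    # OTF is the Fourier transform of the PSF; the MTF is its modulus.
    otf = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(psf)))
    mtf = np.abs(otf)
    cy, cx = psf.shape[0] // 2, psf.shape[1] // 2
    mtf = mtf / mtf[cy, cx]                      # normalise to 1 at zero frequency
    y, x = np.indices(psf.shape)
    r = np.rint(np.hypot(x - cx, y - cy)).astype(int)
    sums = np.bincount(r.ravel(), weights=mtf.ravel())
    counts = np.maximum(np.bincount(r.ravel()), 1)
    # Keep only radii that are fully covered along every meridian.
    return (sums / counts)[: min(cy, cx) + 1]

# Hypothetical astigmatic PSF: elliptical Gaussian, wider along y than x.
n = 128
yy, xx = np.mgrid[0:n, 0:n]
psf = np.exp(-((xx - 64) ** 2 / (2 * 2.0 ** 2) + (yy - 64) ** 2 / (2 * 4.0 ** 2)))
radial = radial_mtf(psf)
print(radial[:5])
```

The resulting curve lies between the MTFs along the best and worst meridians, which is what makes it convenient for comparing conditions with a single line per plot.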
In general, better MTFs were simulated for the cases with the smaller depth difference between the rendered image and the CDP, especially when the viewpoint density was low. This tendency was less clear for the cases with higher viewpoint densities. Specifically, for the cases with the lowest viewpoint density, namely 2×2 viewpoints in the pupil, the optical resolution drastically dropped as the depth difference between the rendered image and the CDP grew, even though the retinal image was focused. It can be interpreted that the depth of field (DOF) of the LF display, or the range of a rendered depth where the resolution of the in-focus retinal image is kept high, is shallow for a configuration with a low viewpoint density. On the other hand, with the high viewpoint density, the drop of the in-focus optical resolution related to the rendered image depth relative to the CDP was less distinctive. In other words, a wider DOF was simulated for a configuration with a high viewpoint density.
To visualise the relation between the viewpoint density and the optical resolution on the retina, we set the reference contrast gain of 0.05 as a threshold and plotted the cut-off frequencies for the three viewpoint densities (Fig. 12). We chose the reference gain of 0.05 because the contrast gain dropped to around that value on average in the aberrated eyes at 60 CPD, which is conventionally regarded as the highest spatial frequency resolvable for human observers (Fig. 12A). Fig. 12B shows the cut-off frequencies for the three viewpoint densities as functions of the image depth relative to the CDP. As already described, a configuration with the low viewpoint density (2×2 viewpoints) was predicted to achieve a high resolution, but its DOF was small. On the other hand, a configuration with a high viewpoint density (4×4 viewpoints) achieved a lower resolution at best, but its DOF was greater than that of the lower viewpoint densities. The best optical resolution, which was achieved when the image was rendered at the CDP depth, was better in the cases with the low viewpoint density. A drop of the maximum resolution with increasing viewpoint density was also reported previously [13], but the drop observed in the current study was not as drastic as in that report.
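A cut-off frequency of this kind can be extracted from a radial MTF as sketched below. The function name, the linear interpolation between samples, and the synthetic linearly decaying MTF are assumptions for illustration; only the threshold value of 0.05 comes from the text above.

```python
import numpy as np

def cutoff_frequency(freqs_cpd, mtf, threshold=0.05):
    """Lowest frequency at which the MTF falls below the threshold.

    Linear interpolation between the bracketing samples is an assumed choice.
    Returns the highest sampled frequency if the MTF never drops below the threshold.
    """
    freqs_cpd = np.asarray(freqs_cpd, dtype=float)
    mtf = np.asarray(mtf, dtype=float)
    below = np.flatnonzero(mtf < threshold)
    if below.size == 0:
        return float(freqs_cpd[-1])
    i = int(below[0])
    if i == 0:
        return float(freqs_cpd[0])
    f0, f1 = freqs_cpd[i - 1], freqs_cpd[i]
    m0, m1 = mtf[i - 1], mtf[i]
    return float(f0 + (m0 - threshold) * (f1 - f0) / (m0 - m1))

# Hypothetical MTF (not simulated data): linear fall-off reaching zero at 100 CPD.
freqs = np.linspace(0.0, 100.0, 101)
mtf = 1.0 - freqs / 100.0
cutoff = cutoff_frequency(freqs, mtf)
print(f"cut-off at gain 0.05: {cutoff:.1f} CPD")
```

Repeating this for each rendered depth and viewpoint density would give curves of cut-off frequency against image depth of the kind plotted in Fig. 12B.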

Summary
We simulated through-focus PSFs for a wide range of the image depths and the viewpoint densities of 2×2, 3×3, or 4×4 viewpoints in a 3-mm diameter pupil assuming no effect of display diffraction and aberration. The accommodation responses to the rendered images were predicted by the through-focus analysis of VSOTF, and the optical resolution on the retina was also assessed.
The predicted accommodation responses were close to the rendered image depths even for the cases where the image depth was apart from the CDP depth. Importantly, the predicted accommodation errors were constantly small across the viewpoint densities. This means that even the lowest viewpoint density we tested, namely 2×2 viewpoints in the pupil, may be enough to elicit correct accommodation to rendered images at a wide range of depths if the LF display had the ideal optics so that the effects of its diffraction and aberration were negligible.
The analysis of the optical resolution on the retina showed the advantage of a wide DOF for the LF display with a high viewpoint density. In other words, increasing the viewpoint density extended the range of a rendered image depth in which the best retinal image can be kept.
To sum up, assuming that the effects of display diffraction and aberration are absent or negligible, the results suggest that the viewpoint density may not need to be very high to elicit correct accommodation responses to rendered images; however, they also suggest that a low viewpoint density may limit the range of rendered depths in which the in-focus resolution on the retina is kept high.

Limitations of the study
The current study has several principal limitations. The most important is the absence of display diffraction and aberration. As discussed earlier, ignoring these effects and assuming an ideal point source at the CDP must overestimate the optical resolution on the retina. Hence, the simulation results estimate the retinal image resolution only for an LF display with little effect of diffraction and aberration, which seems hardly achievable. At the same time, it can be inferred that no real LF display can achieve a retinal image resolution better than the results simulated in the proposed way. On the other hand, the effects of display diffraction and aberration on accommodation response cannot be simply guessed from the logic that including them would make the elemental PSFs less sharp, because the visual system always reacts to their superposition.
Another limitation is the absence of pixels in the model. In the current study, the point source was assumed to be infinitesimal and positional sampling was ignored. However, in a practical LF display, the image resolution is always regulated by the display's pixel sampling. Pixel sampling strongly affects the image resolution a viewer may observe in combination with the rendered depth [59,60]. Specifically, the pixel sampling determines the depth range in which the best pixel resolution can be obtained for the viewer. A further study is needed to integrate the optical effects of diffraction and aberration in a display and human eye into the discussion of the pixel resolution and the rendered depth, although an initial attempt has been made by Huang and Hua by extending their study [14].
The last important limitation to be pointed out is the paraxial approximation. The calculation of PSFs in the proposed framework gives an accurate simulation only in the paraxial region. In other words, the calculation cannot simply be used to assess the optical image on the peripheral retina, which is formed by off-axis light rays. In addition to that, because the retina is not spatially uniform physiologically and perceptually, the functional role of the non-foveal or peripheral region in accommodation must be different from that of the foveal region [61]. Therefore, involving the functional role of the non-foveal or peripheral retina in accommodation may require not only an appropriate calculation method for the retinal image, but also an evaluation method that correctly reflects the characteristics of the peripheral visual field.

Conclusion
In the current study, we proposed a novel wave-optics-based simulation framework that models an LF display and an observer's eye to simulate the retinal image. The proposed framework has the advantages of (1) being free from ray-tracing, thus computationally efficient; (2) having the capability to include realistic aberration patterns of the human eye population; and (3) ensuring the rigorous modelling of chromatic effects in the visual system.
In addition to that, an optical metric that is known to predict accommodation response well was used to assess whether the accommodation would be elicited correctly. The simulation based on the proposed model showed that accommodation was expected to be elicited fairly close to the rendered depths even in a configuration with a relatively low viewpoint density, e.g. 2×2 views within a 3 mm pupil. However, increasing the viewpoint density seemed to extend the depth range of the rendered image in which in-focus retinal resolution is kept high.

Disclosures
The authors declare no conflicts of interest.

Data Availability
Data underlying the results presented in this paper are available in Refs. [44,45].