Design of lensless retinal scanning display with diffractive optical element

We propose a design of a retinal-scanning-based near-eye display for augmented reality. Our solution is highlighted by a laser scanning projector, a diffractive optical element, and a moist eye with gradient refractive indices. The working principles related to each component are comprehensively studied. Its key performance is summarized as follows: the field of view is 122°, angular resolution is 8.09′, diffraction efficiency is 57.6%, transmittance is 80.6%, uniformity is 0.91, luminance is 323 cd/m², modulation transfer functions are above 0.99999 at 3.71 cycle/degree, contrast ratio is 4878, and distortion is less than 24%. © 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement


Introduction
For a very long time, near-eye display (NED), also known as head-mounted display, remained a marginal technology in the display community [1]. Only recently, in the wake of augmented and virtual realities [2], has it dramatically turned into a hot topic pursued by researchers, engineers, investors, bloggers, etc. Technically, NED is a type of wearable projection display [3]. Depending on whether the retina is the image plane, NEDs can be divided into two categories, i.e. indirect projection and direct (retinal) projection. For the former, the image is first projected onto a virtual image plane and then received by the retina. To do so, the optical path needs to be folded via either a combiner [4][5][6][7][8][9][10][11] or a waveguide [12][13][14][15][16][17][18][19][20][21][22]. As far as the field of view (FOV) is concerned, both combiner- and waveguide-based NEDs have limited FOVs, usually below 50° [2]. Another restriction is that the FOV is inversely proportional to the exit pupil [2], so a trade-off must be made between the two. Direct or retinal projection based NED, on the other hand, refers to the case where the image is projected directly onto the retina, so that the image plane coincides with the retina. Virtual retinal display (VRD) [23][24][25], developed by the Human Interface Technology Lab of the University of Washington, is a pioneering retinal projection based NED, known for its concept of retinal scanning. iOptik [26,27], a proprietary technology of Innovega, is characterized by a contact lens embedded with a zone plate. Pinlight display [28], co-developed by the University of North Carolina at Chapel Hill and Nvidia, mimics a pinhole camera with an array of point light sources in front of the eye. Intel, through its acquisitions of Composyt Light Labs and Lemoptix, built a smart glasses prototype dubbed Vaunt with holographic optical elements (HOEs) [29].
Though the above retinal projection based NEDs are able to achieve extremely large FOVs, each has its own pros and cons. The optical setup of VRD is bulky and sophisticated, making it unwearable. iOptik has been struggling for many years to persuade consumers to wear the contact lens. Pinlight display is vulnerable to changes in the eye state, including the diopter of the eye, pupil size, and rotation of the eyeball. After only about four years, Vaunt was abandoned, in part due to the manufacturability of HOEs. Motivated by these issues, we present a lensless retinal scanning display (RSD), which is compact in design and immune to changes in the pupil size and diopter of the eye. In what follows, its structure, working principles, and overall performance are elaborated.

Proposed structure
The proposed RSD can be decomposed into three major components, i.e. a laser scanning projector, a diffractive optical element (DOE), and an eye, as shown in Fig. 1, where $d_p$ is the distance between the projector and the glass, $d_{doe}$ the horizontal distance between the projector and the DOE, $d_{lens}$ the distance between the DOE and the lens center, $W$ the horizontal dimension of the DOE, $x$ the distance measured from the left edge of the DOE, $\theta_i$ the angle of the incident light, and $\theta_m$ the angle of the diffracted light. The laser scanning projector is housed inside the temple to save space. The DOE, fabricated on the inner surface of a flat glass substrate, converges the light coming out of the projector towards the center of the eye lens. Analogous to the Maxwellian view [30], this configuration ensures that the image formed on the retina remains intact no matter how the eye adjusts its diopter to the distance of the object. Moreover, as long as the laser beam is much smaller than the pupil, which is minimally 2 mm across [31], the brightness of the image is maintained regardless of the variation of pupil size. For the sake of symmetry, the DOE and the eye are center-aligned.
Fig. 1. Proposed structure of our RSD, which can be decomposed into three major components, i.e. a laser scanning projector, a DOE, and an eye. $d_p$ is the distance between projector and glass, $d_{doe}$ the horizontal distance between projector and DOE, $d_{lens}$ the distance between DOE and lens center, $W$ the horizontal dimension of DOE, $x$ the distance starting from the left edge of DOE, $\theta_i$ the angle of incident light, and $\theta_m$ the angle of diffracted light.

Eye model
Unlike our previous eye models [32][33][34], in which the eye is dry (in the absence of tear) and the lens of the eye has a constant refractive index, the current model takes into account not only the tear but also a gradient lens [35] studied by Goncharov et al. Figure 2 is a cross-sectional view of our schematic moist eye, consisting of tear, cornea (anterior and posterior), aqueous chamber filled with aqueous humor, iris with an opening known as the pupil, lens (anterior and posterior), vitreous chamber filled with vitreous humor, and retina. Approximately two-thirds of the eye's optical power is derived from the cornea, including the tear, and one-third from the lens [36]. Although the tear is negligibly thin, ranging from 6 to 20 μm, its irregularities can cause significant visual aberrations and distortions [37]. To model the gradient lens, an assistive virtual plane is inserted to separate the lens into anterior and posterior halves, as shown in Fig. 3. The center of the lens, which coincides with the nodal point [38], is designated as the origin of the y-z coordinate system. Invoking Goncharov's gradient lens model [35], the refractive indices $n_a$ of the anterior lens and $n_p$ of the posterior lens are computed, respectively, with the corresponding expressions of that model, where $z$ is the distance measured from the vertex of the anterior lens along the z-axis, $y$ the distance measured from the lens center along the y-axis, $z_m$ the distance between the vertex of the anterior lens and the virtual plane, $n_{00}$ the starting refractive index of the anterior lens, $n_{max}$ the maximum refractive index, and $c_{10}$, $c_{20}$, $c_{01}$, $c_{02}$, $c_{03}$, $c_{04}$, $c_{01,2}$, $c_{02,2}$, $c_{03,2}$, and $c_{04,2}$ the coefficients of each term. With the parameters disclosed in [35], itemized in Table 1, the refractive indices of the lens are calculated along the horizontal and vertical directions, respectively, as shown in Fig. 4. The refractive index peaks at the lens center and decreases toward the outermost surface.

Laser scanning projector
Figure 5 is a schematic drawing of the laser scanning projector, inside which are mounted a laser diode, a circular polarizer, a mirror, and a scanning mirror controlled by a biaxial microelectromechanical system (MEMS) [39]. The laser light becomes circularly polarized after passing through the circular polarizer. Since the laser scanning projector is lensless, the image projected onto the retina is always in focus, i.e. its depth of focus is infinite, even when the eye is defocused. In this regard, the laser scanning projector is essentially a pinhole camera [40]. This is a big advantage over a lens-based projector, whose image plane lies at a certain distance and whose depth of focus is finite [3]. The depth cue of the projected image, on the other hand, must be coupled with the distance of the real object, depending on the accommodation of the eye [32]. A major downside of the laser scanning projector is its low resolution, restricted by the scanning rate of the MEMS; e.g. the best resolution of Microvision's laser scanning projector is merely 848 × 480 [41]. Due to the limited choice of laser scanning projectors available on the market, a set of parameters customized to our design is given in Table 2, where the resolution is 640 × 640, the (horizontal) scanning angle is 17°, the beam diameter at the waist $D_0$ is 0.4 mm, the luminous flux $\Phi$ is 2 lm, and the contrast ratio (CR) is 5000. Treated as a Gaussian TEM00 beam, the laser beam diameter $D$ enlarges as it propagates a distance $L$ from the waist, which is described as [42]

$$D = D_0\sqrt{1 + \left(\frac{4 M^2 \lambda L}{\pi D_0^2}\right)^2},$$

where $M^2$ is the beam quality factor and $\lambda$ is the wavelength of the laser. The (half-angle) beam divergence $\theta_{div}$ is therefore

$$\theta_{div} = \frac{2 M^2 \lambda}{\pi D_0}.$$

For $M^2$ = 1.1, $\lambda$ = 532 nm, $L$ = 50 mm, and $D_0$ = 0.4 mm, $D$ = 0.411 mm and $\theta_{div}$ = 0.93 mrad or 0.05°. As a rule of thumb, a laser beam with a divergence of less than 1 mrad can be approximated as perfectly collimated. Upon reflection from the MEMS, both the shape and size of the laser beam are altered. To match the shape and size of the subsequent DOE, the MEMS is square and 60 µm wide, as will be discussed later. To avoid the loss of light, a beam shaper could be inserted on top of the laser diode. In addition, in case the projector and DOE are misaligned during mounting, both should be made adjustable for calibration. As for the speckle effect, a random laser made from disordered materials [43] is among the most desirable solutions.
Fig. 6. Profile of the slanted grating, where $p$ is the grating period, $h_g$ the grating depth, $w_g$ the grating width, $\beta$ the slant angle relative to the normal, $\theta_i$ the incident angle, and $\theta_m$ the diffraction angle.
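The two Gaussian-beam relations can be checked numerically. The sketch below assumes the forms D = D0·sqrt(1 + (4M²λL/(πD0²))²) for the beam diameter and θ_div = 2M²λ/(πD0) for the far-field half-angle divergence, which reproduce the quoted D = 0.411 mm and θ_div = 0.93 mrad; the function names are illustrative only.

```python
import math

def beam_diameter(d0, m2, wavelength, distance):
    """Gaussian-beam diameter after propagating `distance` from the waist.

    All lengths in metres. Implements D = D0*sqrt(1 + (4*M^2*lambda*L/(pi*D0^2))^2).
    """
    ratio = 4.0 * m2 * wavelength * distance / (math.pi * d0 ** 2)
    return d0 * math.sqrt(1.0 + ratio ** 2)

def half_angle_divergence(d0, m2, wavelength):
    """Far-field half-angle divergence in radians: 2*M^2*lambda/(pi*D0)."""
    return 2.0 * m2 * wavelength / (math.pi * d0)

# Values quoted in the text: M^2 = 1.1, lambda = 532 nm, L = 50 mm, D0 = 0.4 mm
D0, M2, LAM, L = 0.4e-3, 1.1, 532e-9, 50e-3
D = beam_diameter(D0, M2, LAM, L)
theta = half_angle_divergence(D0, M2, LAM)
print(f"D = {D * 1e3:.3f} mm")  # D = 0.411 mm
print(f"theta_div = {theta * 1e3:.2f} mrad = {math.degrees(theta):.2f} deg")  # 0.93 mrad = 0.05 deg
```

Because the beam travels only a fraction of its Rayleigh range (about 0.21 m for these values) before reaching the DOE, the diameter grows by less than 3%, consistent with treating the beam as collimated.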

Simulation setting
The performance of our RSD is quantitatively analyzed with Code V (Synopsys) and COMSOL Multiphysics (COMSOL). Capable of ray tracing, Code V lends itself to analyzing imaging properties such as the modulation transfer function (MTF), distortion, and image simulation. Based on the finite element method [46], COMSOL Multiphysics is a powerful tool for modeling diffraction gratings. The design wavelength is 532 nm. Figure 7 outlines the optical surfaces used in Code V. The object is placed 3 m in front of the eye. Surfaces 1 to 8 (S1 to S8) constitute the moist eye, of which S1 is the tear, S2 the anterior cornea, S3 the posterior cornea, S4 the iris with pupil, S5 the anterior lens, S6 the virtual plane, S7 the posterior lens, and S8 the retina. To imitate the laser scanning projector as a pinhole camera, the semi-aperture of the virtual plane, where the lens center is located, is set to 30 µm, i.e. the radius of the laser beam.
Fig. 7. Optical surfaces used in Code V. The object is placed at 3 m ahead of the eye. Surfaces 1 to 8 (S1 to S8) constitute the moist eye, of which, S1 is tear, S2 anterior cornea, S3 posterior cornea, S4 iris with pupil, S5 anterior lens, S6 virtual plane, S7 posterior lens, and S8 retina.
The original parameters of the eye are adopted from [35], wherein the eye is focused at infinity. In our model, with the tear added and the object distance set to 3 m, those parameters need to be tweaked through an optimization, which is carried out under the constraint that the length of the eye be 24 mm [34]. Table 3 summarizes the optimized parameters, where AL and PL denote the gradient refractive indices of the anterior and posterior lenses, respectively. Detailed parameters for the aspherical surfaces and the gradient lens are provided in Table 4 and Table 5, respectively. Figure 8 shows the ray tracing diagram for the fields of 0°, 10°, 20°, 30°, 40°, 50°, and 61°, from which it can be seen that all rays converge at the lens center.

Field of view
As illustrated in Fig. 9, the FOV, whose vertex is situated at the center of the entrance pupil, is the angle subtended by the DOE. Measured diagonally, it is given by

$$\mathrm{FOV} = 2\arctan\left(\frac{\sqrt{W^2 + H^2}}{2\,(d_{er} + d_{ep})}\right),$$

where $H$ is the vertical dimension of the DOE, $d_{er}$ the eye relief, and $d_{ep}$ the distance from the vertex of the tear to the center of the entrance pupil, which is calculated as 3.05 mm. For $W$ = $H$ = 38.4 mm and $d_{er}$ = 12 mm, the FOV is 122° (diagonal).
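A quick numerical check of the diagonal FOV, assuming FOV = 2·arctan(sqrt(W² + H²)/(2(d_er + d_ep))) with the vertex at the entrance-pupil center; `diagonal_fov_deg` is a hypothetical helper name.

```python
import math

def diagonal_fov_deg(w_mm, h_mm, eye_relief_mm, d_ep_mm):
    """Diagonal FOV (degrees) subtended by a W x H DOE, seen from the entrance-pupil center."""
    half_diag = math.hypot(w_mm, h_mm) / 2.0          # half of the DOE diagonal
    distance = eye_relief_mm + d_ep_mm                # DOE to entrance-pupil center
    return math.degrees(2.0 * math.atan(half_diag / distance))

fov = diagonal_fov_deg(38.4, 38.4, 12.0, 3.05)
print(f"FOV = {fov:.1f} deg")  # FOV = 122.0 deg
```

The half-angle of about 61° matches the largest field (61°) used in the ray tracing.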

Angular resolution
Angular resolution (AR) in arcminutes (′) is calculated by dividing the FOV in degrees (°) by the number of pixels $N$ along the diagonal and converting to arcminutes, which can be written as

$$\mathrm{AR} = \frac{60\,\mathrm{FOV}}{N} = \frac{60\,\mathrm{FOV}}{\sqrt{N_h^2 + N_v^2}},$$

where $N_h$ and $N_v$ are the number of pixels along the horizontal and vertical directions, respectively. For FOV = 122°, $N_h$ = 640, and $N_v$ = 640, the angular resolution is 8.09′.
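The angular-resolution formula AR = 60·FOV/sqrt(N_h² + N_v²) can be evaluated directly; this is a minimal sketch with an illustrative function name.

```python
import math

def angular_resolution_arcmin(fov_deg, n_h, n_v):
    """Angular resolution in arcminutes: diagonal FOV (deg) over diagonal pixel count, times 60."""
    n_diag = math.hypot(n_h, n_v)  # number of pixels along the diagonal
    return 60.0 * fov_deg / n_diag

ar = angular_resolution_arcmin(122.0, 640, 640)
print(f"AR = {ar:.2f} arcmin")  # AR = 8.09 arcmin
```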

Diffraction efficiency and transmittance
In modeling the grating, the Electromagnetic Waves, Frequency Domain interface of the Wave Optics Module of COMSOL Multiphysics is employed. The boundary condition is Floquet periodicity. The incident light is linearly polarized in the transverse electric (TE) mode. The diffraction order $m$ is +1. The glass substrate of the DOE is chosen as N-BK7 (Schott), whose refractive index is 1.5195 at 532 nm. Without loss of generality, 9 gratings are picked for simulation, as shown in Fig. 10. For $d_p$ = 10 mm, $d_{doe}$ = 20 mm, and $d_{lens}$ = 16.73 mm, the incident and diffraction angles $\theta_i$ and $\theta_m$ can be calculated according to Eqs. (5) and (6). Through optimization, the optimal grating parameters, DEs, and transmittance $T$ for normal incidence ($\theta_i$ = 0°) are obtained, as listed in Table 6. The average DE and $T$ are 57.6% and 80.6%, respectively.

Uniformity
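As the DE of each grating of the DOE differs to some extent, a figure of merit $\Gamma$ is introduced to evaluate the uniformity [20]:

$$\Gamma = 1 - \frac{\sigma}{\mathrm{DE}_{avg}},$$

where $\mathrm{DE}_{avg}$ is the average DE and $\sigma$ is the standard deviation, formulated as

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{DE}_i - \mathrm{DE}_{avg}\right)^2},$$

where $n$ is the number of gratings of interest and $i$ the serial number. Calculated with the DEs listed in Table 6, the uniformity $\Gamma$ is 0.91.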
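A minimal sketch of the uniformity figure of merit, assuming Γ = 1 − σ/DE_avg with σ taken as the population standard deviation (1/n normalization). The DE values below are hypothetical placeholders for illustration only; they are not the Table 6 values.

```python
import math

def uniformity(des):
    """Gamma = 1 - sigma / DE_avg, with sigma the population (1/n) standard deviation."""
    n = len(des)
    avg = sum(des) / n
    sigma = math.sqrt(sum((de - avg) ** 2 for de in des) / n)
    return 1.0 - sigma / avg

# Hypothetical DEs for illustration only; NOT the values of Table 6.
example_des = [0.50, 0.60, 0.70]
print(f"Gamma = {uniformity(example_des):.3f}")  # Gamma = 0.864
```

A perfectly uniform DOE (all DEs equal) gives Γ = 1, and Γ decreases as the spread of DEs grows relative to their mean.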

Luminance
Luminance is a measure of the luminous intensity per unit area of the light diffracted from the DOE within the entire FOV. When the luminous flux of the projector is 2 lm, the luminance is 323 cd/m² (nit). Figure 11 plots the spot diagram for the fields of 0°, 10°, 20°, 30°, 40°, 50°, and 61°, from which it can be seen that the spot of each field is smaller than the Airy disk.

MTF
As shown in Fig. 12, MTFs are calculated as a function of spatial frequency in cycle/degree for the fields of 0° and 61° when the distance between the DOE and the eye is offset by 0 mm, 3 mm, and 6 mm from the target eye relief of 12 mm. At 3.71 cycle/degree, which corresponds to the angular resolution of 8.09′ (one cycle spans two pixels, so 60/(2 × 8.09) ≈ 3.71 cycle/degree), MTFs are above 0.99999 for all fields and eye relief offsets. This agrees with the foregoing statement that the laser scanning projector is analogous to a pinhole camera. It also indicates that even if the laser beams are not perfectly converged at the center of the lens, the image quality will not be much affected.
Fig. 12. MTFs are calculated as a function of spatial frequency in cycle/degree for the fields of 0° and 61° when the distances between DOE and eye are offset by 0 mm, 3 mm and 6 mm from the target eye relief of 12 mm, respectively. At 3.71 cycle/degree, which corresponds to the angular resolution of 8.09′, MTFs are above 0.99999 for all fields and eye relief offsets.
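The evaluation frequency of 3.71 cycle/degree follows from the angular resolution if one cycle (a bright and a dark line) is assumed to occupy two pixels, i.e. f = 60/(2·AR) with AR in arcminutes. A minimal sketch with an illustrative function name:

```python
def nyquist_cycles_per_degree(ar_arcmin):
    """Highest displayable spatial frequency: one cycle spans two pixels of ar_arcmin each."""
    return 60.0 / (2.0 * ar_arcmin)

print(f"{nyquist_cycles_per_degree(8.09):.2f} cycle/degree")  # 3.71 cycle/degree
```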

Contrast ratio
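CR, the ratio of maximum intensity to minimum intensity [48], can be deduced as

$$\mathrm{CR} = \frac{(1+\mathrm{MTF})\,\mathrm{CR}_p + (1-\mathrm{MTF})}{(1-\mathrm{MTF})\,\mathrm{CR}_p + (1+\mathrm{MTF})},$$

where $\mathrm{CR}_p$ is the CR of the laser scanning projector. For the field of 0°, with $\mathrm{CR}_p$ = 5000 and MTF = 0.99999, CR = 4878.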
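The contrast-ratio relation follows if the projector's maximum and minimum intensities are treated as the peaks and troughs of a pattern whose modulation depth is scaled by the MTF, i.e. I′max,min = (Imax + Imin)/2 ± MTF·(Imax − Imin)/2; dividing the two and substituting CR_p = Imax/Imin gives CR = [(1 + MTF)CR_p + (1 − MTF)]/[(1 − MTF)CR_p + (1 + MTF)]. A minimal sketch under that assumption, with an illustrative function name:

```python
def system_contrast_ratio(cr_p, mtf):
    """CR of the retinal image when the projector's max/min contrast CR_p is attenuated by the MTF."""
    numerator = (1.0 + mtf) * cr_p + (1.0 - mtf)     # scaled maximum intensity
    denominator = (1.0 - mtf) * cr_p + (1.0 + mtf)   # scaled minimum intensity
    return numerator / denominator

cr = system_contrast_ratio(5000, 0.99999)
print(round(cr))  # 4878
```

With MTF = 1 the function returns CR_p unchanged, and with MTF = 0 it returns 1, as expected for a fully washed-out pattern; the steep drop from 5000 to 4878 for MTF = 0.99999 shows how sensitive a very high projector CR is to even a tiny loss of modulation.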

Distortion
Distortion, which measures the difference between the paraxial and actual image heights, is defined as [49]

$$\mathrm{Distortion} = \frac{h_a - h_p}{h_p} \times 100\%,$$

where $h_p$ is the paraxial image height calculated with the first-order approximation, and $h_a$ is the actual image height. As the retina is a non-flat image surface, the chief ray is extended to intersect the flat paraxial image surface for calculating the image height. As can be seen in Fig. 13, the distortion is approximately 24%.
Fig. 13. Distortion versus the field angle.
Because the eye is far from an ideal imaging system, distortion is an inherent characteristic of all retinal projection based NEDs. Figure 14 shows the original image alongside the see-through retinal and projected retinal images. Compared to the original one, the projected retinal image, while distorted, is sharp and bright as a whole. In particular, the uncompromised brightness is critical for outdoor use.
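A minimal sketch of the distortion metric, assuming the definition (h_a − h_p)/h_p × 100%. The image heights below are hypothetical values chosen only to illustrate a 24% magnitude; they are not read from Fig. 13.

```python
def distortion_percent(h_paraxial, h_actual):
    """Distortion in percent: (h_a - h_p) / h_p * 100."""
    return (h_actual - h_paraxial) / h_paraxial * 100.0

# Hypothetical image heights (mm), for illustration only.
print(f"{distortion_percent(10.0, 7.6):.1f} %")  # -24.0 %
```

Under this definition a negative value means the actual image height is smaller than the paraxial one (barrel-type distortion), and a positive value means it is larger (pincushion-type).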

Conclusions
A lensless RSD and its working principles have been proposed. Its structure is highlighted by a laser scanning projector, a DOE, and a moist eye. To model the eye precisely, the tear and gradient lens are taken into account. Based on the simulation, the FOV is 122°, angular resolution is 8.09′, average DE of the DOE is 57.6%, average transmittance of the DOE is 80.6%, uniformity is 0.91, luminance is 323 cd/m², MTFs are above 0.99999 at 3.71 cycle/degree for all fields and offsets, CR is 4878, and distortion is 24%. In contrast to other retinal projection based NEDs [32][33][34], our RSD exhibits several unique features. First, no lens other than the lens of the eye is involved. Second, the projected retinal image is focus free; if constructed as a binocular RSD, it would be inherently free of the vergence-accommodation conflict [50]. Third, the DOE is used as a combiner, making the device compact and suitable for see-through augmented reality. Fourth, the retinal image is immune to changes in the diopter of the eye and pupil size. It is, however, sensitive to the rotation of the eye, which requires the eye to look straight ahead. That being said, a user can still look around by rotating his or her head if head tracking is enabled.