Design of foveated contact lens display for augmented reality

Abstract: We present a design of a contact lens display for augmented reality, which features an array of collimated light-emitting diodes and a contact lens. By building the infrastructure directly on top of the eye, the eye is allowed to move or rotate freely without the need for exit pupil expansion or eye tracking. The resolution of the light-emitting diodes is foveated to match the density of cones on the retina. In this manner, the total number of pixels as well as the latency of image processing can be significantly reduced. Based on the simulation, the device performance is quantitatively analyzed. For the real image, the modulation transfer function is 0.669757 at 30 cycle/degree, the contrast ratio is 5, and the distortion is 10%. For the virtual image, the field of view is 82°, the best angular resolution is 0.38′, the modulation transfer function is above 0.999999 at 30 cycle/degree, the contrast ratio is 4988, and the distortion is 6%.


Introduction
Why is making a good near-eye display (NED) for augmented reality (AR) so hard? A blog recently posted on LinkedIn by Daniel Wagner [1], chief technology officer of DAQRI, truly strikes a chord among the AR community. There is a lot to take in from this blog. Yet we can follow a Liebig's barrel method [2] to sift out the most decisive factors that hold back the current NEDs from being good. According to the blog, there are two types of NEDs, i.e. the free-space-combiner-based and the waveguide-based. For free-space-combiner-based NEDs [3][4][5][6][7][8][9][10][11][12], the trade-off between the field of view (FOV) and exit pupil is identified as the shortest stave. To have a large FOV, the exit pupil will become unacceptably small. To have a big exit pupil, either the FOV needs to be decreased or the size of the device will increase to the point of being uncomfortable to wear. For waveguide-based NEDs [13][14][15][16][17][18][19][20][21][22], the good news is that the FOV is no longer coupled with the exit pupil, as the latter can be expanded via a number of techniques. The bad news is that the FOV will be limited by the condition of total internal reflection. For most consumers, a small FOV is definitely a deal breaker. Although not mentioned in the blog, it is necessary to be aware of another type of NED [23][24][25][26][27][28][29], which exploits direct retinal projection without the need for a combiner or waveguide to form an intermediate virtual image. The minimal design aside, such displays are often associated with a fairly large FOV of more than 100°. That being said, they have not been very successful as real products. For example, the virtual retinal display [23], also known as the retinal scanning display, requires that all beams converge at the center of the lens of the eye so that the image is unaffected no matter how the eye accommodates. Unfortunately, this condition can only be met if the eye does not move or rotate. Pinlight display [26], a mimic of the pinhole camera, allows for certain eye motion. But its image is subject to the diopter of the eye, which is not an invariant. From the above, it can be seen that each type of NED has its short staves. Whether to patch up the staves or to build a new barrel is the question.
Customarily, NEDs are mounted on a helmet or eyewear. In this work, we attempt to shift this paradigm by building the infrastructure directly on top of the eye. Since the jargon NED is not applicable or at least inaccurate for our case, we shall re-name this type of display as contact lens display (CLD). Hereinafter, the principle as well as the potential of CLD is to be unfolded in full.

Proposed structure
The proposed CLD consists of two components, i.e. a contact lens and a collimated light-emitting diode (LED) array, as shown in Fig. 1. For the sake of symmetry, the contact lens, LED array and eye are center-aligned. Adjacent to the cornea is a thin layer of contact lens for correcting the refractive errors. On top of the contact lens is an array of LEDs, each pixel of which emits a collimated beam of light towards the center of the lens of the eye. In this regard, our CLD is analogous to the said virtual retinal display or retinal scanning display. The major difference is that in our CLD, the eye can rotate freely without losing the image, thanks to the surface tension [30] that tightly adheres the whole device to the eye.

Fig. 1. Cross-section view of the proposed contact lens display.

Photoreceptor cells
Photoreceptor cells are neuroepithelial cells found in the retina that are capable of converting photons into biological signals [31]. There are three known types of photoreceptor cells, i.e. rods, cones, and photosensitive ganglion cells. Rods are extremely sensitive to brightness, and can be triggered by even a single photon. Cones are less sensitive to brightness, but can discern colors. Photosensitive ganglion cells are responsible for the circadian rhythm and pupil control. The human retina contains about 120 million rods, 6 million cones, and 24 to 60 thousand photosensitive ganglion cells. Number-wise, we are particularly interested in the number of cones as it is closely linked to the visual acuity, the ability to resolve spatial details [31]. By retrieving the data from Curcio et al.'s study on the photoreceptor cells [32], the density of cones is plotted in Fig. 3 with respect to the eccentricity, a rotation angle ε about the axis connecting the center of the fovea and the center of the eye (see Fig. 2). The positive sign of eccentricity signifies the temporal side, while the negative sign signifies the nasal side. It can be seen that cones are vastly concentrated at the fovea, whose eccentricity is up to 3.3° [33]. Between −13.6° and −21.6° on the nasal side is a photoreceptor-free region called the blind spot [34].

Visual acuity
Visual acuity (VA) refers to the clarity of vision, which can be quantitatively described as the reciprocal of angular resolution [35], i.e.

$$\mathrm{VA} = \frac{1}{\text{angular resolution}} \tag{1}$$

Suppose two LEDs A and B on the cornea are just resolvable and coaxial with two cones A′ and B′ on the retina, as shown in Fig. 4. Relative to the center of the lens, the angle subtended by A and B is equivalent to the angle α subtended by A′ and B′. Assume that the eye is perfectly healthy, with no refractive errors, retinal detachment, macular degeneration etc., and that the beam size of the LEDs is small enough compared to the pupil that diffraction is negligible. Then the angular resolution α will be solely dependent on the density of cones ρ, as given by Eq. (2), where r is the radius of the eyeball, ε the eccentricity, and $d_{el}$ the distance from the center of the eye to the center of the lens. With the above equations and the densities of cones given in Fig. 3, VA can be calculated as a function of eccentricity, as shown in Fig. 5. For a quick look-up, we divide the retina into four different regions, including the fovea (0° to 3.3°), parafovea (3.3° to 5.5°), macula or perifovea (5.5° to 12.1°), and periphery (12.1° and beyond), and itemize the maximum density of cones, maximum angular resolution, and the best VA for each region, respectively, as in Table 1. Within the fovea, the best VA can be as high as 2.6. For the majority, whose vision is more or less compromised by optical and/or neural factors, the normal VA is 1.0, which as a fraction is written as 20/20 vision [35]. Only a small portion, about 1% of the population, can have a VA above 2.0 [36].
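As a quick numerical reading of Eq. (1), the following sketch converts an angular resolution in arcminutes into a decimal VA and an approximate Snellen fraction. The function names are illustrative only, and the 20/x conversion assumes the usual convention that an angular resolution of 1′ corresponds to 20/20 vision.

```python
def visual_acuity(angular_resolution_arcmin: float) -> float:
    """Decimal VA as the reciprocal of the angular resolution in arcminutes."""
    if angular_resolution_arcmin <= 0:
        raise ValueError("angular resolution must be positive")
    return 1.0 / angular_resolution_arcmin


def snellen_denominator(va: float) -> float:
    """Denominator x of the 20/x Snellen fraction, assuming VA 1.0 == 20/20."""
    return 20.0 / va


if __name__ == "__main__":
    # 0.38' is the best angular resolution quoted in the abstract.
    for alpha in (0.38, 1.0):
        va = visual_acuity(alpha)
        print(f"alpha = {alpha:.2f}' -> VA = {va:.2f}, "
              f"Snellen ~ 20/{snellen_denominator(va):.1f}")
```

An angular resolution of 0.38′ maps to a VA of roughly 2.6, in line with the best foveal VA quoted above.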

Contact lens
The design of the contact lens should conform to the prescription acquired from an optometrist or ophthalmologist. It is important to be aware that a prescription for contact lenses is not the same as a prescription for eyeglasses. This is because the working distance of a contact lens is much shorter than that of eyeglasses. Say a user is nearsighted and his/her eyeglasses have a diopter $P_g$ of −3.00 m⁻¹ for both eyes. Then, his/her contact lens shall have a diopter $P_c$ [37], i.e.

$$P_c = \frac{P_g}{1 - d_g P_g} \tag{3}$$

where $d_g$ is the distance between the eyeglasses and the eye. When $d_g$ = 12 mm, $P_c$ = −2.896 m⁻¹. Treating the contact lens as a thin lens [38], we have

$$P_c = (n - 1)\left(\frac{1}{R_1} - \frac{1}{R_2}\right) \tag{4}$$

where n is the refractive index of the contact lens, $R_1$ the radius of curvature of the front surface, and $R_2$ the radius of curvature of the back surface or anterior cornea, as shown in Fig. 6. In order to be compatible as an encapsulation layer for the LED array [39], the contact lens is supposed to be air impermeable. For this purpose, polymethyl methacrylate (PMMA), the raw material for the old-fashioned hard contact lens [40], is selected as the lens material. Besides PMMA, other plastic polymers, e.g. polysiloxane (silicone) and polyethylene terephthalate (PET), might work as well [41]. Per the above rules, a contact lens can be tentatively designed with the parameters listed in Table 2, which also includes a couple of physical dimensions, such as the thickness $t_c$, overall diameter d, optical zone diameter $d_o$, and radius of curvature of the edge $R_e$.

Collimated light-emitting diode
Figure 7 is a schematic of a collimated LED, i.e. an LED in tandem with a collimator, sandwiched between the substrate and the contact lens. Because the LED lies well inside the near point of the eye, which is normally 25 cm [37], it is required that the etendue of the LED be adequately small so that a clear image can be formed on the retina. To satisfy this requirement, a collimator is inserted at the end of the LED to narrow down the etendue. In essence, this collimator is an optical fiber with a high-refractive-index core in the middle surrounded by a low-refractive-index cladding. In the most extreme case, namely the so-called single-mode fiber, the light will be collimated into a straight line. This occurs when the single-mode condition of Eq. (5) is met, where $D_c$ is the diameter of the core, λ the wavelength, $n_1$ the refractive index of the core, and $n_2$ the refractive index of the cladding [38]. When λ = 532 nm, $n_1$ = 1.468, and $n_2$ = 1.460, $D_c$ < 2.655 µm.
Considering that the diameter of a cone is approximately 2.7 µm, very close to the above value, a one-to-one correspondence between the LEDs and cones could be fulfilled. In fact, besides a single-mode collimator, a multimode collimator would work as well, as long as the etendue is not too big.
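The two worked numbers quoted for the contact lens and the collimator can be checked with a short sketch. It assumes the standard vertex-distance conversion for the lens power and the textbook step-index single-mode condition V = π·D·NA/λ < 2.405; the function names are illustrative. With the textbook cutoff the core limit comes out at about 2.66 µm, marginally above the 2.655 µm quoted, which would correspond to a cutoff rounded to about 2.40. Which constant was actually used is not stated here.

```python
import math


def contact_lens_power(p_glasses: float, vertex_distance_m: float) -> float:
    """Vertex-distance conversion: spectacle power (m^-1) -> power at the cornea."""
    return p_glasses / (1.0 - vertex_distance_m * p_glasses)


def single_mode_core_limit(wavelength_um: float, n_core: float, n_clad: float,
                           v_cutoff: float = 2.405) -> float:
    """Largest core diameter (um) for which V = pi*D*NA/lambda stays below v_cutoff."""
    numerical_aperture = math.sqrt(n_core ** 2 - n_clad ** 2)
    return v_cutoff * wavelength_um / (math.pi * numerical_aperture)


if __name__ == "__main__":
    # Values taken from the text: P_g = -3.00 m^-1, d_g = 12 mm,
    # lambda = 532 nm, n1 = 1.468, n2 = 1.460.
    print(f"P_c = {contact_lens_power(-3.00, 0.012):.3f} m^-1")
    print(f"D_c < {single_mode_core_limit(0.532, 1.468, 1.460):.3f} um (V < 2.405)")
    print(f"D_c < {single_mode_core_limit(0.532, 1.468, 1.460, 2.4):.3f} um (V < 2.4)")
```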

Field angle versus eccentricity
Picture a ray (yellow line) that is incident on the contact lens at a field angle θ, is then refracted towards the center of the lens at an angle β, and finally hits the retina at an eccentricity ε, as shown in Fig. 8. The field angle θ follows from Eq. (6), where

$$\beta = -\sin^{-1}\sqrt{\frac{(r\sin\varepsilon)^2}{(r\sin\varepsilon)^2 + (r\cos\varepsilon + d_{el})^2}} \tag{7}$$

and $d_{cl}$ is the distance from the vertex of the contact lens to the center of the lens. For the human eye, the eccentricity ε varies from −96° (nasal) to +80° (temporal) [32]. The corresponding range of field angle is therefore −66° (temporal) to +81° (nasal). To avoid confusion about the plus/minus sign, it shall be noted that, due to the mirror imaging of the lens, the orientation of the field angle θ is opposite to that of the eccentricity ε.

Fig. 8. A ray (yellow line) is incident on the contact lens at a field angle θ, then refracted towards the center of the lens at an angle β, and finally hits the retina at an eccentricity ε. $d_{el}$ is the distance from the center of the lens to the center of the eye, $d_{cl}$ the distance from the contact lens to the center of the lens, r the radius of the eye, and $R_1$ the radius of curvature of the front surface of the lens.

Pupil size
The pupil, an opening in the center of the iris, acts as an aperture stop to regulate the amount of light reaching the retina [42]. As is known, the size or diameter of the pupil (more exactly and by default, the size of the entrance pupil, which is the image of the pupil formed by the cornea) varies in response to the ambient brightness. It is also affected by age. To take into account both the brightness and age, we modify a formula [43] to determine the pupil size D, Eq. (8), where Y is the age and L the luminance. Say the user is 24 years old and FOV = 82°. Then, we can calculate the pupil size against the luminance, as shown in Fig. 9, from which it can be seen that the pupil size ranges from 2 to 8 mm. When L = 500 cd/m², D = 2.40 mm.
Incidentally, different levels of luminance will trigger different modes of vision. When L > 5 cd/m², D < 3.96 mm and photopic vision takes effect, in which cones dominate [44].
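The quoted pupil sizes can be reproduced with a short sketch. It assumes, without confirmation from the text, that the underlying formula is the unified formula of Watson and Yellott, applied to a circular adapting field whose diameter equals the FOV and to binocular viewing. Whether this is the formula cited as [43], and how it was modified, is not stated here, so the code should be read as one plausible instance rather than the authors' expression.

```python
import math


def pupil_diameter_mm(luminance_cd_m2: float, age_years: float, fov_deg: float) -> float:
    """Pupil diameter (mm) from luminance, age, and field size.

    Assumptions: Watson-Yellott unified formula, circular field of
    diameter fov_deg, binocular viewing (no monocular attenuation).
    """
    field_area_deg2 = math.pi * (fov_deg / 2.0) ** 2
    flux = luminance_cd_m2 * field_area_deg2          # corneal flux density
    f = (flux / 846.0) ** 0.41
    d_sd = 7.75 - 5.75 * f / (f + 2.0)                # Stanley-Davies term
    # Age correction relative to the reference age of 28.58 years.
    return d_sd + (age_years - 28.58) * (0.02132 - 0.009562 * d_sd)


if __name__ == "__main__":
    for lum in (5.0, 500.0):
        d = pupil_diameter_mm(lum, age_years=24, fov_deg=82)
        print(f"L = {lum:g} cd/m^2 -> D = {d:.2f} mm")
```

Under these assumptions the sketch returns about 2.40 mm at 500 cd/m² and about 3.96 mm at 5 cd/m² for a 24-year-old user with an 82° field, matching the values in the text.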

Foveated LED array
The concept of foveated imaging originates from computer graphics, where it is intended to speed up the rendering of high-resolution images [45]. However, the physical resolution of the image remains intact, and an additional device to track the eye is needed [46,47]. Alternatively, this concept can be realized by re-arranging the pixels to match the distribution of cones. Under the presumption that, upon arrival at the retina, the collimated beam of each LED is received by one single cone only, it is legitimate to equate the number of LEDs to that of cones. Let the area of LEDs be decomposed into M rings and the width of each ring be equal to that of an LED. For the i-th ring, the number of LEDs $N_i$ it contains can be roughly estimated by Eq. (9), where $\alpha_i$ is the angular resolution on the i-th ring. Hence, the total number of LEDs N is

$$N = \sum_{i=1}^{M} N_i \tag{10}$$

Substituting Eq. (6) into Eq. (9), the number of LEDs on each ring can be calculated as a function of field angle, as shown in Fig. 10. Interestingly, the blind spot will leave a blank area not covered by LEDs on the contact lens. This not only increases the transparency of the device, but also saves quite a few pixels. Size-wise, the blind spot is 1.76 mm (horizontal) by 1.92 mm (vertical) or, in terms of field angle, 6° (horizontal) by 7° (vertical). According to Eq. (10), the number of pixels saved by the blind spot is 20,000, give or take. To better appreciate the benefit of the foveated pixel arrangement, the minimal number of pixels required to yield an angular resolution of 1′, at an aspect ratio of 16:9, is computed with respect to the diagonal FOV, as shown in Fig. 11. Take a FOV of 100° as an example. The minimal numbers of pixels for the non-foveated and foveated displays are 15.38 and 3.20 million, respectively. The latter is merely about 1/5 of the former.
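The non-foveated figure quoted above can be reproduced with simple arithmetic. The sketch below assumes that the angle is mapped linearly along each axis of a 16:9 rectangle whose diagonal spans the FOV; the function name is illustrative. The foveated count of 3.20 million depends on the cone density data of Fig. 3 and is not reproduced here.

```python
import math


def uniform_pixel_count(diag_fov_deg: float, resolution_arcmin: float = 1.0,
                        aspect=(16, 9)) -> float:
    """Pixel count of a uniform rectangular display with the given diagonal FOV.

    Assumption: angle maps linearly to pixels along each axis.
    """
    w, h = aspect
    diag_pixels = diag_fov_deg * 60.0 / resolution_arcmin  # pixels along the diagonal
    scale = diag_pixels / math.hypot(w, h)
    return (w * scale) * (h * scale)


if __name__ == "__main__":
    n_uniform = uniform_pixel_count(100.0)
    print(f"non-foveated: {n_uniform / 1e6:.2f} million pixels")
    print(f"foveated / non-foveated: {3.20e6 / n_uniform:.2f}")
```

For a 100° diagonal this gives about 15.38 million pixels, so the quoted 3.20 million foveated pixels amount to roughly one fifth of the uniform count.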

Simulation setting
The whole idea of CLD is validated through numerical simulation in Code V (Synopsys). The design wavelength is 532 nm. Figure 12 outlines the optical surfaces defined in Code V, which are in turn (1) contact lens, (2) anterior cornea, (3) posterior cornea, (4) pupil, (5) anterior lens, (6) virtual plane, (7) posterior lens, and (8) retina. The initial parameters of the eye are adapted from our previous eye model [48], in which the lens has gradient refractive indices and is split into anterior and posterior lenses with a virtual plane inserted in between. The total length of the eye is 24 mm. For the real image, the object is positioned 3 m in front of the eye. For the virtual image, the object coincides with the contact lens. During the optimization, the radii of the anterior and posterior lenses are set as the variables. Table 3 summarizes the as-optimized parameters for each surface. For more details, parameters for aspherical surfaces and gradient refractive indices of the lens are disclosed in Table 4 and Table 5, respectively.

Field of view
Referring to Fig. 13, a geometric relationship among FOV and other parameters can be obtained, Eq. (11), where $d_{pl}$ is the distance from the center of the lens to the pupil. Assigning the above parameters the values provided in Table 6, FOV is calculated as 82°. A careful examination of Eq. (11) implies that FOV increases with the pupil size. As can be seen in Fig. 14, when the pupil dilates to 8 mm in diameter, FOV reaches up to 142°. Unlike the rectangular NED, our CLD is round in shape. If both are of the same FOV, the round one will indisputably have a bigger image size.
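To illustrate how FOV grows with pupil size, the sketch below uses a simple chief-ray geometry, FOV = 2·atan(D / (2·d_pl)), as a stand-in. Both the functional form and the value d_pl ≈ 1.38 mm are assumptions: the value is back-solved from the two FOVs quoted in the text and is not taken from Table 6, so this is a consistency check rather than a restatement of Eq. (11).

```python
import math

# Hypothetical lens-to-pupil distance (mm), back-solved from the quoted FOVs.
D_PL_MM = 1.38


def fov_deg(pupil_diameter_mm: float, d_pl_mm: float = D_PL_MM) -> float:
    """Full field of view (degrees) subtended by the pupil from the lens center."""
    return 2.0 * math.degrees(math.atan(pupil_diameter_mm / (2.0 * d_pl_mm)))


if __name__ == "__main__":
    for d in (2.40, 4.0, 8.0):
        print(f"D = {d:.2f} mm -> FOV ~ {fov_deg(d):.0f} deg")
```

With these assumed inputs, a 2.40 mm pupil gives about 82° and an 8 mm pupil about 142°. The growth is monotonic but not linear, since the arctangent saturates for large pupils.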

Angular resolution
Since the resolution of our CLD is foveated, so is its angular resolution. If measured in arcminutes (′), the angular resolution for the i-th ring at the field angle θ can be calculated with

$$\text{Angular resolution} = \frac{21600\,\sin\theta}{N_i} \tag{12}$$

As shown in Fig. 15, the best angular resolution is 0.38′ at 0°, whereas the worst is 3.11′ at −41°.
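Eq. (12) and its inverse can be written as two small helpers. The factor 21600 is the number of arcminutes in a full turn (360 × 60). Taking the absolute value of sin θ, so that nasal and temporal field angles are treated alike, is an assumption added here, and the function names are illustrative.

```python
import math

ARCMIN_PER_TURN = 360 * 60  # 21600 arcminutes in a full circle


def ring_angular_resolution(theta_deg: float, n_leds: float) -> float:
    """Angular resolution (arcmin) of a ring of n_leds at field angle theta_deg."""
    return ARCMIN_PER_TURN * abs(math.sin(math.radians(theta_deg))) / n_leds


def leds_for_resolution(theta_deg: float, alpha_arcmin: float) -> float:
    """Number of LEDs a ring at theta_deg needs for a resolution of alpha_arcmin."""
    return ARCMIN_PER_TURN * abs(math.sin(math.radians(theta_deg))) / alpha_arcmin


if __name__ == "__main__":
    n = leds_for_resolution(-41.0, 3.11)
    print(f"ring at -41 deg with 3.11' resolution: about {n:.0f} LEDs")
    print(f"round trip: {ring_angular_resolution(-41.0, n):.2f}'")
```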

Modulation transfer function
Modulation transfer function (MTF) is calculated for both real and virtual images, as shown in Fig. 16.

Contrast ratio
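Contrast ratio (CR) of the image is defined as

$$\mathrm{CR} = \frac{(\mathrm{CR}_0 + 1) + \mathrm{MTF}\,(\mathrm{CR}_0 - 1)}{(\mathrm{CR}_0 + 1) - \mathrm{MTF}\,(\mathrm{CR}_0 - 1)}$$

where $\mathrm{CR}_0$ is the CR of the object [49]. For the real image, the CR of the object can be infinitely large. For the virtual image, the CR of the object, i.e. the LED, is set as 5000. For the field of 0° at a spatial frequency of 30 cycle/degree, the CRs of the real and virtual images are calculated as 5 (MTF = 0.669757) and 4988 (MTF = 0.999999), respectively.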
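The two contrast ratios can be checked numerically. The sketch assumes the standard relation in which the MTF scales the object modulation, (CR₀ − 1)/(CR₀ + 1), and the image CR is recovered as (1 + m)/(1 − m). This relation is consistent with both quoted values, but the function name and structure are illustrative.

```python
import math


def image_contrast_ratio(mtf: float, object_cr: float = math.inf) -> float:
    """Image CR when the object modulation is scaled by the MTF."""
    if math.isinf(object_cr):
        object_modulation = 1.0                       # ideal object, infinite CR
    else:
        object_modulation = (object_cr - 1.0) / (object_cr + 1.0)
    image_modulation = mtf * object_modulation
    if image_modulation >= 1.0:
        return math.inf                               # perfect transfer of an ideal object
    return (1.0 + image_modulation) / (1.0 - image_modulation)


if __name__ == "__main__":
    print(f"real image:    CR ~ {image_contrast_ratio(0.669757):.2f}")
    print(f"virtual image: CR ~ {image_contrast_ratio(0.999999, 5000):.1f}")
```

The real-image value comes out slightly above 5 and the virtual-image value just under 4988, which round to the figures quoted in the text.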

Distortion
Distortion is defined as

$$\text{Distortion} = \frac{h_a - h_p}{h_p} \times 100\%$$

where $h_a$ is the height of the actual image, and $h_p$ the height of the paraxial image calculated with the first-order approximation [49]. As shown in Fig. 17, the distortions of the real and virtual images are 10% and 6%, respectively. Figure 18 shows the original image (Snellen chart [50]) alongside the real and virtual images formed on the retina. As expected, the virtual image is sharper and less distorted than the real image, especially at the marginal fields, which agrees with the foregoing MTF and distortion. However, the foveated effect of the virtual image is not visible. The reason is that, in Code V and other simulation tools, the influence of photoreceptor cells on the image has yet to be factored in.

Conclusions
As an alternative to the established NEDs, a foveated CLD has been proposed. To justify this concept, the working principles for each component have been explained. To evaluate its performance, numerical simulations have been carried out. For the real image, MTF is 0.669757 at 30 cycle/degree, CR is 5, and distortion is 10%. For the virtual image, FOV is 82°, the best angular resolution is 0.38′, MTF is above 0.999999 at 30 cycle/degree, CR is 4988, and distortion is 6%. Compared to the retinal-projection-based NEDs, our CLD has its own pros and cons. First (pro), the eye is allowed to move or rotate freely without the help of exit pupil expansion or eye tracking. This is a big advantage over its counterparts, especially those with small exit pupils [23,26,48,51]. Second (pro), the physical resolution is foveated to match the distribution of cones. This will significantly reduce the total number of pixels as well as the latency incurred by the image processing. Third (pro), there is no burden or weight on the head and no worry about head-related human factors, such as the shape of the head, interpupillary distance etc. Fourth (con), the fabrication and popularization of CLD will face a number of challenges. Among others, the safety of CLD is arguably the number one issue. In addition, the power supply of CLD necessitates either a standalone built-in battery (e.g. a glucose biofuel cell) or wireless charging by means of electromagnetic induction [52]. Sadly, those issues bring us back to the question raised at the beginning. Why is it so hard to make a good AR display? In Wagner's closing remarks, he expressed a somewhat pessimistic mood towards the possibility of breakthroughs in the short term. Still, we believe that if we keep thinking outside the box, a dream solution might be just around the corner.