Design and demonstration of a vari-focal optical see-through head-mounted display using freeform Alvarez lenses

Abstract: Alvarez lenses offer accurate, high-speed, dynamic tuning of optical power through the lateral shifting of two lens elements, making them an appealing solution to eliminate the inherent decoupling of accommodation and convergence seen in conventional stereoscopic displays. In this paper, we present a design of a compact eyepiece coupled with two lateral-shifting freeform Alvarez lenses to enable a compact, high-resolution, optical see-through head-mounted display (HMD). The proposed design is able to tune its focal depth from 0 to 3 diopters, rendering near-accurate focus cues with high image quality and a large undistorted see-through field of view (FOV). Our design utilizes a full-color 1920x1080 organic light-emitting diode (OLED) microdisplay to achieve a >30 degree virtual diagonal FOV, with an angular resolution of <0.85 arcminutes and an average optical performance of >0.4 contrast over the full field. We also experimentally demonstrate a fully functional benchtop prototype using mostly off-the-shelf optics.


Introduction
Conventional head-mounted displays (HMDs) lack the ability to correctly render focus cues, including accommodation and retinal blur effects, because they merely present a pair of stereoscopic images with binocular disparities and other pictorial depth cues on a fixed image plane. These displays thus force an unnatural decoupling of the accommodation and convergence cues and induce a fundamental problem referred to as the vergence-accommodation conflict (VAC), which can lead to various visual artifacts, such as distorted depth perception and visual fatigue [1][2][3].
Several display methods have been proposed as potential solutions to the VAC problem, including but not limited to: holographic, volumetric, multifocal, light field, and vari-focal displays [4]. Each of these technologies has unique disadvantages. Holographic displays, for example, can potentially enable displays with correct focus cues and achieve a compact form factor while remaining lightweight [5]. However, it is very challenging to develop full-color, high-resolution holographic displays free of artifacts such as speckle. Volumetric displays render 3D voxels of a scene occupying a physical space and thus naturally allow users to perceive correct focus cues [6], but these displays tend to be extremely bulky and have low resolution with several moving parts. Multi-focal plane displays, another alternative, project several virtual focal planes discretely placed along the visual axis of the viewing space, each plane allowing for rendering nearly-correct focus cues across an extended depth volume [7][8][9][10]. This extended depth, however, often comes at the cost of both time multiplexing and large data bandwidth. In recent years, light field displays have emerged as a promising technology to correct the VAC by rendering the different directions of the light rays apparently emitted by a 3D scene and viewed from slightly different positions [11][12][13][14][15]. Light field displays can be both lightweight and compact, while using simple viewing optics in conjunction with a pinhole or lenslet array to achieve a large field of view. However, the view density of a light field display, which is defined as the number of views per unit area projected on its viewing window, is inversely correlated to the spatial resolution of the reconstructed 3D scene [16].
Therefore, tradeoffs have to be made among the key parameters of display performance, such as the spatial resolution, depth of field, depth resolution, the accuracy of focus cues, and the accommodative response errors [17].
Vari-focal plane display technology is arguably one of the easiest remedies to the VAC problem in HMDs. A vari-focal HMD dynamically adjusts the focal distance of a single-plane display, either by adopting an electrically or mechanically tunable optical element or by mechanically varying the distance between a microdisplay and its eyepiece, so that the 2D image of a virtual object rendered by the display appears at the correct focal depth [4,18,19]. For instance, Liu et al. demonstrated a vari-focal HMD prototype integrating a liquid lens for dynamic focus control and experimentally validated the effectiveness of a vari-focal display for addressing the VAC problem [20]. More recently, Dunn et al. demonstrated a vari-focal augmented reality display using deformable beamsplitter membranes [21].
A key enabling technology in a vari-focal HMD is an active optical element that is able to dynamically tune the optical power of the system at a high speed and over a large depth range (typically a few diopters) while also offering a large, clear aperture. Examples of such active optical elements include deformable membrane mirror devices (DMMD) [22], electrowetting lenses [23], elastomer-membrane fluidic lenses [24], liquid crystal lenses [25], and digitally switchable multi-focal lenses [26]. With the new developments in manufacturing freeform surfaces as well as optical metrology to accurately measure those surfaces, Alvarez lenses, which offer accurate and high-speed dynamic tuning of optical power through the lateral shifting of two lens elements [27], have recently surfaced as an attractive method to achieve large focal ranges rapidly, while still maintaining a compact structure [28][29][30].
In this paper, we present a novel design of a high-resolution vari-focal plane optical see-through HMD (OST-HMD) system using freeform Alvarez lenses coupled with ultra-fast, high-resolution piezo linear actuators. Our design is capable of rendering near-correct focus cues by providing dynamic control of the focal distance throughout the extended depth of field corresponding to a shift from 0 to 3 diopters at a rate of 150Hz per diopter. Our design utilizes a Sony 0.7" OLED microdisplay for the virtual display path to achieve a 30 degree diagonal FOV and 1920x1080 pixel resolution, with an optical performance greater than 20% modulation contrast at 0.81 arcmins over the full FOV.
Figure 1 illustrates a schematic diagram of our proposed OST-HMD optical architecture. The design can be divided into two main groups: an eyepiece and a tunable relay group. The eyepiece group is made up of a plane plate beamsplitter (PPB), imaging optics, and a cold mirror to create a folded, compact optical path that projects a magnified virtual image toward a user's eye pupil. The beamsplitter acts as an optical combiner to merge the light paths of the real-world view and the virtual display view. The cold mirror is used for a potential eyetracking path needed to determine the viewer's gaze direction and thus to determine the depth of eye convergence for rendering correct focus cues [31]. The tunable relay group consists of a relay system and an Alvarez lens group. The relay system is made up of two lens groups separated by the Alvarez lens group, and it relays a 2D image rendered on an OLED microdisplay to form an intermediate image plane, which is then projected by the eyepiece. The Alvarez lens group, boxed in red in Fig. 1, is composed of two symmetric freeform lenses that change the optical focus of the system from 0 to 3 diopters in the virtual image space with equal and opposite lateral translation.
By carefully placing the Alvarez lens group at the intermediate pupil location that is optically conjugate to the exit pupil of the system, we ensure that no change occurs in the chief ray angles when the optical power of the Alvarez lens group is dynamically adjusted to control the apparent focal depth of the virtual display over an extended depth range from 0 to 3 diopters in the visual space. Consequently, the HMD system is able to maintain a constant field of view and angular resolution when adjusting the focal depth of the virtual display. This offers a significant advantage over designs [21] in which the tunable optical element is not optically conjugate to the exit pupil of the system, so that the apparent field of view and the optical magnification of such systems vary with focal depth, requiring calibration and digital correction.
Our prototype design uses a 0.7" Sony full-color OLED microdisplay for the virtual display path. The Sony OLED, having an effective area of 15.36mm by 8.64mm and a pixel size of 8μm, offers a native resolution of 1920x1080 pixels and an aspect ratio of 16:9. Based on the choice of microdisplay, we further optimized a previously designed eyepiece using available stock lenses [32] to achieve a diagonal FOV of 30°, or 26.5° horizontally and 15° vertically, and an angular resolution of 0.81 arcmins per pixel, corresponding to a Nyquist frequency of 63 cycles/mm in the microdisplay space or 37 cycles/degree in the visual space. The design achieved an exit pupil diameter (EPD) of 10mm and an eye clearance distance of at least 20mm. The see-through path is composed of a single beamsplitter, allowing for a very large FOV and unaberrated optical performance. The tunable relay group consists of a 1:1 relay group with an Alvarez lens group inserted in-between.
The 1:1 relay group is composed of 6 rotationally-symmetric plastic aspheric lenses, in which the 3 lenses on the left side of the Alvarez lens group are symmetric to those on the right side, which significantly reduces fabrication cost. The Alvarez lens group, consisting of two acrylic freeform lenses, affords the control of the optical power of the system. Its design, the main focus of this paper, will be detailed in the following paragraph. To correct residual field curvature of the system, a field correcting lens is inserted between the tunable relay group and the eyepiece group. Overall, the final lens design consists of 13 lenses, including 4 stock glass lenses (eyepiece), 6 symmetric plastic aspheric lenses (1:1 relay), 2 acrylic freeform lenses (Alvarez group), and a field correcting lens. It was optimized for 3 wavelengths, 465, 550, and 615nm, with weights of 1, 2, and 1, respectively, in accordance with the dominant wavelengths of the OLED microdisplay. To balance the overall optical performance over the focal depth range of 3 diopters in the visual space, we optimized the system using 7 zoom configurations, each corresponding to a different optical power induced in the Alvarez lens group by a small lateral shift between the Alvarez lens pair (thus creating a different focal depth of the virtual display), to create a smooth transition of the optical performance throughout the extended depth of field.
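The display figures quoted above (active area, Nyquist frequency in the microdisplay space, and Nyquist frequency in the visual space) can be cross-checked from the pixel pitch and the angular resolution alone. The following is a minimal sketch of that arithmetic, not part of the design workflow; the function and variable names are illustrative assumptions, and only the numeric inputs come from the text.

```python
import math

# Inputs taken from the text (Sony 0.7" OLED microdisplay).
PIXEL_PITCH_MM = 0.008      # 8 um pixel size
H_PIXELS, V_PIXELS = 1920, 1080
ARCMIN_PER_PIXEL = 0.81     # quoted angular resolution


def nyquist_cycles_per_mm(pixel_pitch_mm):
    """Nyquist frequency in display space: one cycle spans two pixels."""
    return 1.0 / (2.0 * pixel_pitch_mm)


def nyquist_cycles_per_degree(arcmin_per_pixel):
    """Nyquist frequency in visual space: 60 arcmin per degree, two pixels per cycle."""
    return 60.0 / (2.0 * arcmin_per_pixel)


# Active area and diagonal implied by the pixel count and pitch.
width_mm = H_PIXELS * PIXEL_PITCH_MM
height_mm = V_PIXELS * PIXEL_PITCH_MM
diagonal_inch = math.hypot(width_mm, height_mm) / 25.4

print(f"active area: {width_mm:.2f} mm x {height_mm:.2f} mm")
print(f"diagonal: {diagonal_inch:.2f} in")
print(f"Nyquist (display): {nyquist_cycles_per_mm(PIXEL_PITCH_MM):.1f} cycles/mm")
print(f"Nyquist (visual): {nyquist_cycles_per_degree(ARCMIN_PER_PIXEL):.1f} cycles/degree")
```

The display-space value evaluates to 62.5 cycles/mm, which the text rounds to 63 cycles/mm, and the visual-space value is close to the quoted 37 cycles/degree.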

Optical system design
where φ is the spherical power of the lens group, and x and y are the amounts of lateral translation in the corresponding directions. In our case, we set y = 0 because no translation was allowed in the y direction, giving a much-reduced equation for the spherical power:
The freeform Alvarez lens group was optimized in CodeV using x-y polynomials to the 3rd order. During the optimization, 7 focal depths of the virtual display in the range from 0 to 3 diopters were sampled at an increment of 0.5 diopters, and they were configured as 7 separate zooms in CodeV. The amount of the relative lateral shift between the Alvarez lens pair along the x-direction was initialized for each zoom according to the paraxial spherical powers of the entire system and the Alvarez lens group calculated using Eq. (2), but it was set as a variable during optimization to allow compensation of non-paraxial effects. The optimization process focused on optimizing the shape of the freeform lens surface and the lateral shifts of the pair to obtain balanced performance across all 7 sampled positions.
Based on the optimized result, Fig. 3 shows the spherical power shift of the Alvarez lenses (in grey triangles) as a function of the lateral x-translation sampled in 7 focal positions. These sampled positions were then fitted to Eq. (2), and the fitted relationship was plotted as the grey solid line. The fitted coefficients for A, B, E, and G are shown in the graph. For each 1.667mm lateral shift of the freeform surfaces, the intermediate image plane is shifted by roughly 1mm toward the eyepiece, corresponding to approximately a 1-diopter shift of the virtual image plane in the visual space. Figure 3 plotted three configurations of the Alvarez lens pair corresponding to 0, 1.5, and 3 diopters of virtual display focal depths. Figure 3 further plotted the focal depth shift of the virtual image plane (i.e. the system power shift) as a function of the lateral displacement of the Alvarez lenses, denoted by the orange (square) line, which demonstrates a system power shift of 3 diopters.
The simulated optical performance of the virtual display was optimized and assessed over the full field of view in the display space using the polychromatic modulation transfer function (MTF) curves. Figure 4 shows the polychromatic MTF curves, evaluated with a 3-mm eye pupil, for 5 fields at three different focal depths corresponding to 0.5, 1.5, and 3 diopters in the visual space, respectively. The virtual display path over the extended depth range preserves over 20% modulation at the designed Nyquist frequency of 63 cycles/mm, corresponding to the 8μm pixel size of the OLED display. An average of 50% modulation at the frequency of 35 cycles/mm is maintained over the full field of view for the focal range from 0 to 3 diopters.
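As a rough illustration of the zoom sampling described above, the sketch below enumerates the 7 focal depths (0 to 3 diopters in 0.5-diopter steps) and estimates the lateral shift for each one from the quoted figure of about 1.667mm per diopter. The linear mapping is an assumption made here for illustration only: in the actual design the shift at each zoom was an optimization variable and the power-versus-shift relationship was fitted to Eq. (2), so the optimized values need not fall exactly on this line. Whether the 1.667mm figure refers to the relative shift of the pair or to each element is not restated here, and the helper names are hypothetical.

```python
# Approximate lateral shift per diopter of focal-depth change, as quoted in the text.
MM_PER_DIOPTER = 1.667


def zoom_depths(start=0.0, stop=3.0, step=0.5):
    """Focal depths (in diopters) sampled as separate zoom configurations."""
    count = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(count)]


def approx_lateral_shift_mm(depth_diopters):
    """Linear estimate of the lateral shift; an assumption, not the fitted Eq. (2)."""
    return MM_PER_DIOPTER * depth_diopters


for depth in zoom_depths():
    print(f"{depth:.1f} D -> approx. {approx_lateral_shift_mm(depth):.2f} mm lateral shift")
```

Under this linear estimate the full 3-diopter range needs about 5mm of travel, which is consistent with the "less than 6mm" figure stated in the conclusion.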
Along with the MTF, several other metrics were used to characterize the optical performance of the virtual display path, such as wavefront error and spot diagram. The wavefront error over the full field for the 3-diopter extended depth of field was held to under 1.5 waves. The average root mean square (RMS) spot diameter across the field at the far ends of the depth of field is about 14μm. This error is largely due to lateral chromatic aberration and astigmatism. Lateral chromatic aberration results from a lateral magnification difference for each wavelength and can be digitally corrected, much like distortion. Unfortunately, due to the off-axis and non-rotationally symmetric design of the freeform Alvarez lenses, astigmatism is inherent to the optical system. This problem can potentially be reduced by further optimizing the Alvarez lens group using higher-order terms.
The distortion grid along with the magnification of the virtual image over the full focal range was analyzed, and Fig. 5(a) through 5(c) plotted the distortion grids at the focal depths of 0.5, 1.5, and 3 diopters, respectively. The design shows <3% distortion and <1% magnification errors over the full field for the extended depth of field. This small amount of residual distortion can easily be corrected by image processing to prewarp the original image.
For the proposed prototype of the varifocal OST-HMD design shown in Fig. 2, a PI M-633.4U piezo linear stage was used as the electronic linear actuator to drive the lateral shift of the Alvarez lenses. Due to the symmetric form of the freeform Alvarez lenses, only one linear actuator is needed per eye to achieve equal and opposite translation. Figure 6(a) shows the mechanical mount of a binocular, vari-focal OST-HMD prototype fitted on an average-sized human head model, while Fig. 6(b) is an enlarged view of the Alvarez lens module with integrated linear stage to show the size of the linear stage relative to the lens.
The overall width of the OST-HMD system is 200mm, with a depth of 95mm and an interocular distance of 60mm. The mechanical setup utilizes one piezo actuator that is attached to a small gear that allows for the symmetric bidirectional translation of the two Alvarez lenses with a single movement. The M-633.4U stage offers a translation speed of 250mm/s, thereby producing a 50Hz transition speed for a focal depth shift of the virtual display from 0 to 3 diopters and a 150Hz transition speed for a 1-diopter focal depth shift. In the prototyped system, the eyepiece lenses were cropped to achieve an eye clearance of >20mm and a 10mm EPD. For the mechanical design, lenses were individually aligned into a larger housing, where they were held by set screws to achieve a smaller tolerance stack as well as more compensation in the optical design, making it easier to achieve the desired maximum MTF and allowing the housing to be 3D printed. Each linear actuator has a step resolution of 100nm and a repeatability of 200nm, while the 3D printed mounts had a mechanical tolerance of 100μm, which had very little impact on the optical tolerances and MTF sensitivity.
Figure 7 shows a monocular benchtop prototype of our varifocal OST-HMD system with the light paths of the real and virtual scenes superimposed. The light path for the virtual display is highlighted with red arrows, while the light path for the real-world view is shown with blue arrows. Due to parts availability in the laboratory and budget constraints, we modified our original optical design shown in Fig. 2 and built this benchtop prototype with a WUXGA eMagin OLED microdisplay having a 9.6μm pixel pitch and a Nyquist frequency of 52 cycles/mm. Instead of custom-made aspheric lenses, the relay group was composed of two identical eyepieces obtained from Sony HMZ-T3 HMDs to create a double telecentric relay system. The freeform Alvarez lenses were samples provided by Dr. Rob Stevens of Adlens Ltd. and were placed at the intermediate pupil location. The lateral positional shifts of the lenses were controlled by linear translation stages. Lastly, the same eyepiece design as that used in our previous work [19] was modified to allow the see-through path to merge with the virtual scene. A Point Grey camera, along with a 16mm-focal-length lens by Edmund Optics, was inserted at the exit pupil to replace the eye for image capture.
Fig. 7. Experimental setup of a monocular benchtop prototype of a varifocal-plane OST-HMD using freeform Alvarez lenses.
Fig. 8. Qualitative demonstration of focus cue rendering in our vari-focal OST-HMD benchtop prototype: (a) A virtual image (tumbling E) was rendered at 160mm (i.e. approximately 6 diopters) with the Alvarez lens and camera focused at the same depth along with physical reference objects in the see-through paths; (b) A virtual image (tumbling E) was rendered at 3000mm (i.e. 0.33 diopters) with the Alvarez lens and camera focused at the same depth along with physical reference objects in the see-through paths.
Figure 8 shows a qualitative demonstration of our prototyped benchtop varifocal OST-HMD. A set of virtual tumbling E targets was rendered on the microdisplay as the user manipulated the focal depth of the virtual display through the interface device. The dimensions of the E letters were scaled such that they maintain the same angular size and resolution when viewed from the camera position. Therefore, the captured images of E letters displayed at different focal depths are expected to be the same size, owing to the constant angular magnification of our designed system. For visual references, two printed spoke targets were placed in the see-through path, one at 160mm and the other at 3000mm away from the camera. The two printed targets were also scaled properly so that they maintain the same angular size.
By varying the lateral displacement of the Alvarez lens group by 3cm, the focal depth of the virtual display can be varied correspondingly from 6 to 0.33 diopters. Figure 8(a) shows the captured image of the scene with the camera focused at the depth of 160mm and Alvarez lens group focused at the same depth, while Fig. 8(b) shows the captured image of the scene with the camera focused at the depth of 3000mm and Alvarez lens group focused at the same depth. The image in Fig. 8(a) clearly shows that the printed target placed at 160mm or 6 diopters and the virtual E targets displayed at the same focal depth are sharply focused while the printed target placed at 3000mm looks blurry as expected. Similarly, the image in Fig. 8(b) clearly shows that the printed target placed at 3000mm or 0.33 diopters and the virtual E targets displayed at the same focal depth are sharply focused while the printed target placed at 160mm looks blurry as expected. A finger and hand as seen in the images were used for size and distance reference.
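The transition rates quoted earlier for the designed system (150Hz for a 1-diopter step and 50Hz for the full 0 to 3 diopter range) follow from the stage speed of 250mm/s and the travel of roughly 1.667mm per diopter. The sketch below reproduces that arithmetic under a simplifying assumption that the stage moves at its constant rated speed; acceleration, deceleration, and settling time are ignored, so a real actuator would be somewhat slower. The function name and defaults are hypothetical.

```python
# Figures quoted in the text for the designed (not benchtop) system.
STAGE_SPEED_MM_PER_S = 250.0   # M-633.4U translation speed
MM_PER_DIOPTER = 1.667         # approximate lateral shift per diopter


def transition_rate_hz(delta_diopters,
                       speed_mm_per_s=STAGE_SPEED_MM_PER_S,
                       mm_per_diopter=MM_PER_DIOPTER):
    """Focal transitions per second, assuming constant-speed travel (no ramp or settling)."""
    if delta_diopters == 0:
        raise ValueError("delta_diopters must be non-zero")
    travel_mm = mm_per_diopter * abs(delta_diopters)
    travel_time_s = travel_mm / speed_mm_per_s
    return 1.0 / travel_time_s


for step in (1.0, 3.0):
    print(f"{step:.0f} D step: about {transition_rate_hz(step):.0f} Hz")
```

With these inputs a 1-diopter step takes about 6.7ms and the full 3-diopter range about 20ms, which matches the quoted 150Hz and 50Hz figures.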

Conclusion
This paper presents a novel design of a vari-focal optical see-through head-mounted display system using freeform Alvarez lenses to dynamically shift the focal depth of the virtual image plane from 0 to 3 diopters. The study includes a comprehensive description of the optical design capable of making a large focal shift while still maintaining good image quality of the virtual display. Our design offers a >30° diagonal FOV and an angular resolution of 0.81 arcmins, with an optical performance of >0.4 contrast over the full FOV at the Nyquist frequency of the display. By optimizing the system at 7 different focal positions, we were able to obtain a design of an Alvarez lens group that can tune the focal depth of the virtual display over a 3-diopter depth range with less than 6mm of lateral displacement of the Alvarez lenses. By using a high-speed piezo linear stage, the design is able to achieve a focal shift speed of 150Hz per diopter. This paper further demonstrates a benchtop prototype of a varifocal OST-HMD system using freeform Alvarez lenses and mostly off-the-shelf optics.

Disclosures
Dr. Hong Hua has a disclosed financial interest in Magic Leap, Inc. The terms of this arrangement have been properly disclosed to The University of Arizona and reviewed by the Institutional Review Committee in accordance with its conflict of interest policies.