Design and prototype of an augmented reality display with per-pixel mutual occlusion capability

Abstract: State-of-the-art optical see-through head-mounted displays for augmented reality (AR) applications lack mutual occlusion capability, that is, the ability to render the correct light-blocking relationships when merging digital and physical objects. As a result, the virtual views appear ghost-like and lack a realistic appearance. In this paper, using off-the-shelf optical components, we present the design and prototype of an AR display capable of rendering per-pixel mutual occlusion. Our prototype utilizes a miniature organic light emitting display coupled with a liquid crystal on silicon type spatial light modulator to achieve an occlusion-capable AR display offering a 30° diagonal field of view and an angular resolution of 1.24 arcminutes, with an optical performance of > 0.4 contrast over the full field at the Nyquist frequency of 24.2 cycles/degree. We experimentally demonstrate a monocular prototype achieving >100:1 dynamic range in well-lit environments.


Introduction
Augmented Reality (AR) is viewed as a transformative technology in the digital age, enabling new ways of accessing and perceiving digital information essential to our daily life. It is widely expected that AR technology, integrated with mobile computing, will become as pervasive as smartphones in all walks of life. A see-through head-mounted display (HMD) is one of the key enabling technologies for merging digital information with a physical scene in an AR system [1]. While both video see-through and optical see-through displays have their unique advantages, optical see-through HMDs (OST-HMD) tend to be preferred when it comes to real scene resolution, viewpoint disparity, FOV and image latency [1].
Developing OST-HMDs, however, presents many technical challenges [2,3], one of which is correctly rendering mutual occlusion relationships between digital and physical objects in space. Mutual occlusion is the light-blocking behavior when intermixing virtual and real objects: an opaque virtual object should appear fully opaque and occlude a real object located behind it, and a real object should naturally occlude the view of a virtual object located behind it. There are two types of occlusion: that of real-scene objects occluding virtual ones, and that of virtual objects occluding the real scene. When the location of the real object relative to the virtual scene is known, the occlusion of a virtual object by a real object can be achieved straightforwardly by simply not rendering the virtual object where the occluding real object sits. The occlusion of a real object by a virtual one presents a much more complicated problem because it requires blocking light from the real scene. State-of-the-art OST-HMDs typically rely upon a beam splitter (BS) to uniformly blend the light from the real scene with the virtual objects, and lack the ability to selectively block the light of the real world from reaching the eye. As a result, the digitally rendered virtual objects viewed through OST-HMDs typically appear "ghost-like," always floating "in front of" the real world. Figure 1 shows an unedited AR view captured by a camera through a typical OST-HMD lacking occlusion capability, where the virtual airplane appears washed out, non-opaque and low in contrast. Creating a mutual occlusion-capable optical see-through HMD (OCOST-HMD) poses a complex challenge. In the last decade, few OCOST-HMD concepts have been proposed, with even fewer designs being prototyped [4][5][6][7][8].
The existing methods for implementing OCOST-HMDs fall into two types: direct ray blocking and per-pixel modulation. The direct ray blocking method selectively blocks the rays from the see-through scene without focusing them. It can be implemented by selectively modifying the reflective properties of physical objects or by passing the light from the real scene through a single layer or multiple layers of spatial light modulators (SLM) placed directly near the eye. For instance, Hua et al. investigated the idea of creating natural occlusion of virtual objects by physical ones via a head-mounted projection display (HMPD) device, which involved applying retroreflective screens onto non-occluding physical objects and thus can only be used in limited setups [4]. Tatham demonstrated the occlusion function through a transmissive SLM placed directly near the eye with no imaging optics [5]. The direct ray blocking method via an SLM would be a straightforward and adequate solution if the eye were a pinhole aperture allowing a single ray from each real-world point to reach the retina. Instead, the eye has an area aperture, which makes it practically impossible for a single-layer SLM to block all the rays seen by the eye from an object without also blocking the rays from surrounding objects. Recently, Maimone and Fuchs proposed a lensless computational multi-layer OST-HMD design which consists of a pair of stacked transmissive SLMs, a thin and transparent backlight, and a high-speed optical shutter [6]. Multiple occlusion patterns can be generated using a multi-layer computational light field method [7] so that the occlusion light field of the see-through view can be rendered properly. Although the multi-layer light field rendering method can in theory overcome some of the limitations of a single-layer ray blocking method, it is subject to several major limitations, such as a significantly degraded see-through view, limited accuracy of the occlusion mask, and low light efficiency. These unfavorable results can be attributed to the lack of imaging optics, the low light efficiency of the SLMs, and most importantly the severe diffraction artifacts caused by the fine pixels of the SLMs located at a close distance to the eye pupil.
The per-pixel occlusion method, as illustrated in Fig. 2, forms a focused image of the see-through view at a modulation plane, where an SLM is inserted and renders occlusion masks to selectively block the real-world scene point by point. Based on this principle, the ELMO series of prototypes designed by Kiyokawa et al. in the early 2000s are perhaps still the most complete demonstration of OST-HMDs with occlusion capabilities [8,9], all of which were implemented using conventional lenses, prisms and mirrors. The ELMO-4 prototype contains 4 lenses, 2 prisms and 3 optical mirrors arranged in a ring structure that presents a very bulky package blocking most of the user's face. Limited by the microdisplay and SLM technologies at that time, the ELMO prototypes have fairly low resolutions for both the see-through and virtual display paths, both of which used a 1.5-inch QVGA (320x240) transmissive LCD module [8,9]. Using a transmissive LCD as an SLM is problematic because, when coupled with a polarizing beamsplitter (PBS), it allows only minimal light (<20%) from the real scene to pass through to the user, making the device ineffective in dim environments. Cakmakci et al. attempted to improve the compactness of the overall system by utilizing polarization-based optics and a reflective SLM [10]. They used a reflective liquid crystal on silicon (LCoS) in conjunction with an organic light emitting device (OLED) display to give an extended contrast ratio of 1:200. An x-cube prism was proposed for the coupling of the two optical paths to achieve a more compact form factor. However, the design failed to erect the see-through view correctly [10]. Recently, Gao et al. proposed to use freeform optics, a two-layer folded optical architecture, and a reflective SLM to create a compact, high-resolution, low-distortion OCOST-HMD [11,12]. With a reflective LCoS device as the SLM, the system allowed for high luminance throughput and high optical resolution for both the virtual and see-through paths. The optical design and preliminary experiments demonstrated great potential for a very compelling form factor and high optical performance, but the design depended on expensive freeform lenses and, regrettably, was not prototyped. Although freeform lenses can make it possible to create the compact, wide field-of-view (FOV) eyepiece designs needed for occlusion capability, these lenses are often expensive and challenging to design and fabricate [13][14][15][16][17].
In this paper, based on the two-layer folding optics architecture by Gao et al. [11,12], we present the design and prototype of a high-resolution, affordable OCOST-HMD system using off-the-shelf optical components. Our prototype, capable of rendering per-pixel mutual occlusion, utilizes an OLED microdisplay for the virtual display path coupled with a reflective LCoS as the SLM for the see-through path to achieve an occlusion-capable OST-HMD offering a 30 degree diagonal FOV and 1920x1080 pixel resolution, with an optical performance of greater than 20% modulation contrast over the full FOV. We experimentally demonstrate a monocular prototype achieving >100:1 dynamic range in well-lit environments. We further experimentally compared the optical performance of an OST-HMD with and without occlusion capability.

System optical design
Figure 2 illustrates a schematic diagram of our proposed OCOST-HMD optical architecture. The design uses two folding mirrors, a roof prism and a PBS to fold the optical paths into a two-layer design, where the occlusion and the virtual display modules share the same eyepiece, giving a compact form factor and enabling per-pixel occlusion capability. The light path for the virtual display is highlighted with blue arrows, while the light path for the real-world view is shown with red arrows. An objective lens collects the light from the physical environment and forms an intermediate image at its focal plane, where an amplitude-based SLM is placed to render an occlusion mask for controlling the opaqueness of the real view. The modulated light is then folded by a PBS toward an eyepiece for viewing. The PBS acts as a combiner to merge the light paths of the modulated real view and the virtual view so that the same eyepiece module is shared for viewing the virtual display and the modulated real-world view. The focal planes of the eyepiece and objective are optically conjugate with each other, which makes it possible to control the opaqueness of each individual pixel of the virtual and real scenes for pixel-by-pixel occlusion manipulation. A right-angle roof prism is utilized not only to fold the optical path of the real view for compactness but also to ensure an erect see-through view, which is another critical requirement for an OCOST-HMD system. The system may further integrate a depth sensor that obtains the depth map of a real-world scene in order to generate a scene-dependent occlusion mask in real time.
After comparing several candidate microdisplay technologies, we chose a 0.7" Sony color OLED microdisplay for the virtual display path. The Sony OLED, having an effective area of 15.5mm by 8.72mm and a pixel size of 12μm, offers a native resolution of 1280x720 pixels and an aspect ratio of 16:9. Ideally we would need an SLM of the same dimension, aspect ratio and pixel resolution to achieve pixel-by-pixel occlusion capability within the entire FOV of the virtual display. Because no SLM with the same specifications was available, we selected a 0.7" LCoS as the SLM for the see-through path. The LCoS, recycled from a Canon projector, offers a native resolution of 1400x1050 pixels, a pixel pitch of 10.7μm, and an aspect ratio of 4:3. A reflective SLM provides a substantial advantage in light efficiency and contrast over a transmissive SLM. Typically, the light efficiency of the see-through path can be as high as 45% with a reflective LCoS but about 10% or less with a transmissive SLM, while the blocking efficiency is about 0.009% for a reflective SLM and 0.02% for a transmissive SLM [11]. Consequently, a reflective SLM can block light roughly twice as effectively. In addition, diffraction artifacts resulting from the propagation of light through an aperture are negligible for an SLM with a high fill factor, whereas they are clearly noticeable for a transmissive LCD, which typically has a low fill factor.
Based on the choices of microdisplay and SLM, we aimed to achieve an OCOST-HMD prototype with a diagonal FOV of 30°, or 26.5° horizontally and 15° vertically, and an angular resolution of 1.24 arcminutes per pixel, corresponding to a Nyquist frequency of 24.2 cycles/degree in the visual space. We also set the goal of achieving an exit pupil diameter (EPD) of 9-12mm, allowing eye rotation of about ±25° within the eye socket without causing vignetting of the optical system, and an eye clearance distance of at least 18mm. In order to develop a high-performance prototype at substantially lower cost than that of the freeform optics in [11,12], we chose to carry out the entire optical design using available stock lenses, which makes the task considerably more challenging due to the very limited choices of lens shapes and glass types. These constraints need to be carefully considered during the optimization process when creating lens forms for the eyepiece and objective designs. Furthermore, an optimized design obtained via optical design software needs to be carefully matched and replaced by catalog lenses, which typically requires an iterative process of optimization and replacement. The design was further complicated by the choice of a reflective SLM, which requires image-space telecentricity for both the eyepiece and objective designs to achieve high contrast, light efficiency and image uniformity. The final challenge of the design is the requirement for a large back focal distance (BFD) to make enough space for combining the two optical paths via a PBS.
Figure 3 shows the lens layout of the final OCOST-HMD design. The light path for the virtual display (eyepiece) is denoted by the blue rays, while the light path for the see-through view is shown in red rays. It should be noted that the red rays for the see-through view overlap with the blue rays of the eyepiece after the PBS, and thus only the blue rays are traced to the eye pupil in Fig. 3. The final design consists of 11 glass lenses (2 flint and 9 crown glass), 2 folding mirrors, 1 PBS, and 1 roof prism, all of which are stock components except for the meniscus, which is made of flint glass with an aperture diameter greater than 40mm. Chromatic aberrations were optimized for 465, 550, and 615nm with weights of 1, 2, and 1, respectively, according to the dominant wavelengths of the microdisplay. The objective was optimized to have the chief ray deviate less than ±0.5° from a perfectly telecentric system, while ±1° deviation was allowed for the eyepiece. After properly cropping the eyepiece lenses, we were able to achieve an eye clearance of 18mm and a 10mm EPD.
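The relationship between the design FOV, the pixel count and the quoted angular resolution can be checked with a short calculation. The sketch below is illustrative only: it assumes the angular resolution is obtained by dividing the horizontal design FOV evenly across the horizontal pixel count (a small-angle approximation that ignores distortion), and the function names are hypothetical.

```python
# Values quoted in the text for the OLED microdisplay and the design target.
H_PIXELS = 1280      # native horizontal resolution of the OLED
H_FOV_DEG = 26.5     # horizontal design field of view in degrees


def angular_resolution_arcmin(fov_deg, pixels):
    """Angular subtense of one pixel in arcminutes (uniform-sampling assumption)."""
    return fov_deg * 60.0 / pixels


def nyquist_cycles_per_degree(arcmin_per_pixel):
    """Nyquist frequency: one cycle needs two pixels."""
    return 60.0 / (2.0 * arcmin_per_pixel)


res_arcmin = angular_resolution_arcmin(H_FOV_DEG, H_PIXELS)
nyquist_cpd = nyquist_cycles_per_degree(res_arcmin)

print(f"angular resolution: {res_arcmin:.2f} arcmin/pixel")
print(f"Nyquist frequency:  {nyquist_cpd:.1f} cycles/degree")
```

Under these assumptions the result lands close to the 1.24 arcminutes and 24.2 cycles/degree stated above; the small residual difference comes from rounding.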
The optical performance of the virtual display and see-through paths was assessed over the full field of view in the visual space, where spatial frequencies are characterized by angular size in cycles per degree. Figure 4 shows the polychromatic modulation transfer function (MTF) curves, evaluated with a 3-mm eye pupil, for several weighted fields of both the virtual display and the see-through paths. The virtual display path preserves roughly 40% modulation at the designed Nyquist frequency of 24.2 cycles/degree, corresponding to the 12μm pixel size of the OLED display. It even maintains about 20% modulation at 36 cycles/degree, leaving room to upgrade to an OLED with an 8μm pixel size and 1920x1080 pixels. The performance of the see-through path drops slightly, to an average modulation of 35% at 25 cycles/degree, and maintains about 30% modulation at 30 cycles/degree for >90% of the entire see-through field, except that the MTF of the very far edge field drops to about 15%. Such optical performance is comparable to or even better than that of many custom HMD optics of similar resolution.
Along with the MTF, the wavefront error plot and spot diagram for the see-through and virtual display paths were used to characterize the performance of the optical design. For the virtual display path, the dominant aberrations are coma and lateral chromatic aberration. Lateral chromatic aberration can be digitally corrected, much like distortion, by pre-warping the image for the red and blue color channels individually based on their lateral displacements from the reference green color channel. Coma, however, is exceptionally hard to correct. This is due to the non-pupil forming, telecentric design of the eyepiece and the inability to move the stop position to balance off-axis aberrations. Overall, the wavefront aberration in the eyepiece is sufficiently low, being under 1 wave. The average root mean square (RMS) spot diameter across the field is 15μm. Although this is larger than the 12μm pixel size, the difference is largely due to lateral chromatic aberration, which, as stated earlier, can be corrected. The dominant aberration in the objective lens design is axial chromatic aberration, which is typically corrected by using different glass types to balance the optical dispersion. Unfortunately, due to the limited flint glass selection among off-the-shelf lenses, this aberration is unavoidable. Nevertheless, the maximum wavefront aberration in the real image is still below 2 waves at the far field, and the average RMS spot diameter across the field is about 19μm. Compared to the 10.7μm pixel pitch of the LCoS used in the system, a 19μm RMS spot diameter in the objective design indicates that the actual occlusion mask resolution is limited by the objective lens resolution and is lower than the pixel resolution of the SLM.

System prototype and experimental demonstration
Figure 5(a) shows the sectional view of the mechanical housing with the light path of the real scene superimposed. For the mechanical design, lens cell stacks were inserted into a larger housing, where they were held by set screws to allow more compensation in the optical design and reach the maximum MTF. Figure 5(b) shows the monocular prototype of the OCOST-HMD system built upon the optical design in Fig. 3. The prototyped system measured 82mm in height, 70mm in width, and 50mm in depth. The vertical and horizontal FOVs were determined for both the virtual and real paths by viewing a ruler through the optical system. The see-through FOV was 27.69° horizontally and 18.64° vertically, with an occlusion-capable see-through FOV of 22.62° horizontally and 17.04° vertically, while the virtual display had an FOV of 26.75° horizontally and 15.19° vertically, giving a measured diagonal full FOV of 30.58°. Due to the slightly mismatched aspect ratio between the OLED and LCoS, we anticipated that the LCoS would not be able to occlude the real scene over the same FOV as the virtual display in the horizontal direction.
For the purpose of qualitative demonstration of the occlusion capability of the OCOST-HMD prototype, we created a real-world scene composed of a mixture of laboratory objects with a well-illuminated white background wall (~300-500 cd/m²), while the virtual 3D scene was a simple image of a teapot. Figures 6(a) through 6(f) show a set of images captured with a digital camera placed at the exit pupil of the eyepiece. The camera lens has a focal length of 16mm with its aperture set at about 3mm to match the F/# setting equivalent to that of human eyes under typical lighting conditions. Figure 6(a) is the view of the natural background scene only, captured through the occlusion module when the SLM is turned on for light pass-through without a modulation mask applied and with the OLED microdisplay turned off. Several different spatial frequencies and object depths were portrayed in the background scene to display image quality and depth cues. Figure 6(b) is the view of the virtual scene captured through the eyepiece module when the real-world view was completely blocked by the SLM.
Fig. 6. Experimental demonstration of mutual occlusion capability in our OCOST-HMD prototype with photographs captured with a digital camera placed at the exit pupil of the system: (a) view of a natural background scene through the occlusion module for light pass-through with the SLM turned on; (b) view of the virtual scene through the eyepiece with the see-through path being blocked by the SLM; (c) augmented view of the natural and virtual scenes without occlusion capability enabled; (d) view of the natural scene with an occlusion mask rendered on the SLM; (e) augmented view with occlusion capability enabled where the virtual teapot is inserted in front of the background scene; (f) augmented view with occlusion capability enabled where the virtual teapot is inserted between two real objects for mutual occlusion demonstration.
Figure 6(c) shows the augmented view of the real-world and virtual scenes without the occlusion capability enabled (i.e., no modulation mask was applied to the SLM), obtained by simply turning on the OLED microdisplay. Due to the bright environment, the teapot looks washed out without a mask occluding the see-through path. Not only does the teapot appear unrealistic and ghost-like, but it is also spatially unclear where the teapot sits in the image. Clearly, the virtual and real objects are mixed at very low contrast, which is the expected effect obtained through a typical OST-HMD without occlusion capability. Figure 6(d) shows the view of the real-world scene when the occlusion mask was displayed on the SLM and no virtual content was shown on the OLED display. The mask effectively blocks the corresponding portion of the see-through view. Figure 6(e) is a view captured with the mask on the SLM and the virtual scene displayed on the OLED display. The result clearly demonstrates improved contrast and quality for the virtual view. We can observe that a realistic virtual image with obvious depth cues is now present. When virtual objects occlude the real scene, viewers can seamlessly transition from AR to VR environments. To demonstrate the full capability and correct depth perception the occlusion display can render, Fig. 6(f) shows the view captured through the see-through path, where the virtual teapot is inserted between two real objects, demonstrating the mutual occlusion capability of the system. In this case, knowing the relative location of the can which is meant to occlude part of the teapot, we removed from the teapot rendering the pixels that correspond to the projection of the occluding can on the virtual display. The significance of the result is that correct occlusion relationships can be created and used to give an unparalleled sense of depth to a virtual image in an OST-HMD. With a high dynamic range for the virtual scene in bright environments, our OCOST-HMD system using stock lenses achieved a high optical performance, significantly better than that of non-occlusion-capable HMD designs.
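The mask logic behind Figs. 6(e) and 6(f) can be sketched as a per-pixel depth test. This is a minimal illustration, not the rendering pipeline used for the prototype. It assumes the depth of the real scene is known at every pixel (for example from the optional depth sensor mentioned in the system design) and that the SLM and OLED pixel grids are registered one-to-one, which is not exactly true of the prototype hardware. The function and variable names are hypothetical.

```python
def visible_virtual_pixels(virtual_alpha, virtual_depth, real_depth):
    """Return a row-major map with 1 where the virtual object should be seen.

    A virtual pixel is visible when it is part of the virtual object
    (alpha > 0) and lies closer to the viewer than the real scene there.
    The same map serves two purposes in this sketch:
      - SLM mask: 1 = block real-scene light, 0 = let it pass;
      - OLED content: 1 = draw the virtual pixel, 0 = leave it dark.
    """
    visible = []
    for a_row, vd_row, rd_row in zip(virtual_alpha, virtual_depth, real_depth):
        row = []
        for alpha, v_depth, r_depth in zip(a_row, vd_row, rd_row):
            row.append(1 if (alpha > 0 and v_depth < r_depth) else 0)
        visible.append(row)
    return visible


# One image row with made-up depths in metres: a teapot at 1.0 m, a far wall
# at 3.0 m, and a can at 0.5 m in front of part of the teapot.
alpha = [[0, 1, 1, 1]]
teapot_depth = [[1.0, 1.0, 1.0, 1.0]]
scene_depth = [[3.0, 3.0, 0.5, 3.0]]

mask = visible_virtual_pixels(alpha, teapot_depth, scene_depth)
print(mask)  # [[0, 1, 0, 1]]
```

In the third pixel the can is nearer than the teapot, so the SLM passes the real light and the teapot pixel is not drawn, which corresponds to removing the projection of the can from the teapot rendering.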

Optical performance test
To further quantify the optical performance of the prototype system, we started by characterizing the MTF performance of the virtual and real light paths through the prototype. A high-performance camera, consisting of a nearly diffraction-limited 16mm camera lens by Edmund Optics and a 1/3" Point Grey image sensor with a 3.75μm pixel pitch, was placed at the exit pupil of the system. It offers an angular resolution of about 0.8 arcminutes per pixel, significantly higher than the anticipated performance of the prototype. Therefore, it is assumed that no loss of MTF performance was caused by the camera. The camera then captures images of a slanted edge target, which is either displayed by the microdisplay or a printed target placed in the see-through view. To provide a separable quantification of the performance of the virtual and see-through paths, the virtual image of a slanted edge was taken while the see-through scene was completely blocked by the SLM. Similarly, the see-through image of the target was taken with the microdisplay turned off. The captured slanted-edge images were analyzed using Imatest software to obtain the MTF of the corresponding light paths.
Fig. 7. Measured MTF performance of the OCOST-HMD prototype for the on-axis field of the virtual display, see-through view as well as the camera used for measurement.
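The camera sampling figure quoted above follows from the sensor pixel pitch and the lens focal length. The sketch below reproduces it and shows one simple way to express a frequency measured on the sensor in visual-space units. It assumes the slanted-edge analysis reports frequency in cycles per camera pixel and that the camera is focused so that one sensor pixel subtends a fixed angle; the helper names are hypothetical and the conversion is a simplification of the scaling procedure described in the paper.

```python
import math

PIXEL_PITCH_MM = 3.75e-3   # 3.75 um sensor pixel pitch, from the text
FOCAL_LENGTH_MM = 16.0     # camera lens focal length, from the text


def degrees_per_camera_pixel(pitch_mm, focal_mm):
    """Angle subtended by one sensor pixel as seen through the camera lens."""
    return math.degrees(math.atan(pitch_mm / focal_mm))


def cycles_per_degree(cycles_per_pixel, pitch_mm, focal_mm):
    """Convert a sensor-referred frequency to cycles/degree in visual space."""
    return cycles_per_pixel / degrees_per_camera_pixel(pitch_mm, focal_mm)


arcmin_per_pixel = 60.0 * degrees_per_camera_pixel(PIXEL_PITCH_MM, FOCAL_LENGTH_MM)
camera_nyquist_cpd = cycles_per_degree(0.5, PIXEL_PITCH_MM, FOCAL_LENGTH_MM)

print(f"camera sampling: {arcmin_per_pixel:.2f} arcmin/pixel")
print(f"camera Nyquist:  {camera_nyquist_cpd:.1f} cycles/degree")
```

With these inputs the camera samples at roughly 0.8 arcminutes per pixel, so its own Nyquist limit sits well above the 24.2 cycles/degree of the virtual display.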
Figure 7 shows the measured on-axis MTF performance of both the virtual and real paths, along with the MTF of the camera itself without the system for comparison. The measured curves match closely with the nominal performance shown in Fig. 4. Because the pixel pitch of the camera sensor differs from those of the microdisplay and SLM, the horizontal axis of the MTF measurement by Imatest was scaled by the pixel magnification difference between the camera and display and then converted to spatial frequency in the visual space, in cycles/degree, by computing the angular size of a spatial feature. This makes the measurement directly comparable with the plots in Fig. 4. The prototype achieved a contrast greater than 40% at the Nyquist frequency of 24.2 cycles/degree for the virtual display and similar performance for the see-through path. We then directly measured the spatial and angular resolutions of the see-through path using a printed USAF 1951 resolution target. The target was set 60cm away from the exit pupil, and the same camera was used to capture a see-through image of the target to directly determine the smallest resolvable group. A contrast ratio above 0.1 was considered resolvable. The resolvable spatial frequency was determined to be at Group 2 Element 5 for both horizontal and vertical lines, corresponding to 6.35 cycles/mm. At a distance of 60cm, this element gives an angular resolution of 66.49 cycles/degree, indicating that the resolvability of the see-through path through the occlusion module is nearly intact to a human viewer.
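The conversion from the resolved target element to an angular frequency can be reproduced as follows. The sketch uses the standard USAF 1951 relation, in which the line-pair density is 2^(group + (element - 1)/6) cycles/mm, and then multiplies by the length on the target that subtends one degree at the viewing distance. The function names are illustrative.

```python
import math


def usaf_cycles_per_mm(group, element):
    """Line-pair density of a USAF 1951 target element in cycles/mm."""
    return 2.0 ** (group + (element - 1) / 6.0)


def target_cycles_per_degree(cycles_per_mm, distance_mm):
    """Angular frequency seen from distance_mm: cycles within one degree."""
    mm_per_degree = distance_mm * math.tan(math.radians(1.0))
    return cycles_per_mm * mm_per_degree


density = usaf_cycles_per_mm(group=2, element=5)
angular = target_cycles_per_degree(density, distance_mm=600.0)

print(f"Group 2 Element 5: {density:.2f} cycles/mm")
print(f"at 60 cm:          {angular:.1f} cycles/degree")
```

The result agrees with the 6.35 cycles/mm and roughly 66.5 cycles/degree reported above, to within rounding.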
We further measured the image contrast between the virtual display and the real-world scene as a function of the real-world scene brightness for different spatial frequencies. A grayscale solid image, ranging from black to white in 10 linear steps, was displayed on an LCD monitor to create a controlled background scene with luminance varying from 0 to 350 cd/m². The monitor was placed roughly 10cm in front of the OCOST-HMD system to simulate a range of real-scene brightness levels. A sinusoidal grating pattern with a spatial frequency ranging from 0.7 to 24.2 cycles/degree was displayed on the OLED microdisplay (virtual path) to evaluate the effect of scene brightness on the image contrast of the virtual scene at different spatial frequencies. The fall-off in contrast of the virtual scene was then plotted and compared with occlusion enabled (SLM blocking see-through light) and without occlusion (SLM passing see-through light). Figures 8(a) and 8(b) show the captured images of a 12 cycles/degree grating displayed as the virtual image and superimposed on a background image of full brightness, with and without occlusion, respectively. Without occlusion, the virtual target was almost completely washed out by a background as bright as 350 cd/m². Figures 9(a) and 9(b) plot the contrast of the virtual object with the see-through path un-occluded and occluded, respectively. We can observe that the contrast of the virtual object without occlusion quickly deteriorates to zero for a well-lit environment with luminance above 200 cd/m², while the contrast of the virtual target with occlusion of the real scene remains nearly constant as the brightness increases. We further measured the obtainable contrast ratio of the occlusion system to be greater than 100:1. The contrast ratio of the occlusion-capable display was obtained by measuring a collimated depolarized light source through the system with full occlusion enabled and disabled.
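The qualitative trend in this measurement can be illustrated with a simple additive-luminance model: background light that reaches the eye is added equally to the bright and dark bars of the virtual grating, which lowers its Michelson contrast. This is a toy model and does not reproduce the measured curves in Fig. 9. The peak display luminance and see-through transmission below are hypothetical placeholder values, not measurements from the prototype; only the 100:1 occlusion contrast ratio and the 350 cd/m² background are taken from the text.

```python
def virtual_contrast(l_peak, l_background, transmission, occlusion_ratio=1.0):
    """Michelson contrast of a virtual grating over a uniform background.

    l_peak          -- luminance of the bright grating bars (dark bars assumed 0)
    l_background    -- luminance of the real scene
    transmission    -- fraction of real-scene light reaching the eye, SLM open
    occlusion_ratio -- attenuation applied when the SLM blocks (1.0 = no occlusion)
    """
    leak = l_background * transmission / occlusion_ratio
    l_max = l_peak + leak
    l_min = leak
    return (l_max - l_min) / (l_max + l_min)


L_PEAK = 100.0        # hypothetical display luminance in cd/m^2
TRANSMISSION = 0.45   # hypothetical see-through transmission
BACKGROUND = 350.0    # brightest background used in the experiment, cd/m^2

c_without = virtual_contrast(L_PEAK, BACKGROUND, TRANSMISSION)
c_with = virtual_contrast(L_PEAK, BACKGROUND, TRANSMISSION, occlusion_ratio=100.0)

print(f"contrast without occlusion: {c_without:.2f}")
print(f"contrast with occlusion:    {c_with:.2f}")
```

Even with these placeholder numbers the model shows the same behavior as the experiment: contrast collapses as the background brightens when the SLM passes light, and stays close to its dark-room value when the SLM blocks it.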

Conclusion
This paper presents a novel design and implementation of an occlusion-capable optical see-through head-mounted display system using off-the-shelf optical components. A comprehensive description of the design and the monocular prototype was included, and the performance of the prototype was analyzed and evaluated. The system offered a 30° diagonal FOV and an angular resolution of 1.24 arcminutes, with an optical performance of > 0.4 contrast over the full FOV at the Nyquist frequency of the display. By using the combination of a reflective type SLM and an OLED display, we demonstrated a contrast ratio greater than 100:1 for the occlusion module. We also demonstrated that our prototype could be used in bright environments without loss of contrast in the virtual image. This study demonstrates that an OCOST-HMD system can achieve a high optical performance and a compact form factor for bright environments while using off-the-shelf components.

Disclaimer
Dr. Hong Hua has a disclosed financial interest in Magic Leap Inc. The terms of this arrangement have been properly disclosed to The University of Arizona and reviewed by the Institutional Review Committee in accordance with its conflict of interest policies.

Fig. 1. Superimposing a virtual airplane in a well-lit real world environment: AR view captured through a typical OST-HMD without occlusion capability.

Fig. 2. Schematic diagram of the proposed OCOST-HMD design based on two-layer folded architecture.

Fig. 8. Sample images of a grating target of 12 cycles/degree displayed by the virtual display superimposed onto a bright background of 350 cd/m² (a) with occlusion enabled to block the see-through light and (b) without occlusion.

Fig. 9. Image contrast degradation of the virtual target of different spatial frequencies as a function of background scene brightness for (a) occlusion-disabled; and (b) occlusion-enabled displays.