Performance evaluation of dual-layer architectures for high dynamic range head-mounted displays

A high dynamic range head-mounted display (HDR-HMD) using a dual-layer per-pixel modulation method was recently demonstrated, in which the display and modulation layers spatially overlap with equal pixel resolution and are well aligned to each other so that per-pixel dynamic range modulation becomes feasible. Besides the per-pixel modulation method, two other modulation methods can also be implemented with a dual-layer construction: the extended layer separation method, where the display and modulation layers are largely separated in space, and the coarse backlight method, where the display and modulation layers have largely different pixel resolutions such that the modulation layer may be treated as a locally controllable backlight for the display layer. In this paper, we develop a generalized model to simulate the image formation process of dual-layer HDR displays and to evaluate the image performance of these different configurations and modulation methods. Maximum displayable spatial frequencies under different configurations are characterized. Experimental results using resolution targets support the model.


Introduction
One of the key challenges of state-of-the-art head-mounted displays (HMDs) is their inability to render images with a dynamic range comparable to real-world scenes. To offer high dynamic range (HDR) capability, a display device should meet three requirements [1,2]. First, an HDR display should have a high-luminance light engine to offer a comfortable peak luminance level under different lighting environments. Second, it should produce a black luminance level as low as possible to guarantee a high 'on/off contrast,' which is defined as the ratio between the brightest and the darkest luminance levels that the display can provide. Finally, the display should provide an adequate number of command levels (CL) between the brightest and darkest luminance values to offer sufficient luminance variation, known as the bit depth of a display. Among these three requirements, the last one is the most important and the hardest to achieve, because the first two can be met by using a high-power light engine and a self-emissive display, which can achieve a true black and an ideal contrast ratio of infinity. Typical display devices, including HMDs, are only capable of rendering images with 8-bit depth, or equivalently 256 command levels, which is far below the dynamic range of the real world that can span as much as 14 orders of magnitude. Displays with such a low dynamic range (LDR) are incapable of rendering images with large contrast variations, which leads to the loss of details with small grayscale variations and the loss of the sense of immersion. For optical see-through augmented reality (AR) applications, coupling an LDR virtual image with an HDR scene makes the merged image lose its fidelity and fails to produce a realistic mixed scene. Therefore, developing HDR display hardware with high image performance becomes critical.
The most straightforward way to design an HDR-HMD is to increase the amplitude resolution of its drive circuitry [1]. However, high bit-depth driver circuitry not only increases the cost but is also challenging to manufacture. Another method is to stack two or more spatial light modulators (SLMs) to modulate the pixel values simultaneously. Typically, a dual-layer HDR display consists of a modulation layer, a display layer, and a reconstruction image plane [3,4], as shown in Figure 1a. The modulation layer offers a spatially varying light source to enhance the dynamic range of the display layer. The modulation layer may consist of a uniform backlight and a spatial light modulator (SLM), as shown in Figure 1a-1.
Alternatively, it can be a self-emissive addressable source, such as a light-emitting diode (LED) array, an organic light-emitting device (OLED) panel, or a micro-LED panel, as shown in Figure 1a-2. While the modulation layer provides the spatially varying illumination, the display layer in the front renders the details of an image. By multiplying the modulations of both layers, an HDR image with a large contrast ratio and high resolution is achieved at the reconstruction image plane. Typically, the reconstruction image plane coincides with the display layer. If the two layers have discrete command levels $CL_1$ and $CL_2$, respectively, the resulting dynamic range of the HDR image is maximized as

$$CL_{HDR} = CL_1 \times CL_2. \qquad (1)$$

Based on the resolutions of the two layers and the layer separation, the modulation method of a multi-layer HDR display can be classified into three types: per-pixel modulation, extended layer separation, and coarse backlight modulation, as shown in Figure 1b-d, respectively. In a per-pixel modulation scheme, the modulation layer and display layer have the same pixel size and resolution and are separated by a negligible distance [5]. The image is modulated pixel by pixel, and there is no image misalignment caused by motion parallax at different viewing positions. It is therefore the ideal configuration for dynamic range enhancement. Achieving per-pixel alignment in an HMD system, however, is especially challenging because an inevitable small tip-tilt or displacement between the layers is magnified tens of times by the eyepiece; the misalignment effects thus become highly visible and result in a large layer separation in visual space, as shown in Figure 1e. Moreover, stacking two layers with negligible separation is hardly achievable due to the physical thickness of SLMs. Xu et al. designed an HDR display engine with an image relay to minimize the layer separation [3,4]. Although projection optics or a relay system can image the content on the modulation layer onto the display layer to optically reduce the layer separation, introducing additional optics increases the volume of the display engine and makes it non-portable.
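As a quick numerical illustration of Equation (1) (a sketch of our own in Python, not part of the original paper), the snippet below enumerates the distinct luminance levels obtainable by multiplying two 8-bit layers; $CL_1 \times CL_2$ is an upper bound, since many products coincide.

```python
import numpy as np

# Dual-layer product modulation: each reconstructed pixel is the product of
# a display-layer level and a modulation-layer level, so the level count
# approaches the CL1 * CL2 upper bound of Equation (1).
CL1, CL2 = 256, 256
display = np.linspace(0.0, 1.0, CL1)      # normalized display-layer levels
modulation = np.linspace(0.0, 1.0, CL2)   # normalized modulation-layer levels

# All achievable products of one display level and one modulation level.
levels = np.unique(np.round(np.outer(display, modulation), 12))
print(f"distinct levels: {len(levels)} (upper bound {CL1 * CL2})")
```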
In the extended layer separation method shown in Figure 1c, the modulation and display layers are intentionally separated by a considerably large distance. Due to the large separation, a pixel on the modulation layer simultaneously modulates a group of pixels on the display layer. To adapt to the extended layer separation, the image contents on both layers are computationally optimized such that the display layer shows details of high spatial frequencies while the modulation layer renders contents of low spatial frequencies, mainly for enhancing the dynamic range [6]. This configuration also has the capability of rendering light field effects [6-8]. Since only the display layer shows the image details, the image degradation is less sensitive to pupil positions, as shown in Figure 1f. However, a large layer separation reduces the modulation capability of the modulation layer. Once the modulation layer is located outside the viewer's depth of field, each pixel on the modulation layer becomes blurry, leading to degradation in dynamic range enhancement and reconstruction accuracy.
In the coarse backlight modulation method shown in Figure 1d, the two layers have a small physical separation but significantly different spatial resolutions. Typically, an addressable source with a low resolution, such as an LED array or a low-resolution SLM panel with a uniform backlight, is utilized as the modulation layer [9]. It is able to enhance the contrast ratio by providing local luminance modulation in a compact form. With a significantly lower resolution on the modulation layer, a pixel on the modulation layer simultaneously modulates a group of pixels on the display layer, yielding modulation overlap and inaccuracy. Compared to the extended layer separation method, however, this approach may obtain better pupil stability owing to the small layer separation, as illustrated in Figure 1g.
To further investigate the performance and characteristics of the aforementioned three configurations, in this paper we develop a framework to simulate the image formation process of an HDR display from a display source to the retinal image (Section 2). The simulator is capable of modeling different HDR methods. Three image performance metrics, including image sharpness (or spatial resolution), HDR modulation accuracy and efficiency, and eye pupil position sensitivity (i.e. image stability within the eyebox), are utilized to analyze the simulated retinal image and evaluate the performance of different modulation methods under different layer separations and modulation layer resolutions (Section 3). The tradeoffs of layer separation and modulation resolution versus reconstruction accuracy are characterized. Finally, utilizing an HDR-HMD prototype, we experimentally demonstrate and compare the performance of the three different HDR display methods (Section 4).

Retinal image simulation of HDR head-mounted displays
Figure 2 shows the schematic layout of a simplified HDR-HMD image simulation model, which consists of an HDR display engine, a reconstructed image plane, an exit pupil plane, and an eye model. The HDR display is composed of a display layer and a modulation layer as seen in the visual space of the eye. Different from direct-view displays, the modulation and display layers in an HMD are the magnified virtual images of their physical counterparts through the HMD imaging optics, such as an eyepiece, and are observed by a viewer's eye located within a confined area, namely the exit pupil or the eyebox. Without loss of generality, the properties of the magnified virtual images, rather than the physical devices, are modeled as they appear in a viewer's visual space. For instance, SLM1 and SLM2 in Figure 2 represent the virtual images of the display and modulation layers, respectively, and their dioptric separation is denoted as z. The layer resolution is characterized by the angular size, in arcminutes, subtended by a pixel through the imaging optics of the HMD, denoted as S1 and S2 for SLM1 and SLM2, respectively. A smaller angular size per pixel indicates a higher resolution. The resolution ratio, S1/S2, is utilized to characterize the angular resolution difference between the two layers. The configuration of z ≈ 0 and S1/S2 ≈ 1 leads to the per-pixel modulation method, a considerably large z leads to the extended layer separation method, and the configuration of z ≈ 0 and S1/S2 ≫ 1 yields coarse backlight modulation. We further assume that the reconstruction image plane coincides with the depth of the display layer, at which the eye is accommodated to perceive an in-focus image.
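For clarity, this taxonomy can be condensed into a small helper; the numeric cutoffs below are illustrative assumptions of ours, not thresholds defined in the paper.

```python
def classify(z_diopters: float, resolution_ratio: float) -> str:
    """Classify a dual-layer HDR configuration from the layer separation z
    (diopters, in visual space) and the resolution ratio S1/S2.
    The cutoffs (0.05 D, 1.5) are illustrative only."""
    if z_diopters <= 0.05 and resolution_ratio <= 1.5:
        return "per-pixel modulation"        # z ~ 0 and S1/S2 ~ 1
    if z_diopters > 0.05:
        return "extended layer separation"   # considerably large z
    return "coarse backlight modulation"     # z ~ 0 and S1/S2 >> 1

print(classify(0.0, 1.0))   # per-pixel modulation
print(classify(1.0, 1.0))   # extended layer separation
print(classify(0.0, 27.0))  # coarse backlight modulation
```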
The perceived retinal image of an HDR-HMD may vary with the eye pupil location within the eyebox. A typical HMD eyebox is 10 mm in diameter or larger [10], while the eye pupil is 2-5 mm in bright light [11]. The numerical aperture (NA) perceived by the eye pupil is therefore smaller than that of the eyebox. Let us consider that the eye is accommodated on SLM1 (point P in Figure 2). When the layer separation between SLM1 and SLM2 is adequately small such that the light cone incident upon the eye pupil from point P (the red ray bundle in Figure 2) covers only one pixel area P′ on the modulation layer, the integrated illuminance of the retinal image is modulated only by the pixel values of P and P′, which results in the per-pixel modulation method. When the layer separation is extended such that the light cone from P covers an extended area of multiple pixels on SLM2, the integrated retinal image illuminance is affected by all the pixels within the projected area on the modulation layer. The light cones from neighboring pixels on the display layer inevitably overlap on the modulation layer, causing modulation crosstalk as well as image contrast degradation. Furthermore, multiple pixels on SLM2 contribute to a retinal image point and cause a reconstruction error, which we refer to as multi-pixel modulation hereafter. When the angular resolution of the display layer is substantially higher than that of the modulation layer (i.e. S1/S2 ≫ 1), even with a small layer separation, the light cones from multiple pixels on the display layer may be modulated by the same pixel on the modulation layer due to its reduced resolution, resulting in multi-pixel modulation effects and crosstalk similar to those of extended layer separation.
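To get a first-order feel for when per-pixel modulation breaks down into multi-pixel modulation, the short sketch below estimates how many modulation-layer pixels the pupil ray cone of a single in-focus display pixel covers. It uses thin-lens geometry only (blur angle ≈ pupil diameter × dioptric separation) and ignores the diffraction modeled below, so it is an approximation of ours rather than the paper's model.

```python
import math

def cone_footprint_pixels(pupil_mm: float, separation_diopters: float,
                          pixel_arcmin: float) -> float:
    """Approximate number of modulation-layer pixels (across the diameter)
    covered by the eye-pupil ray cone of a single in-focus display pixel.
    First-order geometry: blur angle (rad) ~ pupil diameter (m) x defocus (D)."""
    blur_rad = (pupil_mm * 1e-3) * separation_diopters
    blur_arcmin = math.degrees(blur_rad) * 60.0
    return blur_arcmin / pixel_arcmin

# Example with the simulation settings used later (3 mm pupil, 0.5 arcmin pixels):
for z in (0.1, 0.5, 1.0, 2.0):
    print(f"z = {z:>3} D -> ~{cone_footprint_pixels(3.0, z, 0.5):.0f} pixels across")
```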
To display an HDR image $I_0$ with optimal image performance, the image contents on SLM1 and SLM2 are computationally optimized according to the layer separation and resolution ratio. In the per-pixel modulation case, since each pixel on SLM1 is uniquely modulated by a single corresponding pixel on SLM2, the total pixel value $I_0$ can be equally distributed between the SLM layers as $\sqrt{I_0}$, and the resulting HDR image is easily reconstructed as $\sqrt{I_0} \cdot \sqrt{I_0} = I_0$. However, in the case of multi-pixel modulation due to either extended layer separation or coarse resolution of the modulation layer, we need to account for the fact that multiple pixels on SLM2 may modulate a single pixel on the display layer, or that a single pixel on SLM2 may modulate a group of pixels on the display layer. In the case of multi-pixel modulation due to extended layer separation, the modulation image on SLM2 is rendered as $I_{SLM_2} = \sqrt{I_0}$.
The effective SLM2 modulation projected onto the SLM1 layer can be modeled as

$$I'_{SLM_2} = I_{SLM_2} * PSF_e(r_e, z), \qquad (2)$$

where * denotes convolution and $PSF_e$ represents the projected modulation distribution on SLM1 from a pixel P′ on SLM2. $PSF_e$ can be described as

$$PSF_e(r_e, z) = \left| 2 \int_0^1 t_{ap}(\rho)\, J_0\!\left(\frac{2\pi}{\lambda} NA\, r_e\, \rho\right) \exp\!\left(\frac{i\pi}{\lambda}\, z \sin^2\!\alpha\, \rho^2\right) \rho \, d\rho \right|^2, \qquad (3)$$

where z is the layer separation, $t_{ap}$ is the aperture transmittance function defined by the eyebox, $J_0$ is the zeroth-order Bessel function of the first kind, NA is the numerical aperture of the HMD imaging optics defined by the eyebox, α is half of the emitting cone angle corresponding to the NA, $r_e$ is the radial distance from the ray bundle center, λ is the wavelength, and ρ is a normalized integral variable. Equation (2) incorporates the directional ray intensity contributions from all the pixels on SLM2 that modulate the pixel value of point P. To compensate for the multi-pixel modulation effects of SLM2, the pixel value of SLM1 is computed as

$$I_{SLM_1} = \frac{I_0}{I'_{SLM_2}}. \qquad (4)$$

In the case of multi-pixel modulation due to coarse resolution of the modulation layer, a single pixel on SLM2 provides modulation to multiple pixels on SLM1 simultaneously. Let us assume the input image has the same resolution as SLM1 and the modulation down-sampling ratio of the two layers is $S_1 : S_2$. The pixel value of SLM2 then needs to be resampled as

$$I_{SLM_2} = \left[\sqrt{I_0} * PSF_c(r_c, \sigma)\right]\downarrow_{S_1:S_2}, \qquad (5)$$

where $PSF_c(r_c, \sigma)$ is a Gaussian-like down-sampling operator, the width $r_c$ is determined by the resolution ratio $S_1 : S_2$, the distribution factor σ is determined by the distribution type of the SLM2 pixels, and $\downarrow_{S_1:S_2}$ denotes down-sampling by the resolution ratio. The pixel value of SLM1 can then be computed by applying Equations (2)-(4) to compensate for the effects of layer separation. The rendering algorithm above compensates for the artifacts caused by layer separation or coarse-resolution modulation and reallocates the high and low spatial frequencies onto SLM1 and SLM2, respectively, since the convolution operator is equivalent to applying a low-pass filter to the SLM2 image content in the frequency domain. Ideally, the total pixel value of the reconstructed image through multi-pixel modulation remains $I_0$, and there is no reconstruction error after applying this rendering algorithm. However, when $I_{SLM_1}$ exceeds the displayable pixel value range of SLM1, the actually displayed pixel value is clipped and an image reconstruction error occurs, which is defined as

$$I_{err} = \left| I_r - I_0 \right|, \qquad (6)$$
where $I_r$ is the pixel value of the reconstructed retinal image. Figure 3 demonstrates an example of a simulated retinal image and reconstruction error distribution under different layer separations. The target HDR image shown in Figure 3a is displayed through the HDR display engine. Both SLMs have a resolution of 1280 × 960 pixels and an angular resolution of 0.5 arcminutes in visual space, or an equivalent pixel size of 0.073 mm at a 0.5-m viewing distance. The eye pupil has a diameter of 3 mm and is located at the center of a 10-mm eyebox. SLM1 is located 2 diopters away from the exit pupil plane, while we varied the SLM2 location to examine the effects of layer separation. Selecting a distance of 2 diopters for SLM1 is based on the considerations of providing an average viewing distance for arm-length augmented reality applications and offering an adequate range of layer separations for investigation. The maximum luminance on SLM2 is 220 cd/m², and both layers have 8-bit depth. Figure 3b shows the simulated retinal images under layer separations of 1.0D, 1.5D and 2.0D in visual space. The luminance reconstruction error map $I_{err}$ is plotted in Figure 3c. To evaluate whether the error can be perceived by human eyes, Figure 3d plots the binary just-noticeable-difference (JND) map based on Barten's contrast sensitivity function (CSF) and the DICOM standard [12], where white pixels denote differences that human eyes can distinguish. The areas with noticeable difference are 3.04%, 4.60% and 6.01% of the whole image for the three layer separations in Figure 3b, respectively, showing an increased error percentage as the layers separate. A similar trend can be observed in the coarse modulation method, where the reconstruction error becomes significant as the down-sampling ratio increases. It is also clearly seen that noticeable errors mainly occur in areas with high spatial frequency (e.g. the tree) or high luminance (e.g. the sky).
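The rendering and reconstruction chain of Equations (2)-(4) and (6) can be prototyped in a few dozen lines. The sketch below is our own minimal implementation under simplifying assumptions: monochromatic light at an assumed 550 nm, a clear aperture ($t_{ap} = 1$), a metric defocus distance z sampled on an assumed 6.35 μm grid in place of the dioptric visual-space geometry, and normalized pixel values.

```python
import numpy as np
from scipy.signal import fftconvolve
from scipy.special import j0

# Minimal sketch of the Eq. (2)-(4) pipeline and the clipping error of Eq. (6).
WAVELENGTH = 550e-9   # m, assumed green light
NA = 0.176            # display numerical aperture (Section 3)
PITCH = 6.35e-6       # m, assumed sampling pitch of the PSF grid

def defocus_psf(z, half=16, n_rho=400):
    """Equation (3) with t_ap = 1: Born-Wolf defocused PSF on a pixel grid;
    z is a metric defocus distance."""
    k = 2.0 * np.pi / WAVELENGTH
    alpha = np.arcsin(NA)
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r = np.hypot(x, y) * PITCH
    rho = np.linspace(0.0, 1.0, n_rho)
    integrand = (j0(k * NA * r[..., None] * rho)
                 * np.exp(0.5j * k * z * np.sin(alpha) ** 2 * rho ** 2) * rho)
    psf = np.abs(2.0 * np.trapz(integrand, rho, axis=-1)) ** 2
    return psf / psf.sum()                          # normalize to unit energy

def render_and_reconstruct(i0, z):
    """SLM2 shows sqrt(I0); its defocused projection (Eq. (2)) is divided out
    on SLM1 (Eq. (4)) and clipped to the displayable range [0, 1]."""
    slm2_eff = fftconvolve(np.sqrt(i0), defocus_psf(z), mode="same")  # Eq. (2)
    slm1 = np.clip(i0 / np.maximum(slm2_eff, 1e-6), 0.0, 1.0)        # Eq. (4)
    i_r = slm1 * slm2_eff                            # perceived product image
    return i_r, np.abs(i_r - i0)                     # Eq. (6)

# Demo: a bright disk on a dark background at two defocus distances.
yy, xx = np.mgrid[:128, :128]
target = np.where((xx - 64) ** 2 + (yy - 64) ** 2 < 30 ** 2, 0.9, 0.02)
for z in (50e-6, 500e-6):
    _, err = render_and_reconstruct(target, z)
    print(f"z = {z * 1e6:4.0f} um -> mean |I_r - I_0| = {err.mean():.4f}")
```

As expected from the analysis above, the mean error grows with the defocus distance because the clipped division on SLM1 can no longer fully compensate the blurred SLM2 projection.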
We further analyzed the reconstruction error as the layer separation gradually increases from 0D to 2D. Figure 4a plots the projected modulation area of $PSF_e(r_e, z)$ (also referred to as the multi-pixel modulation area $A_{PSF}$), the average reconstruction error $I_{err}$, and the percentage of area with noticeable luminance difference $P_{notice}$ as functions of layer separation. It is clearly seen that $PSF_e(r_e, z)$ grows as the layer separation increases. The increased size of $PSF_e(r_e, z)$ causes multi-pixel modulation, leads to the linear increase of $I_{err}$ and $P_{notice}$, and results in degraded image performance. Therefore, the multi-pixel modulation area $A_{PSF}$ acts as the key parameter affecting reconstruction errors, and it can be treated as a parameter that directly indicates the reconstructed image performance of the HDR engine. It is worth noting that $A_{PSF}$ is affected not only by the configuration of the HDR engine but also by the accommodated eye pupil diameter. Figure 4b plots $A_{PSF}$ as a function of eye pupil diameter and layer separation, which gives a sense of the HDR image reconstruction errors under different eye pupil diameters and layer separations. For example, an HDR engine seen through a 5-mm eye pupil with a 0.5D layer separation should have similar image performance to the same engine seen through an 8-mm pupil with a 0.43D layer separation, since the two configurations have the same $A_{PSF}$ value.

Characterizing MTF performance of dual-layer HDR displays
Based on the simulation model discussed in Section 2, in this section we analyze the reconstruction error in the frequency domain as a function of the layer separation z or the resolution ratio $S_1 : S_2$. From the simulation results, the maximum displayable frequencies under different HDR engine embodiments are also studied.
To study the reconstruction error distribution at a given image spatial frequency under different HDR engine embodiments, we simulated the reconstruction error of a sinusoid pattern with the layer separation varied from 0D to 2D in increments of 0.05D. The display numerical aperture is 0.176 and the angular frequency of the sinusoid pattern is 6 cycles/degree. Figure 5a and b show the reconstructed image profile $I_r$ and the reconstruction error profile $I_{err}$ over one period as the layer separation varies from 0D to 2D. The reconstructed image contrast degrades significantly as the layer separation increases. To demonstrate where the reconstruction error comes from, Figure 5c shows the SLM2 modulation error distribution caused by $PSF_e$ multi-pixel modulation over one period under different layer separations. The red solid line denotes the threshold of reconstruction error, above which the SLM2 modulation error can be fully compensated by SLM1. Reconstruction error below the threshold cannot be compensated by SLM1 due to its limited displayable range. It can be seen that the reconstruction error is mainly located in the bright areas, increases significantly as the layer separation increases from 0D to 0.5D, and tends to stabilize as the separation increases further. Figure 5d-g shows an example of the image degradation with 10 periods when the layer separation equals 1D.
Based on the above analysis, we can conclude that the reconstruction error distribution of $I_{err}$ is determined by three factors: (1) the original image value $I_0$, where the bright regions exceed the dynamic range capability of an HDR engine configuration; (2) the layer separation, which affects the multi-pixel modulation area $A_{PSF}$; and (3) the image value gradient or variation within the PSF-covered region, which can also be characterized as the image spatial frequency distribution in a local region, since all pixel values in the region contribute to the modulation of $I_r$. By varying the angular frequency of the target image, we can simulate the modulation degradation of the reconstructed image as the layer separation increases. Figure 6a plots the reconstruction error as a function of layer separation for four different angular frequencies of 12, 15, 20, and 30 cycles/degree, and Figure 6b plots the reconstruction error as a function of target spatial frequency for four different layer separations of 0.2, 0.5, 1, and 1.5 diopters. It can be clearly observed that the ability of an HDR engine to accurately reconstruct high spatial frequency details degrades rapidly as the layer separation increases. Therefore, we define the boundary frequency, the highest image angular frequency that can be reconstructed within a given threshold of reconstruction error, as a metric to quantify HDR image reconstruction ability under different angular frequencies and layer separations. Figure 6c plots the boundary frequency as a function of layer separation for threshold values of 10%, 20% and 30% reconstruction error, respectively. It is clearly seen that the boundary frequency drops rapidly as the layer separation increases from 0D to 0.43D and then drops slowly as the layer separation increases beyond 0.5D.
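The boundary-frequency metric can be reproduced with a compact sweep. In the self-contained sketch below, the defocused PSF of Equation (3) is replaced by a Gaussian blur whose width grows linearly with layer separation; the growth rate is an assumption of ours, so the numbers indicate the trend rather than the paper's exact curves.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# Sweep sinusoid frequencies through a 1-D version of the Eq. (2)-(4)
# pipeline and report the highest frequency whose reconstruction error
# stays under a threshold (the "boundary frequency").
PIXEL_ARCMIN = 0.5                 # angular pixel size (Section 2)
BLUR_PIX_PER_DIOPTER = 10.0        # assumed blur growth rate, pixels/D

def reconstruction_error(freq_cpd, z_diopters, n=4096):
    pixels_per_degree = 60.0 / PIXEL_ARCMIN
    x = np.arange(n) / pixels_per_degree                # degrees
    i0 = 0.5 + 0.45 * np.sin(2 * np.pi * freq_cpd * x)  # sinusoid target
    sigma = max(z_diopters * BLUR_PIX_PER_DIOPTER, 1e-3)
    slm2_eff = gaussian_filter1d(np.sqrt(i0), sigma)    # proxy for Eq. (2)
    slm1 = np.clip(i0 / slm2_eff, 0.0, 1.0)             # Eq. (4) + clipping
    i_r = slm1 * slm2_eff
    return np.abs(i_r - i0).max() / i0.max()            # normalized peak error

def boundary_frequency(z_diopters, threshold=0.2):
    freqs = np.arange(1.0, 60.0, 0.5)                   # cycles/degree
    ok = [f for f in freqs if reconstruction_error(f, z_diopters) < threshold]
    return max(ok) if ok else 0.0

for z in (0.2, 0.5, 1.0, 1.5):
    print(f"z = {z} D -> boundary frequency ~ {boundary_frequency(z):.1f} cpd")
```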
Similarly, the reconstructed luminance error was simulated for the coarse backlight HDR display under different down-sampling ratios. Figure 7a plots the reconstruction error under different backlight down-sampling ratios for four angular-frequency image patterns, and Figure 7b plots the reconstruction error as a function of spatial frequency for given down-sampling ratios. Figure 7c gives the boundary frequencies as a function of down-sampling ratio when the acceptable reconstruction luminance error varies from 10% to 30%. The slope of the boundary frequency degradation decreases after the down-sampling ratio increases beyond 10:1.
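These coarse-backlight trends can be reproduced with a compact pipeline following Equation (5): low-pass and down-sample the square root of the target, then compensate by division on the display layer. The sketch below is our own; the Gaussian width and the nearest-neighbor backlight expansion are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def render_coarse_backlight(i0, ratio):
    """i0: target image in [0, 1]; ratio: down-sampling ratio S1:S2 per side."""
    lowpass = gaussian_filter(np.sqrt(i0), sigma=ratio / 2.0)  # PSF_c proxy
    slm2 = lowpass[::ratio, ::ratio]                           # Eq. (5)
    # Effective backlight seen by the display layer (nearest-neighbor model).
    backlight = np.kron(slm2, np.ones((ratio, ratio)))[:i0.shape[0], :i0.shape[1]]
    slm1 = np.clip(i0 / np.maximum(backlight, 1e-6), 0.0, 1.0) # compensation
    i_r = slm1 * backlight
    return i_r, np.abs(i_r - i0)                               # Eq. (6)

yy, xx = np.mgrid[:120, :120]
target = np.where((xx // 20 + yy // 20) % 2 == 0, 0.9, 0.05)   # checker target
for ratio in (3, 9, 27):
    _, err = render_coarse_backlight(target, ratio)
    print(f"ratio {ratio:>2}:1 -> mean error {err.mean():.4f}")
```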

Experimental results
To evaluate the image performance of HDR displays under different hardware embodiments, we implemented an HDR display prototype with a tunable layer separation and a variable backlight down-sampling ratio. Figure 8a and b shows the schematic layout of the prototype and a photo of the experimental setup, respectively. The system contains two liquid crystal on silicon (LCoS) microdisplays, both of which have a pixel resolution of 1280 × 960 and a pixel size of 6.35 μm. One LCoS (denoted LCoS2) contains a built-in LED illumination unit to serve as the light engine. A double telecentric relay with an NA of 0.12 is used to image the modulation of LCoS2 onto the other LCoS (denoted LCoS1), so that the light from the light engine is modulated by the modulation layer (LCoS2) and the display layer (LCoS1) with a negligible layer separation. By varying the LCoS2 axial position, we can dynamically change the separation between the two layers. The coarse backlight can be implemented by displaying a down-sampled image on LCoS2 via a given down-sampling operator. The displayed image is coupled out by a polarizing beam splitter, and a commercial eyepiece is utilized as the viewing optics to magnify the resulting HDR image. A grayscale camera with an f/4, 12-mm focal length lens, focused at a depth of 2 diopters, is placed at the exit pupil to capture the reconstructed image. The system has a diagonal field of view of 18.6°, with an angular resolution of 0.7 arcminutes for the virtual HDR image.
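As a consistency check on the quoted specifications, 1280 × 960 pixels at 0.7 arcminutes per pixel indeed subtend approximately the stated 18.6° diagonal field of view (small-angle approximation):

```python
import math

# 1280 x 960 pixels at 0.7 arcmin/pixel -> field of view in degrees.
w_deg = 1280 * 0.7 / 60.0        # horizontal field
h_deg = 960 * 0.7 / 60.0         # vertical field
print(f"{w_deg:.1f} x {h_deg:.1f} deg, diagonal {math.hypot(w_deg, h_deg):.1f} deg")
# -> 14.9 x 11.2 deg, diagonal ~18.7 deg (small-angle approximation)
```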

Simulation model validation of HDR displays
Before evaluating the HDR display image performance, we validate the simulation model of Sections 2 and 3. The most straightforward way is to compare the simulated image with the camera-captured image. To evaluate the image performance under different image visibilities and spatial frequencies, a test image consisting of a series of resolution targets with different angular frequencies and visibilities was generated, as shown in Figure 9a. The normalized luminance and visibility of each target group are summarized in Table 1. Five different layer separations, 0.1D, 0.6D, 1.1D, 1.7D and 2D, were simulated and experimentally compared. For each layer separation, we adopted the simulation model in Section 2 and incorporated the diffraction effects of the display, relay optics, and eye optics. Figure 9b plots the PSFs projected onto the display layer LCoS1 from a point on LCoS2, $PSF_e(r_e, z)$, expressed by Equation (3), and Figure 9c shows the simulated effective retinal image of the modulation layer (SLM2) expressed by Equation (2). Correspondingly, we captured the SLM2 modulation of our prototype by displaying the target image only on LCoS2 with the same layer separations, as shown in Figure 9d. Figure 9c and d show that the simulation and experimental results match well. It is also clearly seen that the angular frequency content of the SLM2-modulated retinal image decreases significantly as the layer separation increases.
Figure 10a shows the simulated retinal images of the reconstructed HDR images under different layer separations. The simulation is based on the product of the modulations on the two SLMs, and the results shown are tone-mapped to 8-bit depth. As a comparison, Figure 10b shows the tone-mapped, camera-captured HDR images under different layer separations. Each HDR image is synthesized from four captures under different exposures and is then tone-mapped to 8-bit depth. During the experiment, image rendering follows the procedure described in Section 2, with additional gamma correction and distortion pre-warping. The simulation results are consistent with the experimental results, which supports the validity of the simulation model. Note that shadows are more visible at the sharp edges between the foreground bars and dark backgrounds, and become more severe as the modulation layer moves further away, which further validates the analysis in Section 3 that image gradient and layer separation are the key factors affecting reconstruction error distributions.
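The multi-exposure HDR capture can be sketched as follows. The hat-shaped weighting and the Reinhard-style global operator are common choices that we assume here for illustration; the paper does not specify the exact merging and tone-mapping operators used.

```python
import numpy as np

def merge_exposures(images, exposure_times):
    """Merge linear grayscale captures into one HDR radiance map using a
    simple hat-shaped weighting; assumes images normalized to [0, 1]."""
    num = np.zeros_like(images[0], dtype=np.float64)
    den = np.zeros_like(num)
    for img, t in zip(images, exposure_times):
        w = 1.0 - np.abs(2.0 * img - 1.0)      # trust mid-range pixels most
        num += w * img / t                     # radiance estimate per capture
        den += w
    return num / np.maximum(den, 1e-9)

def tone_map(hdr):
    """Reinhard-style global operator, then quantize to 8-bit depth."""
    ldr = hdr / (1.0 + hdr)
    return np.round(255.0 * ldr / ldr.max()).astype(np.uint8)

# Synthetic demo: four 'captures' of the same scene at different exposures.
rng = np.random.default_rng(0)
scene = rng.uniform(0.0, 50.0, size=(64, 64))          # linear radiance
times = [1 / 200, 1 / 50, 1 / 12, 1 / 3]
caps = [np.clip(scene * t, 0.0, 1.0) for t in times]   # sensor saturation
print(tone_map(merge_exposures(caps, times)).shape)
```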

Performance evaluation of HDR displays with different layer separations
The system performance is evaluated by three factors: modulation effectiveness, image sharpness, and pupil stability. Modulation effectiveness determines the image authenticity, for which we use the root mean square error (RMSE) between the illuminance of the reconstructed image and that of the original image as an evaluation metric.
To account for human visual perception, the JND map and the noticeable luminance error percentage are also used as evaluation factors for the image reconstruction differences and errors. The MTF is used to quantify the reconstructed image sharpness, i.e. the capability of recovering image content at different spatial frequencies. The slanted-edge method is used to measure the system MTF in the experiment. Pupil stability is evaluated by the reconstruction error variation as the viewing position moves within the eyebox. By moving the camera within the eyebox and calculating the changes in RMSE and JND percentage of the reconstructed image, we can determine the pupil stability at different pupil positions.
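The three metrics can be prototyped as below. Note two simplifications of ours: the noticeable-difference map uses a constant 2% Weber fraction instead of the Barten CSF/DICOM model used in the paper, and the slanted-edge routine is a stripped-down version of the standard method.

```python
import numpy as np

def rmse(reconstructed, reference):
    """Root mean square illuminance error between reconstruction and target."""
    return float(np.sqrt(np.mean((reconstructed - reference) ** 2)))

def noticeable_map(reconstructed, reference, weber=0.02):
    """Binary noticeable-difference map: a crude, luminance-only proxy for
    the Barten CSF / DICOM thresholding used in the paper."""
    return np.abs(reconstructed - reference) > weber * np.maximum(reference, 1e-6)

def slanted_edge_mtf(img, oversample=4):
    """Simplified slanted-edge MTF for an image of a near-vertical edge:
    locate the edge per row, bin pixels into an oversampled edge spread
    function (ESF), differentiate to the line spread function (LSF), FFT."""
    rows, cols = img.shape
    x = np.arange(cols)
    grad = np.abs(np.diff(img, axis=1)) + 1e-12
    edge = (grad * x[:-1]).sum(axis=1) / grad.sum(axis=1)   # subpixel edge/row
    slope, intercept = np.polyfit(np.arange(rows), edge, 1)
    dist = x[None, :] - (slope * np.arange(rows) + intercept)[:, None]
    bins = np.round((dist - dist.min()) * oversample).astype(int)
    counts = np.bincount(bins.ravel())
    esf = np.bincount(bins.ravel(), weights=img.ravel()) / np.maximum(counts, 1)
    lsf = np.diff(esf) * np.hanning(esf.size - 1)           # window the LSF
    mtf = np.abs(np.fft.rfft(lsf))
    freq = np.fft.rfftfreq(lsf.size, d=1.0 / oversample)    # cycles/pixel
    return freq, mtf / max(mtf[0], 1e-12)

# Demo on a synthetic blurred edge tilted by a few degrees.
yy, xx = np.mgrid[:100, :100]
edge_img = 1.0 / (1.0 + np.exp(-(xx - 50 - 0.05 * yy) / 1.5))  # soft edge
f, m = slanted_edge_mtf(edge_img)
print(f"MTF at {f[10]:.2f} cyc/pix: {m[10]:.2f}")
```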
The modulation effectiveness of HDR displays under different layer separations is evaluated by analyzing the distribution of the RMSE reconstruction error and the noticeable differences on the reconstructed image. Figure 11a-d shows the reconstruction error maps from experiments when the layer separation equals 0.6D, 1.1D, 1.7D and 2.0D, respectively. Figure 11e shows the reconstruction error from simulation with a 2.0D layer separation. Compared with the simulation result at the same layer separation, the experimental error in Figure 11d shifts to one side of the target bar, which results from the misalignment introduced as SLM2 moves in the axial direction. We therefore calibrated the misalignment during the axial movement, and Figure 11f shows the simulated reconstruction error after calibration. Comparing Figure 11d and f, the simulation result matches the experimental result. Figure 11a-f also verifies that the reconstruction error mainly emerges in areas with high gradient values, such as the edges of foreground bars with high visibility. The binary noticeable difference maps under different layer separations are shown in Figure 11g-i, which show that the reconstruction error is more noticeable for high spatial frequency targets as the layer separation increases. In summary, Figure 11j and k plot the overall percentage of noticeable difference and the image illuminance RMSE as functions of layer separation, both of which indicate increased reconstruction errors as the layer separation extends.

Figure 12a-c plots the MTF values measured from the retinal images of the reconstructed HDR image, the modulation layer image (SLM2), and the display layer image (SLM1), respectively, for six different layer separations.

Pupil stability may be affected by three factors: SLM misalignment, vignetting, and pupil aberration. In the experiment, the camera was moved in the exit pupil plane with horizontal displacements of -3, -2, -1, 0, 1 and 2 mm from the center of the pupil. Figure 13a-d shows the reconstruction error and the noticeable difference maps when the viewing position moves to the leftmost and rightmost positions of the eyebox. As seen in the figure, vignetting is the dominant source of the reconstruction error, since the illuminance error map shows significant non-uniformity toward the direction of pupil movement. Figure 13f and g plot the increase of the noticeable difference percentage and illuminance RMSE at different viewing positions, and their averages under different layer separations are summarized in Figure 13e. Compared with a small layer separation, the reconstruction error under a large separation is less sensitive to the viewing position, which gives the system more pupil stability and alignment tolerance. In conclusion, the HDR display with extended layer separation has an advantage in pupil stability, at the cost of modulation effectiveness and image sharpness. This can be explained by the image frequency reallocation: when the layer separation is negligible, the image angular frequencies are equally distributed on both layers, per-pixel modulation is achieved, and we can make the best use of the dynamic range enhancement of both SLMs. However, the image quality degrades significantly with misalignment errors or mechanical disturbances. On the contrary, if most of the high angular frequency content is reallocated to the display layer and the low angular frequency content is allocated to the modulation layer, the system alignment tolerances can be loosened dramatically. However, such configurations sacrifice the available dynamic range on the modulation layer and introduce reconstruction errors like those analyzed in Section 3.

Coarse backlight HDR displays
To model the performance of HDR displays with coarse backlight modulation, the image content displayed on LCoS2 is modified by down-sampling the square root of the original image, $\sqrt{I_0}$, to fit the modulation resolution of SLM2, while keeping the SLM2 axial position co-aligned with SLM1 (i.e. the layer separation equals 0D). In our experiment, a low-pass Gaussian filter $PSF_c(r_c, \sigma)$ with a width set by the down-sampling ratio $S_1 : S_2$ is applied to the SLM2 content, and the content on SLM1 is optimized to compensate for the down-sampling effect following the discussion in Section 2. Figure 14a-c shows the experimental results of the retinal images with down-sampling ratios of 27:1, 55:1 and 79:1. Their corresponding illuminance reconstruction error maps are shown in Figure 14d-f. A blooming effect is increasingly visible at the edges of the resolution targets as the down-sampling ratio increases. The experimental and simulation results of the noticeable difference percentage and illuminance RMSE at different down-sampling ratios are plotted in Figure 14g and h, showing an increasing reconstruction error as the down-sampling ratio increases.

Figure 15a-c plots the MTF values measured from the retinal images of the reconstructed HDR image, the modulation layer image (SLM2), and the display layer image (SLM1), respectively, for multi-pixel modulation ratios of 1:1, 27:1, 55:1 and 79:1. The MTF of the high spatial frequency content of the reconstructed image (Figure 15a) drops significantly due to the image down-sampling on the SLM2 layer, while the modulation on SLM1 improves the reconstructed HDR image performance at high spatial frequencies.

Figure 16 shows the pupil stability results with different multi-pixel modulation ratios. While misalignment causes significant illuminance errors at the boundaries of high spatial frequency content (Figure 16a), vignetting is still the leading cause of large areas of noticeable illuminance difference (Figure 16b). Figure 16c plots the average JND and RMSE changes as a function of multi-pixel modulation ratio. Figure 16d and e plot the noticeable luminance error change and the RMSE change as functions of the pupil shift from the central viewing position for different multi-pixel modulation ratios, from which we can conclude that the pupil stability improves as the multi-pixel modulation ratio increases, at the cost of modulation effectiveness and image sharpness.
In summary, HDR displays with a coarse backlight also gain pupil stability and alignment tolerance by compromising modulation effectiveness and image sharpness. By applying a low-pass filter to the modulation layer and modifying the content on the display layer, the high spatial frequency image content is displayed on a single layer, which gives the system looser tolerances.

Conclusion
In this paper, we evaluate and compare the performance of three typical HDR-HMD configurations. The methods and models for simulating the reconstructed HDR image performance of dual-layer HDR-HMDs are also presented. The image performance and tolerances of three different hardware embodiments are evaluated: per-pixel modulation, extended layer separation modulation, and coarse backlight modulation. The modulation error sources are investigated, and the maximum displayable frequencies under different layer separations and contrast criteria are given. To experimentally investigate the image performance with different layer separations and backlight down-sampling ratios, we set up a prototype and developed a corresponding image rendering algorithm to analyze the reconstructed HDR image performance. The MTF, luminance error, and pupil stability are evaluated under different hardware implementations, providing a comprehensive performance assessment of HDR-HMDs.

Figure 1. (a) Schematic layout of the HDR image generator. The modulation layer is composed of (1) a backlight with SLM2 or (2) an addressable source panel. (b)-(d): 1-D diagrams illustrating (b) the pixel-by-pixel modulation, (c) the extended layer separation modulation, and (d) the coarse modulation method. (e)-(g): Demonstration of pupil swim effects with (e) the per-pixel modulation method, (f) the extended layer separation method, and (g) the coarse backlight modulation method.

Figure 2. Schematic layout of the HDR-HMD image simulation model for (a) the extended layer separation method and (b) the coarse backlight method.

Figure 3. Examples of the simulated retinal image of an HDR-HMD engine under layer separations of 1.0D, 1.5D and 2.0D. (a) The original HDR image after tone mapping. (b) The reconstructed HDR image after tone mapping. (c) The reconstruction error map and (d) the binary noticeable difference map, where the white areas denote pixel luminance reconstruction errors that can be perceived by human eyes.

Figure 4. (a) Analysis results as the layer separation varies from 0D to 2D. Blue: the multi-pixel modulation area A_PSF (in pixels); red: the average luminance error I_err; green: the noticeable difference percentage P_notice (%). (b) The modulation area A_PSF as a function of layer separation and eye pupil diameter.

Figure 5. (a)-(c) Simulated spatial frequency responses as the layer separation varies from 0D to 2D: (a) the reconstructed image profile, (b) the reconstruction error, and (c) the modulation error distribution of SLM2 for a single period of a sinusoidal pattern. (d)-(g) HDR image simulation with a 10-period sinusoid pattern at a layer separation of 1D: (d) the target HDR image I_0; (e) the reconstructed HDR image I_r; (f) the image displayed on SLM2; and (g) the actual SLM2 modulation projected on SLM1. Under each simulated image in (d)-(g), the modulation profile over one period is plotted.

Figure 6. The relationship between the contrast drop, image angular frequency, and layer separation. (a) The reconstruction error as a function of layer separation for sinusoid target images of 30, 20, 15 and 12 cycles/degree; (b) the reconstruction error as a function of image angular frequency for layer separations of 0.2D, 0.5D, 1.0D and 1.5D; (c) the boundary frequency of the reconstructed image as a function of layer separation for contrast drop thresholds of 10%, 20% and 30%.

Figure 7. The relationship between the contrast drop, image angular frequency, and down-sampling ratio of the HDR display. (a) The contrast drop as a function of the layer down-sampling ratio for sinusoid target images of 30, 6, 3.3 and 2.3 cycles/degree; (b) the contrast drop as a function of image angular frequency for down-sampling ratios of 3:1, 11:1, 31:1 and 51:1 pixels along one side; (c) the boundary frequency of the reconstructed image under different layer down-sampling ratios for acceptable contrast drops of 10%, 20% and 30%.

Figure 8. (a) Schematic layout and (b) experimental prototype of a monocular HDR-HMD system based on a dual-layer modulation scheme.

Figure 9. (a) The original target HDR image. (b) Simulated PSFs at different layer separations. (c) Simulated effective modulations of the modulation layer (SLM2) and (d) experimentally measured effective modulations of the modulation layer (SLM2).

Table 1. Modulation and visibility of the targeted image.

Figure 10. The perceived HDR retinal image by dual-layer modulation. (a) Simulation results and (b) experimental results under different layer separations. All images are tone-mapped for presentation on LDR devices.


Figure 11. (a)-(d): The luminance error of the reconstructed HDR image when the modulation layer is located 0.6D, 1.1D, 1.7D and 2D away from the display layer. (e) Simulated luminance error map with a separation of 2D. (f) Simulated luminance error map after considering layer misalignment errors. (g)-(i) Experimental binary noticeable difference maps with layer separations of 0.6D, 1.1D and 2D. (j) Simulated and experimental results of the noticeable luminance percentage and (k) simulated and experimental results of the RMSE under different layer separations.

Figure 12. (a) The MTF plot of the reconstructed image (SLM1 + SLM2); (b) the MTF plot of the modulation layer on the reconstruction image plane (SLM2); and (c) the MTF plot of the display layer image content (SLM1).

Figure 13. Pupil sensitivity analysis of the reconstructed HDR image under different layer separations and viewing positions in the eyebox. (a) and (c): The luminance error plots at the pupil edges (−3 mm and +2 mm) with negligible layer separation; (b) and (d): the corresponding noticeable luminance error maps at the pupil edges, where white areas denote luminance errors that can be perceived. (e) The average noticeable luminance error change and average RMSE change over all sampled pupil locations under different layer separations. (f) The noticeable luminance error change and (g) the RMSE change with respect to the viewing center.

Figure 14. (a)-(c): The reconstructed HDR images with down-sampling ratios of 27:1, 55:1 and 79:1, respectively; (d)-(f): the corresponding luminance error maps at the different down-sampling ratios. The simulation and experimental results of (g) the JND percentage and (h) the RMSE at different down-sampling ratios.

Figure 15. (a) The MTF plot of the dual-layer reconstructed HDR image with different modulation resolutions (SLM1 + SLM2); (b) the MTF plot of the down-sampled modulation layer (SLM2); and (c) the MTF plot of the display layer (SLM1).

Figure 16. Pupil sensitivity analysis of the reconstructed HDR image under different down-sampling ratios and viewing positions in the eyebox. (a) The luminance error plot at the pupil edge (−3 mm) without layer separation; (b) the noticeable luminance error map without layer separation; (c) the average noticeable luminance error change and RMSE change over all sampled pupil locations under different down-sampling ratios; (d) the noticeable luminance error change relative to the central-view reconstructed image and (e) the RMSE change relative to the central-view reconstructed image.