A perceptual eyebox for near-eye displays

In near-eye display systems that support three-dimensional (3D) augmented and virtual reality, a central factor in determining the user experience is the size of the eyebox. The eyebox refers to a volume where the eye receives an acceptable view of the image with respect to a set of criteria and thresholds. The size and location of this volume are primarily driven by optical architecture choices in which designers trade off a number of constraints, such as field of view, image quality, and product design. It is thus important to clearly quantify how design decisions affect the properties of the eyebox. Recent work has started evaluating the eyebox in 3D based purely on optical criteria. However, such analyses do not incorporate perceptual criteria that determine visual quality, which are particularly important for binocular 3D systems. To address this limitation, we introduce the framework of a perceptual eyebox. The perceptual eyebox is the volume where the eye(s) must be located for the user to experience a visual percept falling within a perceptually-defined criterion. We combine optical and perceptual data to characterize an example perceptual eyebox for display visibility in augmented reality. The key contributions in this paper include: comparing the perceptual eyebox for monocular and binocular display designs, modeling the effects of user eye separation, and examining the effects of eye rotation on the eyebox volume.


Introduction
Recent advances in wearable computing platforms are expanding the possibilities for immersive 3D user experiences, such as augmented and virtual reality (AR/VR). However, the platforms for AR/VR still pose a number of engineering challenges, particularly when it comes to consistently delivering clear, stable, and comfortable images to the user's eyes [1,2].
AR/VR systems rely on a near-eye display system in which a small image is magnified to create a large virtual image in the user's field of view. For VR, this system is opaque, blocking the user's view of the real world and presenting them with only a virtual world. For AR, this system is typically see-through and adds virtual content on top of the user's view of the real world. A core limitation in the design of near-eye display systems is that changes in the position of the user's eye(s) relative to the system (specifically, changes in pupil location) can compromise the visibility and optical quality of the displayed content (Fig. 1). Optical criteria for determining which viewing positions fall within the acceptable eyebox may include, for example, luminance uniformity (i.e., vignetting), modulation transfer function (MTF), field curvature, chromatic aberration, or optical distortion [4][5][6]. The eyebox has to be designed and sized appropriately because several system-level parameters trade off against the eyebox.
Optical designers and manufacturers rarely report the criteria used for determining the eyebox volume, and the dimensions are often reported as a 2D area at a specific viewing distance (eye relief) rather than a full volume. Most importantly, optical eyebox criteria are based on measurements that may not correlate well with the user's visual experience. For example, anecdotally it has been observed that binocular systems (in which both eyes can see virtual content, supporting stereoscopic depth cues) seem to have a larger effective eyebox than monocular systems (in which only one eye sees virtual content). However, this perceptual experience is not captured by optical metrics.

Fig. 1. In the left image, the icons have relatively uniform visibility. In the right image, the icons on the right side are vignetted and distorted. The bottom two images illustrate a simulation of vignetting, using our AR testbed. We used these simulations in a user study to characterize perceptual tolerances for vignetting. Icons are sourced from [3].
In this report, we introduce the framework of the perceptual eyebox for characterizing the effective viewing volume for a near-eye display system. The perceptual eyebox employs a combination of optical quality data captured using existing techniques and new empirically measured perceptual criteria. This combination allows us to select perceptually-relevant bounds on near-eye display image quality data.
We focus on applying this framework to perception of vignetting artifacts in see-through AR. Our investigation comprises three main parts. First, we characterize in detail the optical vignetting of an example optic and image source using ray-traced simulations. Next, we conduct a controlled user study to define a set of perceptual criteria and thresholds for acceptable quality, using a parametric model of optical vignetting. Last, using the example optical system and the perceptual data together, we examine how factors such as binocular presentation, the distance between the two eyes, and eye rotation affect the size and shape of a perceptual eyebox for vignetting. This work demonstrates how perceptual thresholds can contribute to solving key emerging challenges in near-eye optical design, particularly in determining how many product variations must be designed to cover a target population, a key question in the AR/VR optics industry.

Near-eye display systems for AR/VR
Near-eye display systems are typically composed of a compact image source (a display) and a set of optics that deliver the image to the user's eyes (Fig. 2). A typical VR system is composed of two lenses (one for each eye) and one or two displays. The lenses place a virtual image of the displays at some comfortable viewing distance (because human eyes cannot naturally focus on surfaces that are only a few centimeters away). AR systems are similar to VR systems, but transmit light from the real world. For this purpose, optical combiners, such as partial mirrors, are used to effectively 'add' the light from the image source to the light from the world before it enters the user's eyes [2,7].

Fig. 2.
Top-down illustrations of the basic components of VR and AR systems. A) A typical VR system is composed of a lens for each eye (optics) and two displays (image sources). Objects drawn on the displays are perceived to appear in front of the user in a virtual environment. AR systems require optical combiners that allow the user to see virtual and real content simultaneously. AR systems can have virtual content visible to both eyes (binocular, B), or just one eye (monocular, C). VR systems often rely on a head-mounted goggle form factor to accommodate larger displays and bulkier optics, leading to a relatively large eyebox in terms of display visibility, and thus perceptual issues secondary to visibility become more relevant (e.g., system resolution, pupil swim, and vergence/accommodation conflicts). AR systems often aim for a compact and energy efficient form factor, so particular attention is devoted to miniaturizing the combiners and displays. Design of such AR systems requires making a delicate trade-off between the size of the eyebox (to accommodate many people and different fits) and system efficiency, as well as other factors. For recent examples, see [8][9][10][11][12][13][14][15]. Characterizing the size and location of the usable eyebox volume is thus particularly important in vetting potential new designs for AR [4][5][6]16].

Optical quality in AR/VR
Due to the unique optical requirements of near-eye display systems, important trade-offs often occur in the optical domain. For example, the combiner for many AR optical architectures is an optical coating that controls what fraction of light is transmitted from the world, and what fraction is reflected from the display [17]. Achieving high transmission for world light often results in poor efficiency for adding the light from the display. At the same time, achieving a large eyebox requires a large volume around the designated viewing position to be filled with the light rays from the displayed image. Rays that do not pass through the user's pupil, however, result in a loss in system efficiency. It then becomes a design decision of how small an eyebox is feasible for the product while also achieving image quality objectives.
Of particular importance is the fact that when the eyebox is small, relatively small differences in the position of a user's pupil can result in vignetting at the edges of the displayed image, making it difficult or impossible for the virtual content to be seen. Vignetting occurs when the pupil gathers less light from content at the edges of the display [18]. Figure 3(A)-(C) illustrate this effect with a diagram of a near-eye display. Note that pixel emission cone widths vary by display technology and affect the size and location of the eyebox (e.g., OLED emission cones are wide, LCoS emission cones are narrow). In AR systems, this vignetting results in a loss of contrast of the virtual content with respect to the ambient environment. Based on Society of Automotive Engineers (SAE) standards and empirical research on heads-up display imagery, this contrast should be no less than 1.4:1 and absolutely no less than 1.2:1 [19,20]. However, if a display is nominally capable of adequate contrast but has vignetting at the edges, certain information could fall below these thresholds and yield an unacceptable experience. Most VR and AR systems also support stereoscopic viewing. In this case, there are now two eyeboxes that need to be considered (one for each eye's display), but this is not accounted for by current standards.

Fig. 3. Top-down illustrations of how a user's eye location affects display visibility. A) The shaded rectangle indicates the potential locations for the user's eye within the head-mounted casing. The image source has three pixels, and the emission cones are traced through the optic (in this case, a simple magnifying lens). The optic collimates the emission cones, and the light from all 3 pixels is only visible in a smaller area of the viewing volume: the eyebox. B) When the pupil is located appropriately to gather light equally from all beams (asterisk), the retinal image (illustrated on the left) is relatively uniform. C) When the pupil is offset from the nominal eye position, it may gather less light from some regions of the display. This results in vignetting for one or both edges of the image. D) For a binocular system, offsets can result when the inter-display distance is mismatched to the inter-pupillary distance (IPD) of the user.

Binocular perception in AR/VR
For a binocular near-eye display system, a crucial factor in eyebox placement is the expected user eye separation, specifically, the inter-pupillary distance (IPD) of the users. Typical IPDs range from 50 to 75 mm [21]. If the inter-display distance is not well-matched to a user's IPD, this can create different patterns of optical degradation and vignetting in the two eyes (Fig. 3(D)). For example, mismatches between the IPD and inter-display distance may result in vignetting of the nasal visual field of both eyes, or in the temporal visual field of both eyes. Binocular combination, however, could potentially mitigate the information loss and produce an acceptable experience [22]. For example, if virtual content has high contrast in one eye, that might be sufficient even if the contrast is reduced substantially by vignetting in the other eye. Percepts of stereoscopic depth are also somewhat robust to inter-ocular contrast differences [23]. However, classic psychophysical experiments on binocular contrast combination have employed sine wave gratings and other basic stimuli that are difficult to apply to more complicated situations with natural backgrounds and AR overlays [22,24,25]. At the same time, binocular combination can also be associated with a range of secondary artifacts, such as rivalrous percepts, which might produce undesirable perceptual effects in binocular systems that are absent in monocular systems [26]. Thus, in addition to the current uncertainty about acceptable levels of optical artifacts in AR/VR, the effect of binocular viewing on the perception of such artifacts is unknown. Current approaches to optical eyebox assessment do not provide tools for making good perceptual design trade-offs, because they do not explicitly take the visual percept of the user into account.

Defining a perceptual eyebox
To quantify the quality of the visual experience with near-eye display systems, we introduce the concept of a perceptual eyebox. The perceptual eyebox is a volume in space where the eye(s) must be located in order for the user to experience a visual percept determined by a perceptually-defined criterion, and can therefore be applied across a range of different system designs. In this section, we describe how optical and perceptual measurements can be combined to define a perceptual eyebox.

Characterization of system optics
To define the optical properties that contribute to an eyebox, we first conceptualize a 3D volume in space where optical quality is evaluated (Table 1). We acknowledge that there are several criteria (e.g., MTF, field curvature) that contribute to the user experience, and in this first investigation we focus on evaluating the five-point field non-uniformity across this volume as a measure of the strength of display vignetting. We focus on display vignetting for two reasons: 1) the visibility of the content on the display is of primary importance in AR/VR use cases, and 2) if light does not reach the eye from a particular location, its quality is not important. Given the variety of optical architectures, universal agreement on a single criterion is unlikely; however, non-uniformity is one measure that applies across all optical architectures. Five-point field non-uniformity is measured by computing the Michelson contrast between a set of equal radiance pixels positioned at the center and edges of the display, arranged in a cross pattern, and viewed by a human eye model. Specifically,

f = (I_max − I_min) / (I_max + I_min),    (1)

where f refers to the field non-uniformity, and I_max and I_min denote the maximum and minimum retinal illuminance of the five field points. If all five points have equal retinal illuminance (lm/mm²), f = 0, meaning the image is uniform, and if f = 1, at least one of the field points was completely clipped/vignetted. In the main analysis, we define this non-uniformity for a fixed viewing direction. However, later we consider how fixating each field point, by rotating the eye, can vignette the foveated point (e.g., fixational non-uniformity).
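As a minimal sketch, Eq. (1) can be computed directly from the five sampled retinal illuminances; the function name and the convention for the fully clipped edge case are our choices for illustration, not part of the paper:

```python
def field_non_uniformity(illuminances):
    """Five-point field non-uniformity (Eq. (1)): Michelson contrast
    across the retinal illuminances of the five field points."""
    i_max, i_min = max(illuminances), min(illuminances)
    if i_max == 0:
        # All field points clipped: treat as fully non-uniform
        # (our convention; the degenerate 0/0 case is undefined in Eq. (1)).
        return 1.0
    return (i_max - i_min) / (i_max + i_min)
```

A uniform image gives f = 0, and a fully clipped field point gives f = 1, matching the bounds described above.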
Evaluating the non-uniformity criterion for a given near-eye system can be done in simulation with ray tracing software using an optical model of the display/optics, or empirically with a calibrated camera and translation stage. For the purposes of demonstration, here we performed simulations in commercially available ray tracing software, LightTools, using an off-the-shelf optical eyepiece with an open prescription (Edmund Optics part 30-941), a scatter plate that simulates the emission cones of the pixels on a display, and a paraxial human eye model. The complete list of simulation parameters is summarized in Table 1. Figure 4(A) shows examples of the vignetting profiles derived from this simulation, illustrated for a solid white display pattern as viewed from three different pupil positions for the model eye in the horizontal plane. In Fig. 4(B), we show non-uniformity values as a heat map for a set of planes at some example eye reliefs (z). Axes labeled x and y represent the horizontal and vertical locations of the eye. As is typical, we observe a systematic increase in vignetting (non-uniformity) as the eye moves away from the nominal view position for this optic (0, 0, 0).

Setting a perceptual threshold on optical data
The optical quality data fall along a continuum, so a threshold needs to be selected to define a concrete eyebox volume. To illustrate this, we show two isocontours associated with two arbitrary thresholds overlaid on the non-uniformity heat maps (Fig. 4(B)). These thresholds result in quite different contour shapes, and thus the resulting eyebox volumes differ in both shape and size.
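Once a threshold is chosen, the enclosed eyebox volume can be estimated by a simple voxel count over the sampled non-uniformity grid. This is a hypothetical sketch, assuming the optical data have already been sampled over a regular grid of (x, y, z) eye positions:

```python
import numpy as np

def eyebox_volume(non_uniformity, voxel_mm3, threshold):
    """Estimate the eyebox volume enclosed by an isosurface: count the
    voxels of a 3D non-uniformity grid (sampled over x, y, z eye
    positions) at or below the threshold, times the volume per voxel."""
    return np.count_nonzero(non_uniformity <= threshold) * voxel_mm3
```

Comparing two candidate thresholds (or two system designs) then reduces to a ratio of two such voxel counts.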
How can we meaningfully select between the different thresholds illustrated in Fig. 4(B) to quantify and compare eyebox volumes? Are the appropriate thresholds and criteria the same for AR and VR systems, with different levels of system contrast, field of view, and fit? Given the complexity of these factors, we propose that the appropriate thresholds be defined empirically via perceptual experiments and then applied to the optical analysis.
In some cases, an appropriate perceptual threshold may be determined a priori. For example, this might be done by selecting a threshold for minimum resolution based on an observer with typical visual acuity, as is done by display manufacturers [27]. An alternative approach is to assess the acceptability of specific artifacts empirically through user testing. For example, after capturing the images associated with different eye locations for a given near-eye system, the sensitivity of users to the associated artifacts can be measured with a controlled user study. User studies could be based on detectability of artifacts, user preferences, or the effect of artifacts on performance of a particular task. Thus, we propose that a perceptual eyebox has three components: 1) optical measurements characterizing quality as a function of pupil location, 2) empirical data capturing perceptual performance as a function of optical quality, and 3) a selected threshold for acceptable levels of perceptual performance, which can be applied to #1.
In the next sections, we describe the methods and results of a perceptual study designed to estimate the appropriate perceptual thresholds for detectability of vignetting effects in AR, which we then use to explore how design factors can modulate the size and shape of the perceptual eyebox.

AR testbed
We designed a display system and software environment to create what we call an AR testbed. The goal of the testbed was to build a high quality stereoscopic display system that allowed for the simulation of optical degradations to be performed in software. This simulation then allowed us to conduct perceptual experiments in which the optical quality of the visual stimuli was under experimenter control.

Display system
The AR testbed hardware was designed to provide a physical eyebox that was very large with consistently high optical quality throughout the visual field. For this purpose, we built a wide-field of view, high-resolution, desk-mounted mirror haploscope (Fig. 5). Using flat mirrors instead of lenses or curved mirrors allows the testbed hardware to be effectively free of any detectable optical aberrations. However, it can be challenging to achieve a wide and symmetric field of view with standard flat front surface mirrors, because they often cannot get close enough to the eyes to fill the visual field. We custom-machined a pair of mirrors (12.5 cm × 6 cm) from optical grade aluminum with rounded edges, enabling a comfortable fit and wide binocular field of view (at least 60° horizontal and 34° vertical for all tested users). Desk-mounted LCD screens (LG 32UD99-W, 32" 4k) were used to present high resolution imagery within the viewing frusta of the mirrors. The monitors were set at a viewing distance of 61 cm, resulting in an angular resolution of 64 pixels per degree when viewed through the mirrors. The maximum display luminance was approximately 148 cd/m². When designing the experimental stimuli, we assumed uniform and unaberrated images were delivered from the displays to the eye. Experiments using the testbed were conducted in a dark room to minimize uncontrolled conflicting peripheral stimuli from the testing room.

Natural backgrounds
Stereoscopic images of natural and man-made environments were used to simulate real-world backgrounds. Images were derived from the SYNS natural stereo image dataset [28], which contains 8-bit RGB stereo image pairs captured with a 6.3 cm camera baseline. Because individual images in the dataset subtended only 38° horizontally, we stitched overlapping adjacent views to generate wider composites. The flanking images were warped and blended into the base image using OpenCV [29]. Twenty-five stereoscopic indoor and outdoor views were generated.

Virtual content
We generated icon menus that simulated typical virtual content for AR. Mobile application icons [3] were placed in an 11 × 7 rectangular array subtending 40.0° horizontally and 25.5° vertically. Each menu had a 25.5° × 10.9° clear aperture in the center and 75% of the remaining space was filled with icons. One hundred unique random menus were generated.
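The menu layout described above can be sketched as a boolean occupancy grid. This is an illustrative reconstruction: the function name and the aperture size expressed in grid cells (7 × 3 cells, approximating the 25.5° × 10.9° aperture) are our assumptions, not the study's implementation:

```python
import numpy as np

def random_menu(rng, cols=11, rows=7, aperture_cols=7, aperture_rows=3, fill=0.75):
    """Boolean occupancy mask for a random icon menu: a rows x cols grid
    with a central clear aperture; `fill` of the remaining cells get icons."""
    r0 = (rows - aperture_rows) // 2
    c0 = (cols - aperture_cols) // 2
    # Cells outside the central aperture are candidates for icons.
    candidates = [(r, c) for r in range(rows) for c in range(cols)
                  if not (r0 <= r < r0 + aperture_rows and c0 <= c < c0 + aperture_cols)]
    mask = np.zeros((rows, cols), dtype=bool)
    n_icons = round(fill * len(candidates))  # 75% of the 56 non-aperture cells -> 42 icons
    for i in rng.choice(len(candidates), size=n_icons, replace=False):
        mask[candidates[i]] = True
    return mask

menu = random_menu(np.random.default_rng(0))
```

Each call with a fresh random seed yields one of the unique random menus.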

AR simulation
The icon menus were additively combined with a random background. The background intensity was first normalized to range from 0-1 and then reduced to 66.6% of its maximum value. The icon menu's intensity was normalized to range from 0-33.3%, so that when the background and icon menus were combined, the experimental stimulus would span the luminance range of the display. With this allocation of display luminance, the AR image contrast is nominally 1.5:1, and thus basic iconography should be visible across a range of background content. Before combination, the icon menu in each eye's view was horizontally displaced so that it would appear to be at a distance of 2 m when stereoscopically fused, which was always in front of the background content.
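The luminance allocation above can be sketched as a simple additive composite; the function is illustrative and assumes both input images have a nonzero maximum:

```python
import numpy as np

def composite_ar(background, menu):
    """Additively combine a background and an icon menu so the result spans
    the display range [0, 1]: the background occupies 0 to 2/3 of the range
    and the menu adds up to 1/3, giving a nominal AR contrast of
    (2/3 + 1/3) / (2/3) = 1.5:1."""
    bg = background / background.max() * (2.0 / 3.0)
    fg = menu / menu.max() * (1.0 / 3.0)
    return bg + fg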

Vignetting
Vignetting profiles change with different optical architectures. Rather than simulate a specific architecture, we designed generic vignetting profiles with a smooth luminance falloff, similar to that observed in commercial AR systems (Fig. 1). Starting at an eccentricity of 12.5° from the center of the icon menu on either the left or right side, the luminance of the icons was smoothly scaled down to 75%, 50%, 25%, or 10% at the edge of the menu, following a Gaussian function (Fig. 6). These scale factors correspond to optical non-uniformities of 0.15, 0.33, 0.60, and 0.82, respectively (Eq. (1)).
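Since the dimmest edge point has illuminance s relative to the unvignetted center, Eq. (1) reduces to f = (1 − s)/(1 + s) for these stimuli. The sketch below implements that mapping and one plausible Gaussian falloff; the profile parameters (edge at 20°, i.e., half the 40° menu width) are our illustrative choices:

```python
import math

def edge_scale_to_nonuniformity(s):
    """Map an edge luminance scale factor s (dimmest point relative to the
    unvignetted center) to the non-uniformity of Eq. (1): f = (1-s)/(1+s)."""
    return (1.0 - s) / (1.0 + s)

def vignette_profile(ecc_deg, onset_deg=12.5, edge_deg=20.0, edge_scale=0.5):
    """Gaussian luminance falloff: 1.0 inside onset_deg, then a smooth
    decline reaching edge_scale exactly at edge_deg."""
    if ecc_deg <= onset_deg:
        return 1.0
    # Choose sigma so the profile hits edge_scale at the menu edge.
    sigma = (edge_deg - onset_deg) / math.sqrt(-2.0 * math.log(edge_scale))
    return math.exp(-((ecc_deg - onset_deg) ** 2) / (2.0 * sigma ** 2))
```

For example, a 50% edge scale maps to f = 0.5/1.5 ≈ 0.33, matching the second non-uniformity level listed above.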
For this study, we focused on the common issue of a mismatch between the fit of the display hardware and the user's eye(s) due to individual differences in IPD. That is, we simulated situations in which the pupil(s) were offset horizontally from the center of the eyebox. For a monocular system, this would result in vignetting on either the nasal or temporal side of the visual field of the eye viewing the virtual content. Nasal/temporal fields refer to the sides of the visual field towards the nose and the temple of the user, respectively. For a binocular system, this would result in mirrored vignetting profiles. That is, the nasal field would be vignetted in both eyes, or the temporal field would be vignetted in both eyes (see Fig. 3 and Fig. 7). Importantly, in the case of a binocular system, this means that content vignetted in one eye will be visible to the other. If, however, the hardware shifts on the face (e.g., slides down the user's nose bridge), binocular vignetting may occur in the same region of the visual field of both eyes.

Participants
Thirty adults (20F, 10M) with an average age of 25 years (SD = 5 years) participated. All had normal or corrected-to-normal visual and stereo-acuity. They all gave informed consent prior to starting the experiment and were compensated for their time. The procedures were approved by the institutional review board at University of California, Berkeley and were in compliance with the Declaration of Helsinki.

Experiment
To assess the detectability of vignetting, we used a two-interval procedure, in which participants were asked to judge which of two stimuli appeared more uniform. One stimulus (the standard) was always 100% uniform, and the other (the probe) was vignetted. We also included trials in which the probe was un-vignetted to assess the baseline response rates for identical stimuli. The order of presentation of the standard and probe was randomized.
On each trial (Fig. 8), the participant first viewed a randomly chosen background image. After 3 sec, the first stimulus was shown for 5 sec. After a 3 sec inter-stimulus interval, in which only the background was shown, the second stimulus was presented for 5 sec. Participants used a 5-point Likert scale to indicate whether the first or second stimulus had better uniformity, and whether this difference was slight or strong (they could also respond that the two stimuli were the same). We chose Likert ratings because they are well-suited for gauging subjective user experience and therefore directly useful for general design decisions. These ratings allow a designer to establish a criterion based on user experience (e.g., "No more than 5% of users should say that the vignetting is worse than a comparison display without vignetting"). After every 15 trials, the participant viewed an alignment screen to confirm they could still see the full field of view. The whole experiment took approximately 2 hours (including an initial practice session of 20 trials). In addition to varying the vignetting of the probe, we included trials that simulated either monocular or binocular AR systems (Fig. 7). In monocular trials, the standard and probe icon menus were only visible to one eye, with the same backgrounds as the binocular trials. On half of the monocular trials, the standard/probe were shown to the left eye, and on half they were shown to the right eye. In binocular trials, the standard and probe were visible to both eyes.
For monocular and binocular conditions, the side of the visual field that was vignetted in the probe stimulus was also randomized, with half of trials vignetting the nasal field(s), and the other half vignetting the temporal field(s). All stimulus configurations (vignetting strength (5 levels), monocular/binocular, nasal/temporal field) were randomly interleaved. There were a total of 192 trials with 6 repetitions per configuration. We also included a set of binocular trials in which only one eye was vignetted; these results were similar to the main binocular condition, so they are omitted here for brevity.

Data analysis
Responses were recoded to account for the randomization of the stimulus interval so that the analysis scale measured the judgement that the non-uniform probe stimulus had worse uniformity than the standard stimulus. The proportion of responses that the uniform standard was "any worse" (which includes both strong and slight criteria) and "strongly worse" (which includes only strong criteria), were calculated for each participant.
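The recoding step can be sketched as follows. The numeric encoding of the Likert responses (−2 to +2, negative meaning the first interval was judged worse) is our assumption for illustration, not the study's actual data format:

```python
def probe_worse_level(response, probe_was_first):
    """Recode a Likert response into how strongly the probe was judged worse.
    `response` is in {-2, -1, 0, +1, +2}: negative means the FIRST interval
    looked worse (strongly/slightly), 0 means 'same', positive means the
    SECOND interval looked worse. Returns 0 (probe not judged worse),
    1 (slightly worse), or 2 (strongly worse)."""
    probe_score = -response if probe_was_first else response
    return max(probe_score, 0)

def any_worse(levels):
    """Proportion of trials where the probe was judged any worse (slight or strong)."""
    return sum(1 for v in levels if v >= 1) / len(levels)

def strongly_worse(levels):
    """Proportion of trials where the probe was judged strongly worse."""
    return sum(1 for v in levels if v == 2) / len(levels)
```

The "any worse" proportion is by construction at least as large as the "strongly worse" proportion, which is visible in the two panels of Fig. 9.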

Detection of vignetting
First we summarize the proportion of trials in which participants detected vignetting (i.e., responded that the vignetted stimulus was worse), separately for the monocular and binocular trials. Figure 9 shows the results of this analysis, averaged across the participants.

Fig. 9. Average proportion of responses that the vignetted probe interval was any worse (left plot) and strongly worse in uniformity (right plot) for monocular (red circles) and binocular (blue squares) AR displays as a function of probe non-uniformity. Hollow markers are the un-vignetted uniform probes and dotted lines indicate floor and ceiling performance. Solid curves are cumulative Gaussian fits (Eq. (2)) and error bars are 95% confidence intervals of the means. The results of pairwise t-tests (degrees of freedom = 29) comparing monocular to binocular responses at each non-uniformity level (from lowest to highest) were: t = 1.30, p = 0.20; t = 2.75, p = 0.01; t = 3.94, p < 0.001; t = 4.22, p < 0.001; t = 5.03, p < 0.001 for the any worse criterion, and t = −0.57, p = 0.57; t = 0.41, p = 0.69; t = 3.40, p = 0.002; t = 5.56, p < 0.001; t = 7.87, p < 0.001 for the strongly worse criterion.
The "any worse" responses (left panel) are unsurprisingly more frequent on average than the "strongly worse" responses (right panel), since this is a weaker criterion. For example, on the baseline trials in which both the standard and the probe were 100% uniform (open markers), participants selected one or the other as being worse on almost 40% of trials, likely because even when both stimuli were uniform, participants were unsure if they were exactly identical. However, on these trials they rarely ever selected one stimulus as being strongly worse.
In both the monocular and binocular conditions, the proportion of trials that the probe was judged worse increased monotonically from this baseline value, but the steepness of the increase varied substantially. The binocular AR condition had substantially lower perceived vignetting than the monocular condition, especially for higher values of non-uniformity. The effect is most pronounced for the strongly worse responses and is likely due to the benefit of perceptual summation with binocular viewing of the virtual content. Because of the mirrored vignetting in the binocular condition, vignetted content in one eye was still visible to the other. To test whether the lower binocular sensitivity was statistically significant, we performed paired t-tests between the response proportions in the monocular and binocular conditions at each level of probe non-uniformity. We used a significance threshold of 0.01 following Bonferroni correction for multiple comparisons. For both the "any worse" and "strongly worse" criteria, users were significantly less sensitive to probe non-uniformities in the binocular condition for probe levels greater than 0.15 (see figure caption for details).
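The statistical comparison described above can be sketched with a paired t-test and the same Bonferroni logic (0.05 / 5 comparisons = 0.01). The data below are synthetic placeholders, not the study's measurements:

```python
import numpy as np
from scipy import stats

def compare_conditions(mono, bino, alpha=0.05, n_tests=5):
    """Paired t-test between per-participant response proportions in the
    monocular and binocular conditions at one non-uniformity level, with a
    Bonferroni-corrected significance threshold (alpha / n_tests).
    `mono` and `bino` are arrays of shape (n_participants,)."""
    t, p = stats.ttest_rel(mono, bino)
    return t, p, p < alpha / n_tests

# Illustrative synthetic data for 30 participants at one probe level:
rng = np.random.default_rng(1)
mono = np.clip(0.8 + 0.05 * rng.standard_normal(30), 0, 1)
bino = np.clip(0.5 + 0.05 * rng.standard_normal(30), 0, 1)
t, p, significant = compare_conditions(mono, bino)
```

A positive t with p below the corrected threshold corresponds to the pattern reported above: lower binocular than monocular response proportions.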
Overall, these results provide a quantitative mapping from an optical measure (luminance non-uniformity) to a perceptual measure (proportion of the time an average user is predicted to report non-uniformity). The results suggest that this mapping differs substantially for viewing offsets (e.g., IPD mismatches) in monocular and binocular AR systems. In the next section, we formalize how to use these data to specify a perceptual eyebox.

Characterizing the perceptual eyebox
The average response proportions in each condition in Fig. 9 were fit with a Gaussian cumulative distribution function:

P = Φ((f − µ)/σ). (2)

Here, P denotes the proportion of time that the non-uniform probe was judged worse, f is the probe non-uniformity, Φ is the standard normal cumulative distribution function, and µ and σ are free parameters that determine the mean and standard deviation of the fitted function. The fits to the experimental data are illustrated in Fig. 9 and the parameters are listed in Table 2. Solving Eq. (2) for f, we can then define the maximum non-uniformity that is allowable for a given judgement threshold P:

f = µ + σΦ⁻¹(P). (3)

This equation can be solved using the appropriate µ and σ for a given design (i.e., monocular/binocular). The value of f determines the bounding isosurface used to define the perceptual eyebox for a given optical design. The separate fits for the "strongly worse" criterion provide an option for applying a stricter level of acceptable salience.

Using the optical data described in Sec. 3.1, Fig. 10 visualizes the perceptual eyeboxes for two selected thresholds using the "Any worse" responses. This visualization allows for comparison of the shape and size of the eyebox volumes for monocular and binocular systems using the example eyepiece and display. In each example, the binocular system eyebox has a larger volume than the monocular one, but the overall shape is similar. Note that the edge of the evaluation volume that is closest to the optic is at a distance of 6.9 mm; closer viewing distances would likely lead to discomfort from eyelashes touching the optic. To quantify this difference further, we calculated the ratio of the binocular and monocular eyebox volumes for all possible non-uniformity thresholds. For this optical design, we found that the binocular perceptual eyebox volume was about a factor of 2 larger than the monocular eyebox for a wide range of thresholds. Importantly, this result assumes that the primary cause of vignetting is an IPD mismatch that causes mirrored binocular vignetting, which is not always the case. However, this ratio provides a useful guide for the expected gains in effective eyebox volume associated with making an AR system binocular.

Fig. 10. Visuals of perceptual eyebox volumes for different perceptual thresholds and system designs. First, a perceptual threshold (P) is chosen. This threshold produces the upper bound for the permissible optical non-uniformity. With the example optical quality data (see Sec. 3.1), this threshold defines an isocontour at any given eye relief. Swept along all possible eye reliefs, it produces a volume, here shown for the two chosen thresholds (0.45 and 0.75, left/right) and monocular and binocular results (top/bottom) using the "Any worse" criterion. Negative z values indicate positions closer to the optic.
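The fit-and-invert procedure behind Eqs. (2) and (3) can be sketched in a few lines. The response proportions below are hypothetical placeholders, not the data from Fig. 9 or the parameters in Table 2:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def psychometric(f, mu, sigma):
    # Eq. (2): proportion of "worse" judgements at probe non-uniformity f
    return norm.cdf(f, loc=mu, scale=sigma)

# Hypothetical average response proportions (placeholders for Fig. 9 data)
probe = np.array([0.0, 0.33, 0.60, 0.82, 1.0])
prop = np.array([0.05, 0.30, 0.70, 0.90, 0.98])

# Fit mu and sigma, constraining sigma to stay positive
(mu, sigma), _ = curve_fit(psychometric, probe, prop,
                           p0=[0.5, 0.2], bounds=([0.0, 1e-3], [1.0, 1.0]))

def max_nonuniformity(P, mu, sigma):
    # Eq. (3): largest f whose predicted salience stays within criterion P
    return mu + sigma * norm.ppf(P)
```

A stricter criterion (smaller P) yields a smaller permissible f and therefore a tighter bounding isosurface for the eyebox.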

Individual differences in sensitivity
The extent to which the perceptual eyebox varies within a typical population of users is also a useful piece of information. Figure 11 shows the individual participant proportions from the current study. While the general trend is similar across individuals, we observed that some participants tended to have consistently higher or lower sensitivities than others. One conservative approach to preventing visible artifacts is to consider only the most sensitive potential users when selecting perceptual thresholds. For example, Table 3 indicates the parameters that would be used to determine the perceptual eyebox if only the five most sensitive participants are considered. Using Eq. (3), we can compute the maximum permissible non-uniformities for a given criterion for the average participant (from Table 2) and for the five most sensitive participants in our experiment (from Table 3), as shown in Table 4.

Table 4. Maximum non-uniformities (f) from Eq. (3) for responding that the non-uniform stimulus was worse, for all participants (see Table 2) and for the top 5 participants (see Table 3).

It would also be useful to be able to predict whether a given individual is likely to be sensitive to vignetting effects. Prior work has established that there are individual differences in binocular combination, related to which eye is most dominant. Eye dominance appears to correlate with some perceptual differences between individuals, but not others [30,31]. We wondered whether the identity or strength of the dominant eye might account for some of the individual differences in our data. For example, participants with stronger eye dominance might be more likely to notice vignetting in the binocular conditions if the dominant eye's percept overruled the binocularly fused image. Thus, during the perceptual study we also assessed the eye dominance of each participant and conducted exploratory analyses to ask whether eye dominance was predictive of individual differences.
In the current data, we did not find evidence that properties of eye dominance could account for a substantial portion of individual differences. Future work can examine other perceptual or cognitive factors that might explain the variability in sensitivity, and perhaps whether this variability translates to individual differences during actual use cases.
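The conservative subgroup selection described above can be illustrated with a short sketch. The per-participant (µ, σ) pairs below are hypothetical, not the fits reported in Table 3:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical per-participant psychometric fits: participant -> (mu, sigma)
fits = {"s1": (0.45, 0.15), "s2": (0.55, 0.20), "s3": (0.35, 0.10),
        "s4": (0.60, 0.25), "s5": (0.50, 0.18), "s6": (0.40, 0.12),
        "s7": (0.65, 0.22)}

P = 0.75  # chosen perceptual criterion

# Eq. (3) per participant: a smaller f means higher sensitivity to vignetting
f_by_participant = {k: mu + sigma * norm.ppf(P) for k, (mu, sigma) in fits.items()}

# Keep only the five most sensitive participants (lowest permissible f)
most_sensitive = sorted(f_by_participant, key=f_by_participant.get)[:5]
f_conservative = np.mean([f_by_participant[k] for k in most_sensitive])
f_average = np.mean(list(f_by_participant.values()))
```

Basing the threshold on the sensitive subgroup lowers the permissible non-uniformity relative to the population average, shrinking the eyebox but protecting a larger fraction of users from visible artifacts.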

User interpupillary distance
We also hypothesized that there could be differences in the sensitivity to vignetting depending on position within the visual field. For example, many people have their nasal visual fields slightly occluded by their nose bridge, and thus might be less sensitive to nasal vignetting. If so, this would have implications for the range of acceptable inter-display distances for binocular systems, since vignetting patterns differ in their visual field locations depending on whether the display separation is larger or smaller than the user's IPD.
We conducted a post hoc analysis to assess whether there was a difference between performance with nasal and temporal vignetting (Fig. 12). For the monocular condition, we did not observe any substantial differences (although users were slightly and consistently more sensitive to nasal vignetting). However, for the binocular condition, sensitivity to non-uniformity consistently tended to be higher when vignetting occurred in the temporal visual field. Consistent with this observation, paired t-tests revealed no statistically significant difference between nasal and temporal vignetting in the monocular condition. In the binocular condition, however, sensitivity was significantly higher for temporal vignetting at the 0.33 probe non-uniformity with the "any worse" criterion and at the 0.60 and 0.82 probe non-uniformities with the "strongly worse" criterion (see figure caption for details).
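Each such comparison reduces to a paired t-test on per-participant sensitivity scores. The proportions below are hypothetical placeholders, not the data plotted in Fig. 12:

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical per-participant proportions of "worse" judgements at one probe
# non-uniformity level, split by which visual field contained the vignetting
nasal = np.array([0.55, 0.60, 0.50, 0.70, 0.65, 0.58, 0.62, 0.54])
temporal = np.array([0.70, 0.75, 0.68, 0.82, 0.78, 0.72, 0.80, 0.69])

# Paired test: each participant contributes one nasal and one temporal score
t_stat, p_value = ttest_rel(temporal, nasal)
```

A paired test is appropriate here because the nasal and temporal scores come from the same participants, so between-subject differences in overall sensitivity cancel out.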
These results suggest that the perceptual eyebox could take on an asymmetric shape, even if optical quality degrades symmetrically. Importantly, different sensitivities to nasal and temporal vignetting could have implications for the selection of sensible inter-display distances for near-eye display systems. Depending on the design specifics and eye relief, nasal and temporal vignetting might occur when the user's IPD is smaller or larger than the inter-display distance. The current results suggest that situations that create temporal vignetting in binocular systems should be avoided.
To examine this question further, we applied the asymmetric nasal/temporal sensitivities from our user study to the example optic. First, we calculated the non-uniformity in the nasal and temporal fields associated with shifting the user's IPD to be less than or greater than the inter-display distance. The results are shown in Fig. 13 (left panels) as heat maps, plotted for a horizontal slice through the viewing volume centered at y = 0 (the nominal eye elevation). We applied the nasal and temporal sensitivity curves from Fig. 12 to the appropriate fields, which are plotted as contours on each panel for a perceptual threshold of P = 0.75. In the right panel, we visualize the area where both the nasal and temporal non-uniformities fall below this threshold. Note that the abscissa on these plots denotes the magnitude of symmetric IPD mismatch, not the horizontal position of either eye. For this optic, the result shows that there is a larger area of permissible non-uniformities when the IPD is larger than the inter-display distance and eye relief is closer than nominal. This asymmetry can be taken into account when selecting a single inter-display distance or a range of inter-display distances for system design based on the expected IPDs of the users.
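The intersection analysis above can be sketched with a toy model. The non-uniformity function and its coefficients, the sign convention (positive mismatch assumed to vignette the nasal field), and both thresholds are illustrative assumptions, not values from the optical simulation:

```python
import numpy as np

# Toy model (assumption): non-uniformity grows linearly with |IPD mismatch|
# and with distance from the nominal eye relief; real values would come from
# the optical simulation described in Sec. 3.1
def nonuniformity(mismatch_mm, relief_mm, slope=0.08, relief_gain=0.05):
    return np.clip(slope * np.abs(mismatch_mm) + relief_gain * np.abs(relief_mm),
                   0.0, 1.0)

mismatch = np.linspace(-6, 6, 121)   # user IPD minus inter-display distance (mm)
relief = np.linspace(-3, 3, 61)      # offset from nominal eye relief (mm)
M, R = np.meshgrid(mismatch, relief)

# Assumed mapping for illustration: positive mismatch (IPD > display
# separation) vignettes the nasal field; negative mismatch, the temporal field
nasal = nonuniformity(np.where(M > 0, M, 0.0), R)
temporal = nonuniformity(np.where(M < 0, M, 0.0), R)

# Hypothetical thresholds from Eq. (3): users are more sensitive to temporal
# vignetting, so its permissible non-uniformity is lower
f_nasal, f_temporal = 0.60, 0.40
acceptable = (nasal <= f_nasal) & (temporal <= f_temporal)

# The acceptable slice is asymmetric about zero mismatch
area_pos = int(acceptable[:, M[0] > 0].sum())
area_neg = int(acceptable[:, M[0] < 0].sum())
```

Because the temporal threshold is stricter, the acceptable region extends further on the side of the mismatch axis that produces only nasal vignetting, mirroring the asymmetry seen in Fig. 13.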

Effect of eye rotation
In the previous analyses, we calculated vignetting as the non-uniformity of all targets across the visual field, with a visual axis orthogonal to the display. Alternatively, it is possible to consider how non-uniformity varies as a user sequentially fixates each location. We can refer to these as field non-uniformity and fixational non-uniformity, respectively. Because the center of rotation of the eye is located behind the pupil, the pupil will move within the viewing volume as the eye rotates to fixate different regions of the display. We can reason that the perceptual eyebox for fixational non-uniformity will thus be smaller.
The data from our current perceptual study only addressed field non-uniformity; however, we can use these data to examine the potential effect of fixational non-uniformity on the perceptual experience. To examine this question quantitatively, we recalculated optical non-uniformity (Eq. (1)) assuming the eye is rotated to fixate each field point (i.e., the eye rotates 20 degrees left to fixate the leftmost point, 20 degrees up to fixate the top point). We moved the eye to the same sample locations within the evaluation volume (Table 1) and assumed that the center of rotation was 10.5 mm behind the pupil. The results are plotted in Fig. 14 for a horizontal slice through the evaluation volume, with field non-uniformity on the left and fixational non-uniformity on the right. As expected, the fixational non-uniformity is higher within the evaluation volume, and the perceptual eyebox for fixational non-uniformity is thus substantially smaller. This makes sense: for locations near the edge of the acceptable viewing volume, an eye rotation is likely to move the pupil into a location with stronger vignetting.

Fig. 13. Asymmetric effects of IPD mismatches. Because users were more sensitive to vignetting in the temporal field, we expect that mismatches between the display separation and the IPD will be more or less detectable depending on how they affect the visual field. For the example optic, the left panels show the optical non-uniformity of the nasal and temporal fields as a function of the IPD mismatch for a horizontal slice through the viewing volume at y = 0. Negative values on the abscissa indicate that the user's IPD is smaller than the inter-display separation, and vice versa. White lines illustrate the binocular perceptual threshold of 0.75 for the "any worse" judgement. Note that although non-uniformity can be low for a given field far from the nominal viewing position, the total number of rays also decreases (not shown here). In the right panel, we plot the intersection of these two regions, mapping out a horizontal slice through the perceptual eyebox. For the example optic, the predicted eyebox has more area for IPD mismatches greater than 0, meaning it is more tolerant to larger IPDs than smaller IPDs for the given inter-display distance.

Interestingly, as the eye approaches the optic, fixational non-uniformities decrease due to additional light entering the pupil when field points are fixated. Both field non-uniformity and fixational non-uniformity likely play important roles in the perception of image quality in AR, and understanding their relative impact would be an important direction for future work.
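The pupil displacement that drives fixational non-uniformity follows directly from the rotation geometry. A minimal sketch, assuming the 10.5 mm center-of-rotation distance and a gaze initially directed toward the optic along −z (matching the sign convention used for the eyebox plots):

```python
import numpy as np

def rotated_pupil(pupil_xyz, az_deg, el_deg, r_mm=10.5):
    """Pupil position after the eye rotates to fixate an eccentric point.

    Assumes the center of rotation sits r_mm behind the pupil along +z,
    with the optic toward -z.
    """
    az, el = np.radians(az_deg), np.radians(el_deg)
    center = np.asarray(pupil_xyz, dtype=float) + np.array([0.0, 0.0, r_mm])
    # Gaze direction after rotation; (0, 0) gives the straight-ahead -z gaze
    gaze = np.array([np.sin(az) * np.cos(el),
                     np.sin(el),
                     -np.cos(az) * np.cos(el)])
    # The pupil orbits the center of rotation at radius r_mm
    return center + r_mm * gaze
```

Under these assumptions, a 20 degree rotation displaces the pupil laterally by r·sin(20°) ≈ 3.6 mm, a substantial fraction of typical eyebox widths, which is why fixational non-uniformity shrinks the perceptual eyebox.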

Discussion
Near-eye display systems, and particularly AR hardware with a glasses-like form factor, still face considerable design and engineering challenges. To rest comfortably on the face, these systems must be lightweight and have good weight distribution over the nose and ears. The display engine is one of the greatest sources of power consumption, so given the importance of limiting battery weight and size, it is essential that as much light as possible from the display reaches the eye(s). This issue creates pressure to minimize the eyebox to a volume just large enough to envelop the pupil for a typical sample of users.
We introduced the framework of the perceptual eyebox to provide a principled method for vetting and optimizing eyebox volumes. It is important to consider how our results generalize to real AR systems. When designing the AR testbed, we aimed to make reasonable decisions that balance experimental control with external validity. However, any system designed for this type of evaluation will have limitations. For example, the testbed's dynamic range is more limited than a real AR experience, particularly outdoors. Based on the psychophysical literature on pattern detection/discrimination, relative contrast is likely the most pertinent variable for vignetting perception. Thus, we focused on meeting guidelines for contrast in head-up displays [19,20]. In addition, the testbed did not present correct focus cues [32][33][34]. However, all content was 2 m or further away (including the AR icons), so detectable defocus in real environments of similar layout is negligible. This choice was made intentionally, but it leaves the issue of focus cues unaddressed. Lastly, the pattern of luminance fall-off likely contributes to vignetting perception. We used a generic pattern that shares similarities with a range of systems, and we found that this pattern was similar to that of a specific system (an Epson Moverio BT-300, see Fig. 1). Architecture- and use-case-specific user studies could be run to examine whether and how these factors impact the perceptual eyebox.
We chose to examine the issue of monocular versus binocular systems as an important example case in which perceptual issues that affect eyebox size extend beyond optics. Monocular systems have somewhat relaxed design requirements, with the main one being that the eyebox is positioned to coincide with the pupil. Design-wise, binocular systems present more than twice the challenge. In addition to achieving good eyebox positioning for both eyes, the system hardware must also hold the displays and optics in precise alignment with respect to each other to avoid creating erroneous binocular disparities, which have been implicated in creating discomfort as well as depth distortions [35,36]. Our results support the idea that eyebox requirements for display visibility can be relaxed for binocular systems. However, it is worth noting that complete vignetting that renders content invisible to either eye will remove stereoscopic depth, even if visibility is preserved in the other eye.
Once a choice for binocular presentation is made, a major challenge is that rigid binocular frames can make it difficult to accommodate the wide range of IPDs across different users. The choice of inter-display distances needs to account for users' IPD variability [21]. Our data suggest that the impact of mismatches between the user's IPD and the inter-display distance depends on whether the mismatch produces temporal or nasal field vignetting. These consequences can be taken into account for any display design to more accurately predict the range of user IPDs and eye reliefs that will be acceptable.
In the current perceptual study, we used a static vignetting profile. This offers a clearly defined situation, but it is worth noting that eye movements can have a dynamic impact on the artifacts observed. The pupil moves during eye movements, which can cause it to shift into or out of the eyebox [15]. In the worst-case scenario, a small eyebox can allow virtual content to be visible in the periphery but disappear when the user attempts to fixate on it, resulting in high fixational non-uniformity. We examined this effect in an exploratory analysis in Sec. 6.5 and confirmed that fixational considerations can modify the relevant shape and size of the eyebox volume. Further perceptual experiments that employ eyetracking to simulate eye rotation effects can offer insight into how these dynamic optical changes are tolerated by users.
Our perceptual study focused on artifacts caused by luminance vignetting. We made this choice because light first needs to reach the eye before we can examine other aspects of its quality. By comparison, other artifacts, including rainbow effects and dispersion, tend to be more architecture-specific and lend themselves to studies geared towards particular architectures (e.g., diffractive waveguides). Vignetting is one criterion that applies to all optical architectures in AR/VR. Future work should explore how the perceptual eyebox varies when additional (possibly multifactorial) degradations in specific architectures are considered, as well as alternative perceptual outcomes such as task-based performance. The combination of precise optical simulations and controlled perceptual studies can help to efficiently advance optical design for the next generation of near-eye displays.