ChromaCorrect: prescription correction in virtual reality headsets through perceptual guidance

A large portion of today’s world population suffers from vision impairments and wears prescription eyeglasses. However, prescription glasses add bulk and discomfort when used with virtual reality (VR) headsets, negatively impacting the viewer’s visual experience. In this work, we remove the need for prescription eyeglasses when viewing screens by shifting the optical complexity into software. We propose a prescription-aware rendering approach that provides sharper and more immersive imagery for screens, including VR headsets. To this end, we develop a differentiable display and visual perception model encapsulating display-specific parameters, the color perception and visual acuity of the human visual system, and user-specific refractive errors. Using this differentiable visual perception model, we optimize the rendered imagery with gradient-descent solvers. This way, we provide sharper images for people with vision impairments without requiring prescription glasses. We evaluate our approach and show significant quality and contrast improvements for users with vision impairments.


INTRODUCTION
Virtual Reality (VR) headsets are becoming increasingly popular amongst consumers. This has encouraged researchers to conceptualize and build technologies that would enable fully immersive remote experiences [34]. However, most recent developments overlook the prevalence of refractive vision problems such as myopia, hyperopia, and astigmatism among potential VR users, especially users older than 40, at least 23.9%, 8.4%, and 33% of whom are affected by these conditions, respectively [33]. Moreover, while current near-eye display research focuses on miniaturizing headsets to an eyeglasses form factor [21,29], wearing prescription glasses under a VR headset causes uncomfortable viewing experiences that break the feeling of immersion.
Hardware-driven approaches to prescription correction [9,23,48] may lead to VR headsets that are bulkier and more expensive, while requiring users to upgrade components with new devices. On the other hand, algorithmic approaches tackle the prescription issue without specialized components and can be improved through software updates [31].
Our work offers a new perceptually guided algorithmic approach to prescription correction that eliminates the need for corrective lenses. To this end, we first study the low-level workings of the Human Visual System (HVS), i.e., how different types of cone cells respond to various wavelengths of light. We then model the display's specific light spectrum (e.g., subpixels emitting various wavelengths) and the associated response of cone cells on the retina. Hence, we build an end-to-end differentiable perception model that simulates how a user, whose eye is described by a Point-Spread Function (PSF) modeled with Zernike polynomials [27], perceives images on a specific display. Finally, our end-to-end perception framework enables optimizing the display rendering to produce an in-focus image for a user with vision impairments. Specifically, our work makes the following contributions:
• Perceptually Guided Prescription Correction. We incorporate the display-specific color perception and the PSF of a user into a new differentiable model, ensuring that the optimized image's contrast and color characteristics are distinctly enhanced in visual perception.
• Learned Prescription Correction. We train a Convolutional Neural Network (CNN) to estimate optimal images for prescription correction, enabling prescription correction at interactive rates.
• Evaluation on Actual Displays. We analyze our findings beyond simulations by evaluating our approach on VR headsets and conventional displays and demonstrating real-life use cases.

RELATED WORK
Researchers have previously attempted to compensate for refractive vision problems to provide a glasses-free experience with displays. We summarize the most relevant work in Tbl. 1.
Programmable Prescription Lenses. A common technique is to use focus-tunable lenses adjusted to the user's prescription, especially in displays such as VR headsets where users view a display through magnifying lenses [9,26,36,41]. Alternatively, phase-only spatial light modulators can form a programmable prescription correction lens [19]. Beyond requiring customized hardware, these techniques also require eye tracking and scene depth data to operate, further increasing hardware demands.
Computational Displays. Altering the display hardware and image acquisition technologies can help with prescription correction [25]. Huang et al. [16] address the extreme contrast loss and ringing artifacts of algorithmic correction techniques by using a stack of semi-transparent, light-emitting layers for LCDs. Wu and Kim [49] embed free-form image combiners inside prescription lenses to create customizable Augmented Reality (AR) displays. Pamplona et al. [37] use 4D light field displays to move the solution to a higher-dimensional (light field) space, where the inverse problem is well-posed. To overcome the limited resolution of Pamplona et al.'s work [37], Huang et al. [17] propose a 4D prefiltering algorithm that provides higher contrast and resolution. The approach described in [37] has a significant drawback, namely that the PSF of an eye with refractive errors is typically a low-pass filter and, as such, irrevocably cancels higher frequencies of the original image. Moreover, holographic vision correction [10,20] is superior to conventional approaches, including light field displays. Interested readers may consult the survey by Aydınoglu et al. [7] for more on holographic displays.

Algorithmic Prescription Correction.
Refractive vision impairments of the eye are commonly modeled by constructing a PSF that represents how the eye, as an optical system, maps a point on the object to the retina. The spatially varying PSF is convolved with the image of the object to produce the image formed on the retina. Performing the inverse operation, i.e., deconvolving the image with the retinal PSF, can produce an image that forms clearly on the retina when observed. Alonso et al. [3] verify the feasibility of such an image correction technique by constructing a simple artificial eye and comparing the images it forms when viewing a standard and a corrected image. They also propose an ad-hoc solution to mitigate contrast loss and "ripples", or ringing artifacts [4]. Montalto et al. [31] use constrained total variation to decrease ringing artifacts in the corrected image while sharpening its edges, thereby producing an image with high contrast along sharp edges. Ye et al. [53] focus on finding a ringing-free image with higher contrast in locations important to the HVS, while tolerating more blur elsewhere. Tanaka et al. [43] use a CNN-based pipeline for prescription correction along with Zernike-based visual aberration modeling. To account for spatially variant aberrations, Li et al. [28] feed an aberrated image together with a map of PSFs for multiple subregions into a deep neural network and train it for image correction on a variety of lenses. Similar image correction techniques have been applied to VR headsets. Itoh et al. [18] correct the defocus aberration in optical see-through headsets by overlaying a compensated image in the user's view. Xu et al. [51] use gradient-based priors to achieve real-time visual aberration correction for VR HMDs. Oshima et al. [35] describe real-time correction of the defocus in optical see-through HMDs caused by focal rivalry, i.e., the simultaneous viewing of real and virtual content. Perceptual considerations are becoming commonplace in display and graphics research (see our Supplementary Material for perceptual considerations in graphics systems). The surveyed works do not incorporate a complete model of the HVS, leading to either poor image quality or demanding hardware. We believe our work represents the first attempt to enhance algorithmic solutions in the literature by bridging the gap between perceptual modeling and prescription correction.

PERCEPTUALLY GUIDED PRESCRIPTION CORRECTION
We introduce a differentiable framework for modeling the display and human visual perception, encapsulating display-specific parameters, the color perception and visual acuity of the human visual system, and user-specific refractive errors. Our framework allows optimizing prescription-compensated imagery for standard displays using gradient-based optimization with novel display-specific, perceptually guided loss functions (Section 3.1). We rely on Zernike polynomials (Section 3.2) to describe user-specific retinal point spread functions [10] within the forward model, representing optical aberrations in the HVS (Section 3.3). An overview of our entire display-visual perception model and the optimization process is depicted in Fig. 2.

Modeling Display-specific Visual Perception
We characterize our target display and devise a computational model that transforms the imagery shown on the target display into imagery as perceived by the HVS.
Characterizing target display. A given display has three types of emission spectra, $\lambda_R$, $\lambda_G$, $\lambda_B$, for its red, green, and blue subpixels, respectively. As these emission spectra vary between display systems, we calibrate them with a spectrometer by measuring the spectral bands of the target display at various pixel levels. More details on the spectra measurement and display calibration process are discussed in the Supplementary Material. We then fit a proxy function to determine the display color primaries from the spectral measurements. While a simple Gaussian mixture model (a weighted sum of Gaussians) could serve as such a proxy color primary function, we learn this function using a multi-layer perceptron network that acts as a general function approximator. An implementation of this proxy function fitting is available in [2] (see odak.learn.tools.multilayer perceptron()). Once we fit a proxy function for the color primaries, we use it to investigate the color perception responses of the HVS.
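The paragraph above notes that a weighted sum of Gaussians could serve as a simpler proxy for each color primary than the MLP used in our pipeline. The following is a minimal NumPy sketch of that alternative (NumPy instead of PyTorch is an assumption, as are the fixed centers and width): it fits the weights of Gaussians at fixed positions to normalized spectrometer samples by ordinary least squares and then evaluates the proxy on a 1 nm grid. The spectrum is synthetic, not a measurement, and the function names are illustrative only.

```python
import numpy as np

def gaussian_basis(wavelengths_nm, centers_nm, sigma_nm):
    """Design matrix with one Gaussian column per center, one row per wavelength."""
    diff = wavelengths_nm[:, None] - centers_nm[None, :]
    return np.exp(-0.5 * (diff / sigma_nm) ** 2)

def fit_gaussian_mixture(wavelengths_nm, intensities, centers_nm, sigma_nm=15.0):
    """Ordinary least-squares weights for fixed Gaussians (weights may be negative)."""
    basis = gaussian_basis(wavelengths_nm, centers_nm, sigma_nm)
    weights, *_ = np.linalg.lstsq(basis, intensities, rcond=None)
    return weights

def evaluate_proxy(wavelengths_nm, weights, centers_nm, sigma_nm=15.0):
    """Continuous proxy of a color primary's emission spectrum."""
    return gaussian_basis(wavelengths_nm, centers_nm, sigma_nm) @ weights

# SYNTHETIC "spectrometer" samples for a red subpixel (not measured data).
centers = np.arange(400.0, 701.0, 10.0)
measured_nm = np.arange(400.0, 701.0, 5.0)
measured = gaussian_basis(measured_nm, np.array([610.0, 630.0]), 15.0) @ np.array([1.0, 0.4])
measured /= measured.max()                      # normalize intensities to [0, 1]

weights = fit_gaussian_mixture(measured_nm, measured, centers)
dense_nm = np.arange(400.0, 701.0, 1.0)         # 1 nm grid, 400-700 nm
proxy = evaluate_proxy(dense_nm, weights, centers)
```

Fixing the centers and width keeps the fit linear; fitting them as well would need a nonlinear solver, which is one reason a general function approximator such as an MLP can be more convenient.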
Converting color primaries to perceived colors. Human retinal cells can be broadly classified into rods and cones. Cone cells, which are primarily responsible for color perception in the HVS, come in three subtypes: Short (S), Medium (M), and Long (L). Each differs in its sensitivity to different wavelengths of light; please refer to our Supplementary Material for a detailed discussion. The L, M, and S cones reduce the spectrum of incoming light to trichromatic values by integrating it over their response functions [50]. Note that this differs from conventional camera or display modeling, where red, green, and blue wavelengths are treated as independently measured channels. The following conversion maps an input color image shown on a target display to the corresponding cone responses:

$$\begin{bmatrix} I_L \\ I_M \\ I_S \end{bmatrix} = \underbrace{\begin{bmatrix} L_R & L_G & L_B \\ M_R & M_G & M_B \\ S_R & S_G & S_B \end{bmatrix}}_{A} \begin{bmatrix} I_R \\ I_G \\ I_B \end{bmatrix} \qquad (1)$$

where $I_R$, $I_G$, $I_B$ represent the red, green, and blue pixel values of an input image, and $I_L$, $I_M$, $I_S$ represent the L, M, and S cone activation values for each pixel of the displayed image. As an example of the entries of $A$, $L_R$ is computed as

$$L_R = \int \lambda_L(\lambda)\, \lambda_R(\lambda)\, d\lambda \qquad (2)$$

where $\lambda_L$ represents the L cone sensitivity function, $\lambda_R$ represents the red subpixel emission spectrum of the target display, and $L_R$ represents the L cone response to the displayed red subpixel. The L cone responses to the green and blue subpixel emissions ($L_G$, $L_B$) are computed in the same way, and likewise the M and S cone responses for all three subpixel emissions. After computing the cone responses, we apply the color opponency conversion proposed by Schmidt et al. [40] to obtain a complete perception model,

$$I_{(M+S)-L} = (I_M + I_S) - I_L, \qquad I_{(L+S)-M} = (I_L + I_S) - I_M, \qquad I_{(L,M,S)} = I_L + I_M + I_S \qquad (3)$$

where $I_{(M+S)-L}$, $I_{(L+S)-M}$, and $I_{(L,M,S)}$ represent the three channels of the image sensed in the color-opponency space.
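The conversion from display primaries to cone responses and opponent channels can be sketched numerically. This is a minimal NumPy illustration under stated assumptions: the cone sensitivities and display primaries are hypothetical Gaussian stand-ins rather than measured cone fundamentals or calibrated spectra, the integral is approximated by a Riemann sum on a 1 nm grid, and the opponent channels are taken directly from their channel definitions without any normalization.

```python
import numpy as np

WL = np.arange(400.0, 701.0, 1.0)  # wavelength samples in nm (1 nm step)

def gaussian(center_nm, width_nm):
    return np.exp(-0.5 * ((WL - center_nm) / width_nm) ** 2)

# HYPOTHETICAL stand-ins; real use would load measured cone fundamentals
# and the fitted emission spectra of the target display.
CONES = {"L": gaussian(565, 45), "M": gaussian(540, 40), "S": gaussian(445, 25)}
PRIMARIES = {"R": gaussian(615, 12), "G": gaussian(535, 15), "B": gaussian(460, 10)}

def cone_matrix(cones, primaries, d_lambda=1.0):
    """3x3 matrix A; entry [k, c] approximates the integral of cone_k * primary_c."""
    A = np.empty((3, 3))
    for i, k in enumerate("LMS"):
        for j, c in enumerate("RGB"):
            A[i, j] = np.sum(cones[k] * primaries[c]) * d_lambda
    return A

def rgb_to_lms(image_rgb, A):
    """Per-pixel matrix product: [H, W, 3] RGB values -> [H, W, 3] LMS activations."""
    return np.einsum("kc,hwc->hwk", A, image_rgb)

def lms_to_opponent(image_lms):
    """Opponent channels (M+S)-L, (L+S)-M and L+M+S, unnormalized."""
    L, M, S = image_lms[..., 0], image_lms[..., 1], image_lms[..., 2]
    return np.stack([(M + S) - L, (L + S) - M, L + M + S], axis=-1)

A = cone_matrix(CONES, PRIMARIES)
white = np.ones((2, 2, 3))          # full-intensity white pixels
lms = rgb_to_lms(white, A)
opp = lms_to_opponent(lms)
```

For a white pixel, each cone activation is simply the corresponding row sum of $A$, which makes the matrix easy to sanity-check against a calibrated display.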

Computing Point Spread Functions from Color Primaries
The point spread function of an HVS with visual aberrations can be defined over several wavelengths of light (see Supplementary Material for equations). Therefore, we can sample a set of wavelengths from each color primary, calculate a PSF for each, and use a weighted sum of these PSFs to obtain a single, combined PSF for each color primary,

$$\mathrm{PSF}(x, y, c) = \sum_{i} w_{\lambda^c_i}\, \mathrm{PSF}(x, y, \lambda^c_i) \qquad (4)$$

where $c$ represents a particular color primary, $\mathrm{PSF}(x, y, c)$ is the PSF for that color primary, $\mathrm{PSF}(x, y, \lambda^c_i)$ is the PSF for a sampled wavelength in the color primary, and $w_{\lambda^c_i}$ is the weight for that sampled wavelength. This PSF kernel can be used in RGB or color opponency spaces, depending on the designer's choice. In our method, we introduce a color-opponency-based (perceptually guided) PSF formulation to improve the perceptual characteristics (contrast, quality) of the retinal image. We extend Eq. 4 to an LMS-based kernel,

$$\mathrm{PSF}_{lms}(x, y, \lambda^c_i) = A \cdot \mathrm{PSF}(x, y, \lambda^c_i) \qquad (5)$$

$$\mathrm{PSF}_{lms}(x, y, c) = \sum_{i} w_{\lambda^c_i}\, \mathrm{PSF}_{lms}(x, y, \lambda^c_i) \qquad (6)$$

where $A$ is the conversion matrix defined in Eq. 1 and $\mathrm{PSF}_{lms}(x, y, c)$ is the PSF for a particular color primary with LMS components. Similarly, we model the color primary decoding of a digital camera using measurements from the display and images captured with the camera, which allows us to evaluate our work with camera-captured images. In Eq. 5 and Eq. 6, $\mathrm{PSF}_{lms}$ represents both the HVS and the digital camera's RGB decoding. We can now compute the retinal image $r(x, y, c)$ in LMS space by convolving $\mathrm{PSF}_{lms}$ with the input image $s(x, y, c)$,

$$r(x, y, c) = s(x, y, c) \ast \mathrm{PSF}_{lms}(x, y, c) \qquad (7)$$
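The weighted per-primary PSF can be illustrated with a minimal NumPy sketch that builds a monochromatic PSF from a single Zernike defocus term (the squared magnitude of the Fourier transform of the pupil function) and then sums it over sampled wavelengths. This is not our Zernike generator. It assumes a circular pupil with defocus only, a hypothetical red-primary spectrum as the weights, and the commonly used diopter-to-Zernike conversion for defocus; it also ignores the wavelength-dependent scaling of the PSF grid. All function names are illustrative.

```python
import numpy as np

def defocus_coefficient_um(diopters, pupil_radius_mm):
    """Zernike defocus c_2^0 in micrometers for a spherical error M: c = -M r^2 / (4 sqrt(3)).
    Sign conventions vary between sources; only the magnitude matters for the PSF below."""
    return -diopters * pupil_radius_mm ** 2 / (4.0 * np.sqrt(3.0))

def defocus_psf(c20_um, wavelength_nm, n=128, pad=2):
    """Monochromatic PSF of a circular pupil carrying only Zernike defocus Z_2^0.
    Simplification: the PSF grid is not rescaled with wavelength."""
    x = np.linspace(-1.0, 1.0, n)
    xx, yy = np.meshgrid(x, x)
    rho2 = xx ** 2 + yy ** 2
    pupil = (rho2 <= 1.0).astype(float)
    wavefront_um = c20_um * np.sqrt(3.0) * (2.0 * rho2 - 1.0)
    phase = 2.0 * np.pi * wavefront_um / (wavelength_nm * 1e-3)
    field = np.zeros((n * pad, n * pad), dtype=complex)
    field[:n, :n] = pupil * np.exp(1j * phase)   # zero-padding refines PSF sampling
    psf = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
    return psf / psf.sum()

def primary_psf(c20_um, wavelengths_nm, weights):
    """Weighted sum over sampled wavelengths, with weights normalized to sum to 1."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * defocus_psf(c20_um, lam) for w, lam in zip(weights, wavelengths_nm))

wl = np.arange(400.0, 701.0, 10.0)
red_weights = np.exp(-0.5 * ((wl - 615.0) / 12.0) ** 2)  # hypothetical red spectrum
c20 = defocus_coefficient_um(-1.5, pupil_radius_mm=2.0)   # -1.5 D myopia, 4 mm pupil
psf_blurred = primary_psf(c20, wl, red_weights)
psf_sharp = primary_psf(0.0, wl, red_weights)
```

Normalizing both the monochromatic PSFs and the weights keeps the combined kernel energy-preserving, so blurring changes only how light is distributed, not how much there is.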

Optimizing Images for Prescription Correction
In the final step, we aim to optimize an image that, after passing through the eye's optical system (modeled as a convolution in Eq. 7), produces a retinal image as close as possible to the ground truth image. This is done by solving the optimization problem

$$s' = \arg\min_{s} \mathcal{L}\big(s(x, y, c) \ast \mathrm{PSF}(x, y, c),\; t(x, y, c)\big) \qquad (8)$$

where $t$ is the ground truth image, $s'$ is the input image optimized for a user's eye, and $\mathrm{PSF}$ is the kernel defined in Eq. 4. In our method, we reformulate Eq. 8 to incorporate optimization in the color opponency space,

$$s' = \arg\min_{s} \mathcal{L}\big(s(x, y, c) \ast \mathrm{PSF}_{lms}(x, y, c),\; t_{lms}(x, y, c)\big) \qquad (9)$$

where $t_{lms}$ is the ground truth image in LMS space, $s'$ is the input image optimized for a user's eye, and $\mathrm{PSF}_{lms}$ is the kernel defined in Eq. 6. To perform this optimization, we compare images using a loss function (e.g., least-squared error) that measures the error between the ground truth image and the retinal image, $\mathcal{L}(r(x, y, c), t(x, y, c))$, where $x$ and $y$ represent image coordinates and $c$ the color channels, which may be in RGB or LMS color opponency spaces. Note that we have also built a learned equivalent of our approach, which we detail in Sec. 4.
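As an illustration of what back-propagation computes when solving the LMS-space optimization problem, consider a least-squared-error loss and write the LMS kernel with an explicit cone index $k$ (an assumed notation, matching a kernel that stores one LMS triplet per color primary). Automatic differentiation produces the same quantity, so the derivation below is only a sketch:

$$\mathcal{L}(s) = \sum_{x,y,k} \big(r(x,y,k) - t_{lms}(x,y,k)\big)^2, \qquad r(x,y,k) = \sum_{c} s(x,y,c) \ast \mathrm{PSF}_{lms}(x,y,c,k)$$

$$\frac{\partial \mathcal{L}}{\partial s(x,y,c)} = 2 \sum_{k} \mathrm{PSF}_{lms}(-x,-y,c,k) \ast \big(r(x,y,k) - t_{lms}(x,y,k)\big)$$

In words, the residual of each cone channel is convolved with the spatially flipped kernel of each color primary (i.e., correlated with the kernel) and summed over the cones, so every input RGB channel receives an error signal from all three cone channels.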

IMPLEMENTATION
Our approach comprises three primary elements: a color perception model, a prescription correction optimization pipeline, and a learned model demonstrating that our differentiable pipeline can be learned. All of these components are implemented in PyTorch [38].

Color Perception Model
First, we identify the wavelengths emitted by the subpixels of a target display device. For that purpose, we acquire spectrometer data for the target display, consisting of discrete wavelengths and their corresponding intensity values normalized between zero and one. We use a multilayer perceptron (MLP) to fit a curve to this discrete data, yielding a vector representation of the intensity profile of each color primary with respect to wavelength. Our MLP has 64 hidden layers and converges within 1000 training iterations with a learning rate of 0.0005. Once we have numerically identified the normalized intensity of each color primary as a function of wavelength, we use these 2D (intensity, wavelength) vectors to create our color-perception-based kernel in LMS space. For each color primary, we create a set of PSFs with our Zernike polynomial generator by sampling wavelengths from 400 to 700 nm at 1 nm intervals. At each sampling step, we create a weighted kernel by multiplying the generated PSF by the intensity value at the corresponding wavelength from our 2D vectors for that color primary.
After creating the weighted kernel at each sampling step, we obtain the LMS cone responses of the weighted kernel using the same intensity and wavelength data. To compute LMS cone responses, we use the method explained in Section 3.1. In the last step, the weighted kernels are summed to create our color-perception-based kernel for each color primary, forming a 4D tensor of shape [Color Primary, H, W, LMS Response]. Our method differs from the conventional method in both kernel type and convolution operation.
In the conventional method, the kernel is a 3D tensor with RGB channels, whereas our method uses a 4D tensor in which each color primary has its own LMS triplet, i.e., [3, H, W, 3]. The LMS-based kernel convolves each color channel of the image with the LMS responses of the corresponding display spectrum. This operation is computationally more expensive than the conventional method, since it requires more matrix operations. We provide pseudo-code for constructing our LMS-based kernel in Listing 1.
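The kernel construction and the LMS convolution described above can be sketched as follows. This is a minimal NumPy illustration with clearly hypothetical inputs, not a reproduction of Listing 1: the spectra are Gaussian stand-ins, the per-wavelength PSF is a Gaussian whose width grows with wavelength (a placeholder for the Zernike-based PSF), the wavelength grid is coarsened to 10 nm for speed, and the convolution is circular (FFT-based). The names `build_lms_kernel` and `convolve_lms` are illustrative.

```python
import numpy as np

WL = np.arange(400.0, 701.0, 10.0)  # coarse 10 nm grid to keep the sketch fast

def gaussian_1d(center_nm, width_nm):
    return np.exp(-0.5 * ((WL - center_nm) / width_nm) ** 2)

# HYPOTHETICAL spectra: rows are the R, G, B primaries and the L, M, S cones.
PRIMARIES = np.stack([gaussian_1d(615, 12), gaussian_1d(535, 15), gaussian_1d(460, 10)])
CONES = np.stack([gaussian_1d(565, 45), gaussian_1d(540, 40), gaussian_1d(445, 25)])

def stand_in_psf(wavelength_nm, size=15):
    """HYPOTHETICAL blur: normalized Gaussian whose width grows with wavelength."""
    sigma = 1.0 + (wavelength_nm - 400.0) / 150.0
    r = np.arange(size) - size // 2
    g = np.exp(-0.5 * (r[:, None] ** 2 + r[None, :] ** 2) / sigma ** 2)
    return g / g.sum()

def build_lms_kernel(primaries, cones, size=15):
    """Kernel [3 primaries, H, W, 3 cones] = sum over wavelengths of
    primary(lambda) * cone(lambda) * PSF_lambda (wavelength step omitted)."""
    kernel = np.zeros((3, size, size, 3))
    for i, lam in enumerate(WL):
        response = np.outer(primaries[:, i], cones[:, i])  # [primary, cone] at lambda
        kernel += response[:, None, None, :] * stand_in_psf(lam, size)[None, :, :, None]
    return kernel

def convolve_lms(image_rgb, kernel):
    """Retinal image [H, W, 3 cones]: r_k = sum_c s_c (*) K[c, :, :, k] (circular)."""
    H, W, _ = image_rgb.shape
    kh, kw = kernel.shape[1:3]
    padded = np.zeros((3, H, W, 3))
    padded[:, :kh, :kw, :] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(1, 2))  # center -> (0, 0)
    S = np.fft.fft2(image_rgb, axes=(0, 1))
    K = np.fft.fft2(padded, axes=(1, 2))
    return np.real(np.fft.ifft2(np.einsum("hwc,chwk->hwk", S, K), axes=(0, 1)))

kernel = build_lms_kernel(PRIMARIES, CONES)
image = np.full((32, 32, 3), 0.5)   # uniform gray test image
retina = convolve_lms(image, kernel)
```

The `einsum` over both primaries and cones is what makes this step more expensive than a per-channel RGB convolution: every output cone channel mixes all three input channels.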

Optimization Pipeline
We implement a prescription correction optimization pipeline using a modern machine learning library with automatic differentiation [38]. The source code of our implementation is publicly available on GitHub at complight/learned prescription [13] and complight/learned prescription model [8].
Optimization loop: The differentiable input RGB image is initialized from our target RGB image and passed through the forward model in each iteration of the optimization loop. In the forward model, each color channel of the input RGB image is convolved with the LMS kernel created in the computational color pipeline. For example, the red channel of the input RGB image is convolved with the L, M, and S channels of the red-spectrum kernel in LMS space, and the other color channels are processed in the same way. The resulting simulated image represents the image formed on the retina from L, M, and S cone activations. The target image is converted to LMS space to calculate an L2 loss against the simulated image, which is back-propagated through the model to the input RGB image. Our results are obtained using stochastic gradient descent with Adam [24] as the optimizer. Our pipeline runs on NVIDIA GPU-accelerated computers.
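The optimization loop above can be sketched without an autodiff library by writing the gradient of the L2 loss explicitly (the adjoint of a circular convolution is a correlation, computed with the complex conjugate in the Fourier domain) and applying a hand-written Adam update. Assumptions: NumPy instead of PyTorch, circular convolution, a hypothetical RGB-to-LMS matrix and a Gaussian blur per primary in place of the calibrated LMS kernel, a random target image, and clipping to [0, 1] after each step to keep pixel values displayable.

```python
import numpy as np

def gaussian_psf(sigma, size):
    r = np.arange(size) - size // 2
    g = np.exp(-0.5 * (r[:, None] ** 2 + r[None, :] ** 2) / sigma ** 2)
    return g / g.sum()

def kernel_fft(kernel, H, W):
    """FFT of a [3, kh, kw, 3] kernel, zero-padded to H x W with its center moved to (0, 0)."""
    kh, kw = kernel.shape[1:3]
    padded = np.zeros((3, H, W, 3))
    padded[:, :kh, :kw, :] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(1, 2))
    return np.fft.fft2(padded, axes=(1, 2))

def forward(s, K):
    """Simulated LMS retinal image: r_k = sum_c s_c (*) K_ck (circular convolution)."""
    S = np.fft.fft2(s, axes=(0, 1))
    return np.real(np.fft.ifft2(np.einsum("hwc,chwk->hwk", S, K), axes=(0, 1)))

def mse_gradient(s, K, target_lms):
    """d/ds of mean((r - t)^2): correlate the residual with each kernel, sum over cones."""
    residual = forward(s, K) - target_lms
    R = np.fft.fft2(residual, axes=(0, 1))
    G = np.einsum("hwk,chwk->hwc", R, np.conj(K))
    return 2.0 * np.real(np.fft.ifft2(G, axes=(0, 1))) / residual.size

def optimize(target_rgb, K, A, steps=200, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    target_lms = target_rgb @ A.T                # sharp target converted to LMS
    s = target_rgb.copy()                        # initialize from the target RGB image
    m = np.zeros_like(s)
    v = np.zeros_like(s)
    losses = []
    for t in range(1, steps + 1):
        losses.append(float(np.mean((forward(s, K) - target_lms) ** 2)))
        g = mse_gradient(s, K, target_lms)
        m = beta1 * m + (1.0 - beta1) * g        # Adam first-moment estimate
        v = beta2 * v + (1.0 - beta2) * g ** 2   # Adam second-moment estimate
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        s = np.clip(s - lr * m_hat / (np.sqrt(v_hat) + eps), 0.0, 1.0)  # assumed display range
    losses.append(float(np.mean((forward(s, K) - target_lms) ** 2)))
    return s, losses

# HYPOTHETICAL setup: RGB->LMS mixing matrix and a Gaussian blur per primary.
A = np.array([[0.60, 0.35, 0.05], [0.30, 0.60, 0.10], [0.02, 0.10, 0.88]])
size, H, W = 9, 32, 32
kernel = np.stack([
    np.stack([A[k, c] * gaussian_psf(1.5 + 0.5 * c, size) for k in range(3)], axis=-1)
    for c in range(3)
])                                               # shape [3 primaries, 9, 9, 3 cones]
target = np.random.default_rng(0).random((H, W, 3))
optimized, losses = optimize(target, kernel_fft(kernel, H, W), A)
```

Because each primary's kernel integrates to the corresponding column of the mixing matrix, an unblurred eye would reproduce the LMS target exactly; the optimizer only has to undo the blur.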

Learned Model
We implement a semi-supervised deep learning model capable of reconstructing optimized images from their original RGB versions, using a U-Net architecture [39]. Such a solution is better suited to real-time applications than an iterative process, but it trades image quality for faster rendering. Our model comprises 2 outer layers linked to 8 convolutional hidden layers that are symmetrically connected by skip connections. Each layer on the contracting path consists of a double convolution and a max pooling operation. On the expanding path, each convolution is preceded by a bilinear up-sampling operation. During training, batch normalization and ReLU activation are used.
We evaluated our model on a machine with an NVIDIA GeForce RTX 2070 GPU. The training dataset comprises 20 images of 512 × 512 pixels; the RGB images were obtained from Zhang et al.'s color image processing dataset [54], and the target optimized images were generated using our iterative method. We used a learning rate of $1 \times 10^{-4}$ for training, and a conventional mean-squared-error loss guides the stochastic gradient descent optimization. With 3 × 3 convolutional kernels, the channels of each input image expand from 3 to 92 and up to 1472 in the latent space. Figure 5 compares the corrected images from our original pipeline and the neural network's prediction after more than 800 epochs of training. The average time to generate a single corrected image is 0.0029 seconds with the model, as opposed to 8.127 seconds with the original method, a roughly 2,800-fold speed-up.
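The channel counts and timing quoted above can be checked with a little arithmetic. Assuming the channel count doubles at each of the four downsampling steps implied by eight symmetric hidden layers (an assumption; the text states only the values 3, 92, and 1472), 92 · 2^4 = 1472, and the reported timings correspond to a speed-up of about 2,800×. The helper name is illustrative.

```python
def unet_channel_schedule(in_channels=3, base=92, depth=4):
    """ASSUMED schedule: channels double at each of `depth` downsampling steps."""
    return [in_channels] + [base * 2 ** level for level in range(depth + 1)]

channels = unet_channel_schedule()
speedup = 8.127 / 0.0029   # iterative time / learned-model time, both in seconds

print(channels)            # [3, 92, 184, 368, 736, 1472]
print(round(speedup))      # 2802
```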

EVALUATION
We divided our experiments into two parts. In the first part, we use real hardware to test our method for defocus prescriptions. We used an Oculus Quest 1 virtual reality headset and placed a defocus lens in front of the camera to create an artificial prescription. In these experiments, we use a camera with a fixed pose and focus to capture images that demonstrate the method's performance.
Figure 4 shows our experimental setup for the defocus experiments, and Figure 1 shows results from the first part of our experiments. We modeled myopic defocus because it can be replicated with defocus lenses. The experiments show that our method improves contrast and color compared to the conventional method. However, our method does not reach the quality of the target image.
In the second part, we evaluated our method with different prescriptions modeling different refractive eye problems in a simulated retinal image representation. Thus, all images in this part are evaluated in simulated LMS space. The selected images contain both high-frequency and low-frequency features. We chose four common refractive errors, namely myopia, hyperopia, myopic astigmatism, and hyperopic astigmatism, to test our method against the conventional method. We also tested our method on myopia with hyperopic astigmatism, a complicated refractive eye problem that is not trivial to correct with eyeglasses. In each case, a refractive error of ±1.5 D is used to model the prescription. Results are visualized in Figure 3. We use several image quality measures to compare our method against the conventional method. Our primary image quality metric is FLIP, which compares images using principles of human perception [5]. FLIP produces per-pixel difference maps, shown with the magma colormap in our evaluation figures, that visualize the difference at each pixel against the ground truth image. For each pixel, FLIP accounts for both color and edge differences based on models of the HVS, which is why we believe it fits our work. Although much of the research in this area uses SSIM or PSNR, FLIP is advantageous because it adheres to the human visual system while those metrics do not [32]. In addition to FLIP, we report SSIM and PSNR to remain comparable with the research community. Figure 3 compares our perceptually guided color modeling against the conventional (naive) method.
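For reference, PSNR, one of the two additional metrics reported above, is a simple function of the mean squared error. A minimal NumPy sketch, assuming images normalized to [0, 1] (the peak value is a parameter, and the name `psnr` is illustrative); FLIP and SSIM are more involved and are best taken from their reference implementations.

```python
import numpy as np

def psnr(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    mse = np.mean((reference - test) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.zeros((4, 4, 3))
noisy = np.full((4, 4, 3), 0.1)   # uniform error of 0.1, so MSE = 0.01
print(psnr(ref, noisy))           # approximately 20 dB
```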
The results show that color-opponency-based kernel modeling improves the contrast of the retinal output image. Three selected areas are magnified to show the improvements in detail. From the per-pixel difference maps, we find that our method performs better on low-frequency features and provides a slight improvement in high-frequency regions. Overall, the perceptually guided color-based kernel yields better contrast than the conventional method.

DISCUSSION
Our method could potentially be integrated with Mandl et al. [30] to support a broader user base with refractive vision impairments. To the best of our knowledge, our results are an encouraging improvement over the conventional method in the literature.
Spatially Varying PSF. Our method does not account for the spatially varying nature of the PSF in the HVS, which typically comes with added computational cost and complexity [15]. We designed our implementation for constant-resolution displays rather than variable-resolution ones such as foveated displays. Alternatively, deep learning methods can support spatially varying PSF convolutions [52] at a lower computational cost, but they require training data. Our method could benefit from these techniques in the future for more precise modeling.
Chromatic Aberrations In A Human Eye. Our forward model uses PSFs created from the same Zernike coefficients for every wavelength. However, the optics of the HVS exhibit wavelength-dependent chromatic aberrations. In future work, we can further improve the accuracy of our model for a human observer by taking chromatic aberrations in the HVS into account. In the meantime, interested readers can find more details on chromatic aberrations in the work by Cholewiak et al. [11].
Image Quality. Approaches to prescription correction with additive displays are fundamentally limited. This limit stems from the fact that the PSF is a non-negative transfer function and an additive display can only emit non-negative intensities; together, they support only a limited range of frequencies and cause contrast loss. Our work could complement holographic displays [7,10,46], which promise a unique solution to this issue originating from non-negativity in additive displays.
Foveated Rendering. Foveated rendering in graphics [45] and displays [22] has garnered interest in the VR and AR research community. We believe our method can also benefit from this trend by accounting for variations in chromatic and achromatic contrast sensitivity [14,44,47] in the HVS. Moreover, we could add rod responses to the cone responses by reformulating the LMS response to improve color difference predictions [6]. We will explore this path in future work (see Figure 6 for early results).

CONCLUSION
Helping display users with vision impairments is an essential aspect of graphics systems. Focusing on this critical issue, we present a new rendering approach that provides sharp images to users with vision impairments viewing without their prescription glasses. Our rendering approach uniquely merges key insights from the HVS, and we show that it can improve visual experience and comfort in VR headsets by enhancing color and contrast in the displayed images. The future will likely bring more principled approaches in AR/VR displays (e.g., holographic displays), which could enable future research investigations based on findings from this work.

Figure 2: Prescription correction using a perceptually guided computational model and a differentiable optimization pipeline. (1) A screen with color primaries (RGB) displays an input image. (2) A viewer's eye images the displayed image onto the retina with a unique Point Spread Function (PSF) describing the optical aberrations of that person's eye. (3) Retinal cells convert the aberrated RGB image to a trichromatic sensation, also known as Long-Medium-Short (LMS) cone perception [42]. (4) Our optimization pipeline relies on the perceptually guided model described in steps (1-3); at each optimization step, it converts a given RGB image to LMS space while accounting for the viewer's PSFs modeled using Zernike polynomials. (5) Our loss function penalizes the simulated image derived from the perceptually guided model against a target image in LMS space. Finally, our differentiable optimization pipeline identifies suitable input RGB images using a Stochastic Gradient Descent solver [38].

Figure 3: Comparison of outputs for five refractive vision problems (myopia, hyperopia, hyperopic astigmatism, myopic astigmatism, and myopia with hyperopic astigmatism) on five sample input images. We provide simulated LMS space representations of the target image, the conventional method's output, and our method's output. FLIP per-pixel difference maps along with their mean values (lower is better), SSIM, and PSNR are provided to compare the methods. Our method scores better on every image quality metric in each experiment in simulated LMS space. The contrast improvement of our method over the conventional method can also be observed perceptually. Source images are from the DIV2K image dataset [1].

Figure 4: Testbed used in our evaluations. (A) We use a virtual reality headset and a camera that captures images from the headset. To emulate a prescription problem in the visual system, we use a defocus lens. (B) We take pictures with a fixed pose and camera focus from behind the defocus lens to evaluate reconstructed images.

Figure 5: Results from our learned model.We compare our optimization pipeline against our learned model.The top row shows precorrected images reconstructed by our optimizer and learned model.The bottom row shows photographs for each case when captured with a defocused camera.

Figure 6: Image reconstructed with our method with foveation added. The foveated region is at the center of the reconstructed image, and the FLIP per-pixel difference map highlights the foveation.

Figure 1: Additional images for the learned method. Images are selected from the DIV2K image dataset [1].

Figure 2: Comparison of the conventional method and our approach for different myopia cases. Images are simulated in LMS space.

Figure 3: Additional images for the evaluation section.Images are simulated in LMS space.

Table 1: Comparison of prescription correction techniques. Many solutions for prescription correction either fail to provide good image quality or require bulky hardware components that negatively affect user comfort. We take an algorithmic approach utilizing an accurate perception model of the human visual system, leading to improved image quality and real-time image generation. SW refers to Software and HW to Hardware in this table.
* This technique is referred to as the conventional method throughout the paper.