Practical color contrast sensitivity functions for luminance levels up to 10 000 cd/m²

We model color contrast sensitivity for Gabor patches as a function of spatial frequency, luminance and chromaticity of the background, modulation direction in the color space, and stimulus size. To fit the model parameters, we combine the data from five independent datasets, which allows us to make predictions for background luminance levels between 0.0002 cd/m² and 10000 cd/m², and for spatial frequencies between 0.06 cpd and 32 cpd. The data are well explained by two models: a model that encodes cone contrast and a model that encodes postreceptoral, opponent-color contrast. Our intention is to create practical models that can explain detection performance for natural viewing in a wide range of conditions. As our models are fitted to data spanning a very large range of luminance, they can find applications in modeling visual performance for high dynamic range and augmented reality displays.


Introduction
Contrast sensitivity functions (CSFs) model how well a human observer can detect a simple visual stimulus, such as a Gabor patch of a given spatial frequency, background color and luminance, size, and modulation direction in a color space. Such CSFs model the detection performance of the visual system for low-contrast, barely noticeable stimuli shown on uniform backgrounds. CSFs are a critical component of many visual models, such as models of visibility (VDP [1], HDR-VDP [2]) or of color differences (sCIELab [3]).
Contrast sensitivity for achromatic (luminance-only) stimuli has been well studied, with a large number of measurements described by general models [4,1]. However, little work has been done to characterize contrast sensitivity for contrast modulation in arbitrary chromatic directions across large variations of background luminance, ranging from low scotopic to bright photopic levels.
Here we model data from five independent datasets spanning mean luminances from 0.002 to 10000 cd/m². The focus on a large range of luminance is motivated by applications in high dynamic range imaging and augmented reality displays, where luminance can easily exceed 1000 cd/m². Using the combined dataset, we fit two models, the cone contrast model and the postreceptoral model, which differ in how they encode chromatic contrast. The goal is to provide a practical model that can predict contrast detection in real-life applications. We provide the parameters and Matlab code for both models.

Related Work
In comparison to the extensive literature on achromatic CSFs [4], much less is known about chromatic CSFs. Chromatic CSFs are known to be low-pass as a function of spatial frequency [5,6], meaning that chromatic variations at low spatial frequencies are easier to detect than those at higher spatial frequencies.
Similarly, slow changes in color are easier to detect than rapid changes [7].
Much less is known about how chromatic CSFs vary with other stimulus parameters, such as luminance and stimulus size. Early work showed that chromatic variations are easier to detect at higher luminance [6], a finding that was confirmed and extended to 10,000 cd/m² for spatial frequencies between 0.5 and 6 cpd [8]. A review of other measurements of chromatic contrast sensitivity can be found in Table 1 of [9].
Chromatic contrast sensitivity was modeled as low-pass Gaussian functions in the sCIELab color-difference metric [3]. Such a simplified model accounted only for spatial frequency. Lucassen et al. [9] fitted two models to measurements for three colored backgrounds of the same luminance, selected along the black-body locus. They concluded that the model encoding postreceptoral contrast explained the data better than the model encoding cone contrast. In our work, we test similar models on a larger set of data, including multiple luminance levels and stimulus sizes. We find that the postreceptoral model, as proposed in [9], poorly explains our combined dataset. Only when it is modified to encode luminance-normalized contrast does it perform on par with the cone contrast model. Our work extends the model of Wuerger et al. [8], who characterized the achromatic and chromatic CSFs along the red-green and yellow-violet opponent color directions for arbitrary luminance and stimulus size, but did not generalize the model to predict detection for an arbitrary direction of chromatic modulation. Our model is also more comprehensive, as it is fitted to an extensive dataset created by combining data from 5 publications, including the work of Wuerger and colleagues.

Detection Stimuli
Contrast threshold is the minimum contrast required to reliably see a target against the background. A convenient measure of the contrast of chromatic stimuli is cone contrast, which provides a device-independent definition of color contrast:
$$C = \sqrt{\frac{1}{3}\left[\left(\frac{\Delta L}{L_0}\right)^2 + \left(\frac{\Delta M}{M_0}\right)^2 + \left(\frac{\Delta S}{S_0}\right)^2\right]}, \tag{1}$$
where $L_0$, $M_0$ and $S_0$ are the cone responses to the background, and $\Delta L$, $\Delta M$ and $\Delta S$ are the amplitudes of the chromatic modulation. The LMS color space is typically constructed so that the sum of the L- and M-cone responses corresponds to luminance, that is, $Y = L + M$. Given that, cone contrast is equivalent to Michelson contrast for an achromatic stimulus. Contrast sensitivity is another measure of how well human observers detect contrast. It is defined as the inverse of the contrast threshold $C_T$:
$$S = \frac{1}{C_T}.$$
The detection target is usually a pattern created by multiplying a 2D sinusoid by a Gaussian envelope, called a Gabor. Gabors are characterized by the spatial frequency, phase and orientation of the sinusoid, as well as by the width of the Gaussian envelope. The amplitude of the sinusoid in the color space corresponds to the incremental cone responses $\Delta L$, $\Delta M$ and $\Delta S$.
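The claim that cone contrast reduces to Michelson contrast for an achromatic stimulus can be checked numerically. The sketch below is minimal and assumes the RMS form of cone contrast with a 1/3 normalization, which is the form for which that equivalence holds. The background cone responses are hypothetical values chosen only for illustration.

```python
import math


def cone_contrast(dL, dM, dS, L0, M0, S0):
    """RMS cone contrast with a 1/3 normalization (assumed form).

    dL, dM, dS: modulation amplitudes; L0, M0, S0: background cone responses.
    """
    return math.sqrt(((dL / L0) ** 2 + (dM / M0) ** 2 + (dS / S0) ** 2) / 3.0)


def sensitivity(c_threshold):
    """Contrast sensitivity is the inverse of the contrast threshold."""
    return 1.0 / c_threshold


# Hypothetical background cone responses, scaled so that Y = L0 + M0.
L0, M0, S0 = 60.0, 40.0, 5.0
k = 0.1  # achromatic modulation: every cone response changes by 10 %

c = cone_contrast(k * L0, k * M0, k * S0, L0, M0, S0)
Y0 = L0 + M0
michelson = (k * L0 + k * M0) / Y0  # luminance amplitude over mean luminance

print(f"cone contrast = {c:.4f}, Michelson contrast = {michelson:.4f}")
print(f"sensitivity at this threshold = {sensitivity(c):.1f}")
```

An achromatic modulation changes every cone response by the same fraction, so the three terms under the square root are equal. The 1/3 factor then returns exactly that fraction.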
In the present models, we ignore phase and orientation and focus on the effects of spatial frequency (ρ), the standard deviation of the Gaussian envelope (σ) and background color ($L_0$, $M_0$, $S_0$) on color contrast detection. A higher value of σ gives a larger stimulus that shows more sinusoidal cycles, resulting in a Gabor that is easier to detect. We define fixed-cycles stimuli as those whose σ is inversely proportional to the frequency ρ, so that they show the same number of sinusoidal cycles (Fig. 1A). Fixed-cycles stimuli are thus normalized for the number of visible cycles. Typically, however, contrast detection experiments are conducted with fixed-size stimuli, whose σ is fixed so that they occupy the same visual area (Fig. 1B). We parameterize our models by the size of a Gabor, which we define as $a = \pi\sigma^2$.
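The two stimulus families differ only in how σ depends on ρ. The sketch below contrasts their sizes $a = \pi\sigma^2$. The proportionality constant `cycles_per_sigma` and the fixed `sigma_deg` are hypothetical values, not taken from any of the datasets.

```python
import math


def gabor_area(sigma_deg):
    """Gabor size a = pi * sigma^2 in deg^2 (sigma: std. dev. of the envelope)."""
    return math.pi * sigma_deg ** 2


def sigma_fixed_cycles(rho_cpd, cycles_per_sigma=1.0):
    """Fixed-cycles stimulus: sigma inversely proportional to frequency.

    cycles_per_sigma is a hypothetical constant (cycles spanned by one sigma).
    """
    return cycles_per_sigma / rho_cpd


def sigma_fixed_size(rho_cpd, sigma_deg=1.0):
    """Fixed-size stimulus: sigma does not change with frequency."""
    return sigma_deg


for rho in (0.5, 2.0, 8.0):
    a_cycles = gabor_area(sigma_fixed_cycles(rho))
    a_size = gabor_area(sigma_fixed_size(rho))
    print(f"rho={rho:4.1f} cpd  fixed-cycles a={a_cycles:8.4f} deg^2  "
          f"fixed-size a={a_size:.4f} deg^2")
```

For fixed-cycles stimuli, the area falls with the square of frequency. This is why a size-aware model is needed to explain both kinds of data.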
Contrast sensitivity is typically measured in a contrast detection experiment. On each trial of such an experiment, participants are presented with n regions of equal area (simultaneous presentation) or with n intervals (sequential presentation). The stimulus is present in only one of the n regions or intervals, and the participant's task is to indicate which region or interval contains it. Such a paradigm is called an n-alternative forced choice, or n-AFC, paradigm (e.g., 2-AFC for two choices). For four of the five datasets that we used, the data were collected in an n-AFC detection paradigm.
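An n-AFC trial can be simulated in a few lines. The sketch below assumes a simple high-threshold observer, which is an assumption not stated in the text. The observer either detects the target with probability `p_detect` or guesses uniformly among the n alternatives, so chance performance is 1/n.

```python
import random


def simulate_nafc(n_alternatives, p_detect, n_trials, seed=0):
    """Proportion correct in a simulated n-AFC run (high-threshold observer)."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        target = rng.randrange(n_alternatives)  # region/interval with the stimulus
        if rng.random() < p_detect:
            response = target  # stimulus detected
        else:
            response = rng.randrange(n_alternatives)  # forced guess
        if response == target:
            correct += 1
    return correct / n_trials


def expected_proportion_correct(n_alternatives, p_detect):
    """Detection probability corrected for guessing: p + (1 - p) / n."""
    return p_detect + (1.0 - p_detect) / n_alternatives


for n in (2, 4):
    simulated = simulate_nafc(n, 0.5, 100_000)
    print(n, round(simulated, 3), expected_proportion_correct(n, 0.5))
```

Because the guessing floor depends on n, the same detection probability yields different proportions correct in 2-AFC and 4-AFC tasks.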

Datasets
We fitted the models to contrast detection thresholds from five different studies, listed in Table 1. The datasets are complementary in many respects: some provide data over a large range of luminance [2,12,8], whereas others provide multiple directions of chromatic modulation [11,14] or background chromaticities [14]. These datasets were selected because they all contain data for natural viewing conditions: binocular viewing, non-dilated pupils and no correction for chromatic aberrations. Such data are more relevant for practical applications than the data from some earlier studies, which employed monocular viewing and optical corrections [5]. The selected datasets also do not include corrections for individual variations in isoluminance, and they aggregate measurements from 3-20 observers for the same chromatic modulation directions.
To unify color specifications across the datasets, we converted the background colors and color directions to LMS cone responses, calculated using the CIE 2006 cone sensitivity functions [15]. To do so, we used the measured spectra of the monitor used in each experiment. Where these spectra were not available, we assumed the spectra of a standard CRT or LCD monitor.
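The conversion from display values to cone responses amounts to integrating each cone sensitivity against each primary's spectrum. The sketch below shows this with pure-Python lists. The 4-sample "spectra" are synthetic placeholders, not CIE 2006 data or a real monitor. A real implementation would load the tabulated CIE 2006 functions and the measured primaries, and would also scale LMS so that $Y = L + M$, a step omitted here.

```python
def rgb_to_lms_matrix(primary_spectra, cone_fundamentals, d_lambda):
    """3x3 matrix mapping linear RGB to LMS cone responses.

    primary_spectra: [R, G, B] spectral radiance of each primary at full drive.
    cone_fundamentals: [L, M, S] cone sensitivities at the same wavelengths.
    d_lambda: wavelength step (nm). Entry [i][j] integrates cone i against
    primary j with the rectangle rule.
    """
    return [[sum(c * p for c, p in zip(cone, prim)) * d_lambda
             for prim in primary_spectra]
            for cone in cone_fundamentals]


def mat_vec(matrix, vec):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


# Toy 4-sample spectra: synthetic placeholders, NOT real CIE 2006 data or a real display.
primaries = [
    [0.0, 0.1, 0.3, 1.0],  # "red"
    [0.1, 0.8, 0.6, 0.1],  # "green"
    [1.0, 0.2, 0.0, 0.0],  # "blue"
]
cones = [
    [0.0, 0.4, 0.9, 0.6],  # "L"
    [0.1, 0.7, 0.7, 0.2],  # "M"
    [0.9, 0.3, 0.0, 0.0],  # "S"
]
M_rgb2lms = rgb_to_lms_matrix(primaries, cones, d_lambda=10.0)

background_rgb = [0.5, 0.5, 0.5]     # linear RGB of the background
modulation_rgb = [0.05, -0.05, 0.0]  # linear RGB amplitude of the modulation
L0, M0, S0 = mat_vec(M_rgb2lms, background_rgb)
dL, dM, dS = mat_vec(M_rgb2lms, modulation_rgb)
print(L0, M0, S0, dL, dM, dS)
```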

Chromatic Detection Models
We present two models, the cone contrast model and the postreceptoral model [9], which are inspired by the physiology of the early visual system.
Many aspects of the early color vision system are well understood [7]. The first site of color processing consists of the photoreceptors, the L-, M- and S-cones, whose responses are positive real values. The second, postreceptoral site of color processing converts the cone responses into values in an opponent color space, called the DKL color space [10]. The cone contrast model and the postreceptoral model differ in where and how they normalize by the background luminance and color to compute color contrast. In the cone contrast model, the cone increments are normalized by the background cone responses before conversion into the DKL space. In the postreceptoral model, the normalization occurs after the conversion.

Cone Contrast Model
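In the cone contrast model, shown in Fig. 3a, the incremental responses of the L-, M- and S-cones are encoded as contrast ($\Delta L/L_0$, $\Delta M/M_0$, $\Delta S/S_0$). They are then combined to form the achromatic ($\Delta C_A$) and two chromatic ($\Delta C_R$, $\Delta C_B$) mechanism responses:
$$\begin{bmatrix}\Delta C_A\\ \Delta C_R\\ \Delta C_B\end{bmatrix} = M_{LMS\rightarrow ARB} \begin{bmatrix}\Delta L/L_0\\ \Delta M/M_0\\ \Delta S/S_0\end{bmatrix}. \tag{2}$$
The mechanism responses, weighted by their sensitivity functions, are combined into a single model response by Minkowski summation
$$E = \left(\left|\Delta C_A\, s_A(\rho, a, Y)\right|^{\beta} + \left|\Delta C_R\, s_R(\rho, a, Y)\right|^{\beta} + \left|\Delta C_B\, s_B(\rho, a, Y)\right|^{\beta}\right)^{1/\beta} \tag{3}$$
with exponent β. This is equivalent to probability summation with a psychometric function defined as the cumulative Weibull distribution function with a slope of β (Eq. 4). The model is calibrated so that the model response E is as close to 1 as possible when the Gabor patch is at the detection threshold. The probability of detection $P_{det}$ can be computed as:
$$P_{det} = 1 - \exp\left(\ln(0.5)\, E^{\beta}\right). \tag{4}$$
The constant ln(0.5) ensures that $P_{det} = 0.5$ when $E = 1$. The functions $s_A(\cdot)$, $s_R(\cdot)$ and $s_B(\cdot)$ are the sensitivity functions of spatial frequency ρ (in cycles per degree), stimulus size a (in deg²) and background luminance Y (in cd/m²). We assume that the background luminance is $Y = L_0 + M_0$. We model each sensitivity function as a product of an inverse log-parabola and a modified stimulus-size term, originally proposed by Rovamo et al. [16]: where c represents the mechanism (A, R or B). The modification of the original formula is the exponent $\gamma_c$, which improved the fit. The function $k_c(a)$ is given by: where $\hat{a}_c$ and $f_0$ are the model parameters; $f_0$ was fixed to 0.65, following [16].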
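The cone contrast pipeline can be sketched as follows: normalize by the background, apply the opponent matrix, pool with exponent β, and convert to a detection probability. In this minimal sketch, the sensitivities $s_A$, $s_R$, $s_B$ are passed in as numbers already evaluated for one stimulus, because their full form is not reproduced here. The matrix `M_example` only mimics the L + M, L − M, S − (L + M) pattern and is hypothetical, not the fitted matrix.

```python
import math


def cone_contrast_model(dLMS, LMS0, M, sens, beta):
    """Response E (Minkowski summation) and P_det of the cone contrast model.

    dLMS: (dL, dM, dS) cone increments; LMS0: (L0, M0, S0) background.
    M: 3x3 matrix (rows A, R, B) applied to cone contrast.
    sens: (s_A, s_R, s_B) sensitivities already evaluated for the stimulus.
    beta: summation exponent / Weibull slope.
    """
    contrast = [d / b for d, b in zip(dLMS, LMS0)]                   # normalize first
    mech = [sum(m * c for m, c in zip(row, contrast)) for row in M]  # then opponent transform
    E = sum(abs(r * s) ** beta for r, s in zip(mech, sens)) ** (1.0 / beta)
    p_det = 1.0 - math.exp(math.log(0.5) * E ** beta)                # 0.5 when E = 1
    return E, p_det


# Hypothetical values for illustration only (not the fitted parameters).
M_example = [[1.0, 1.0, 0.0],    # A ~ L + M
             [1.0, -1.0, 0.0],   # R ~ L - M
             [-1.0, -1.0, 1.0]]  # B ~ S - (L + M)
LMS0 = (60.0, 40.0, 5.0)
dLMS = tuple(0.01 * b for b in LMS0)  # achromatic, 1 % cone contrast
E, p = cone_contrast_model(dLMS, LMS0, M_example, sens=(50.0, 10.0, 0.0), beta=3.5)
print(f"E = {E:.3f}, P_det = {p:.3f}")
```

Doubling the increments doubles E. This homogeneity is what later allows sensitivity to be computed in closed form.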
The log-parabola function is: The log-parabola is truncated (low-pass) for the chromatic channels (R and B) and band-pass for the achromatic channel (A). The parameter $b_c$ represents the bandwidth of the parabola. Some parameters of the model vary with luminance. These include the base sensitivity: and the peak frequency of the achromatic channel: The remaining constants are the parameters of the model.
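The exact log-parabola of the model and the luminance dependence of its base sensitivity and peak frequency are not reproduced in this text. The sketch below therefore uses one common parameterization of a (truncated) log-parabola, which may differ from the authors' in how the bandwidth $b_c$ is scaled. It only illustrates the band-pass versus truncated low-pass distinction. All parameter values are hypothetical.

```python
import math


def log_parabola(rho, s_max, rho_max, b, truncated=False):
    """Sensitivity that is a parabola in log-log coordinates (assumed form).

    rho: spatial frequency (cpd); s_max: peak sensitivity at rho_max;
    b: bandwidth-like width in log10 units of frequency.
    truncated=True holds sensitivity at s_max below the peak (low-pass, as for
    the chromatic channels); otherwise the curve is band-pass (achromatic).
    """
    if truncated and rho < rho_max:
        return s_max
    x = (math.log10(rho) - math.log10(rho_max)) / b
    return s_max * 10.0 ** (-x * x)


for rho in (0.1, 1.0, 4.0, 16.0):
    band = log_parabola(rho, s_max=200.0, rho_max=2.0, b=1.0)
    low = log_parabola(rho, s_max=200.0, rho_max=0.5, b=1.0, truncated=True)
    print(f"{rho:5.1f} cpd  band-pass {band:7.2f}  low-pass {low:7.2f}")
```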

Postreceptoral Contrast Model
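The postreceptoral model, shown in Fig. 3b, shares most of the computation with the cone contrast model, except that contrast is encoded after computing the opponent-color responses:
$$\begin{bmatrix}\Delta C_A\\ \Delta C_R\\ \Delta C_B\end{bmatrix} = M_{LMS\rightarrow ARB} \begin{bmatrix}\Delta L\\ \Delta M\\ \Delta S\end{bmatrix}, \tag{10}$$
and the responses are normalized by the background luminance:
$$\Delta \hat{C}_c = \frac{\Delta C_c}{Y}, \quad c \in \{A, R, B\}. \tag{11}$$
We also considered two alternatives. The first normalizes the mechanism increment responses by the background response of the corresponding opponent-color mechanism ($\Delta \hat{C}_A = \Delta C_A / C_A$), as proposed in [9]. The second uses a mixture of both ($\Delta \hat{C}_A = \Delta C_A / (\alpha C_A + (1-\alpha) Y)$), as proposed in [17]. Both gave much worse model fits. The remaining steps of the postreceptoral contrast model are identical to those of the cone contrast model and follow Eqs. 3-9.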
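The practical difference between the two normalizations is easiest to see with an S-cone increment shown on two backgrounds that differ only in $S_0$. The sketch below uses a hypothetical opponent matrix for both models. In the actual models, each has its own fitted matrix.

```python
def postreceptoral_responses(dLMS, LMS0, M):
    """Opponent transform of raw increments, then division by Y = L0 + M0."""
    Y = LMS0[0] + LMS0[1]
    return [sum(m * d for m, d in zip(row, dLMS)) / Y for row in M]


def cone_contrast_responses(dLMS, LMS0, M):
    """For comparison: divide by background cone responses first, then transform."""
    contrast = [d / b for d, b in zip(dLMS, LMS0)]
    return [sum(m * c for m, c in zip(row, contrast)) for row in M]


# Hypothetical opponent matrix (rows A, R, B), for illustration only.
M_example = [[1.0, 1.0, 0.0],
             [1.0, -1.0, 0.0],
             [-1.0, -1.0, 1.0]]

# The same S-cone increment on two backgrounds that differ only in S0.
dLMS = (0.0, 0.0, 0.5)
for LMS0 in ((60.0, 40.0, 5.0), (60.0, 40.0, 50.0)):
    post = postreceptoral_responses(dLMS, LMS0, M_example)
    cone = cone_contrast_responses(dLMS, LMS0, M_example)
    print(f"S0={LMS0[2]:5.1f}  postreceptoral B={post[2]:.4f}  cone-contrast B={cone[2]:.4f}")
```

The postreceptoral B response is unchanged by the tenfold change in $S_0$, because its only adaptation is to the mean luminance Y. The cone contrast response drops tenfold.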

Sensitivity Predictions
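The presented models allow us to compute the model response (Eq. 3) or the probability of detection (Eq. 4). However, to match psychophysical data, we need to predict contrast sensitivity. Fortunately, the model response (E in Eq. 3) scales linearly with the incremental cone responses ($\Delta L$, $\Delta M$, $\Delta S$), which allows us to calculate the predicted sensitivity. Let us introduce a scaling factor t and new cone increments:
$$\Delta L' = t\,\Delta L, \quad \Delta M' = t\,\Delta M, \quad \Delta S' = t\,\Delta S. \tag{12}$$
We want to find the value of t for which the model response indicates the detection threshold. After substituting the new increments into Eq. 2 or Eq. 10, we can show that the scaling factor t can be factored out in front of the energy term:
$$E' = t\,E. \tag{13}$$
The detection threshold is reached when $E' = 1$. Therefore $t = 1/E$, where E is the original energy from Eq. 3. The sensitivity is then given as the inverse of the cone contrast of the new increments:
$$S = \left(\frac{1}{3}\left[\left(\frac{t\,\Delta L}{L_0}\right)^2 + \left(\frac{t\,\Delta M}{M_0}\right)^2 + \left(\frac{t\,\Delta S}{S_0}\right)^2\right]\right)^{-1/2} = \frac{E}{C}, \tag{14}$$
where C is the cone contrast of the original increments.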
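The threshold-scaling argument can be verified numerically. The sketch below assumes the RMS cone contrast with a 1/3 normalization. The matrix, the per-stimulus sensitivities and the modulation direction are hypothetical. It scales the increments by $t = 1/E$, confirms that the scaled response is 1, and inverts the cone contrast of the scaled increments.

```python
import math


def cone_contrast(dLMS, LMS0):
    """RMS cone contrast with a 1/3 normalization (assumed form)."""
    return math.sqrt(sum((d / b) ** 2 for d, b in zip(dLMS, LMS0)) / 3.0)


def energy(dLMS, LMS0, M, sens, beta):
    """Cone contrast model response E; scales linearly with the increments."""
    contrast = [d / b for d, b in zip(dLMS, LMS0)]
    mech = [sum(m * c for m, c in zip(row, contrast)) for row in M]
    return sum(abs(r * s) ** beta for r, s in zip(mech, sens)) ** (1.0 / beta)


def predicted_sensitivity(dLMS, LMS0, M, sens, beta):
    """Scale the increments by t = 1/E so that E' = 1, then invert cone contrast."""
    t = 1.0 / energy(dLMS, LMS0, M, sens, beta)
    threshold_increments = [t * d for d in dLMS]
    return 1.0 / cone_contrast(threshold_increments, LMS0)


# Hypothetical parameters, for illustration only.
M_example = [[1.0, 1.0, 0.0], [1.0, -1.0, 0.0], [-1.0, -1.0, 1.0]]
sens_example = (50.0, 80.0, 20.0)
LMS0 = (60.0, 40.0, 5.0)
direction = (0.3, -0.3, 0.05)  # arbitrary modulation direction (any amplitude)

S = predicted_sensitivity(direction, LMS0, M_example, sens_example, beta=3.5)
print(f"predicted sensitivity = {S:.2f}")
```

The result depends only on the direction of the modulation, not on its amplitude. This is why sensitivity can be predicted without searching for the threshold iteratively.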

Model Fitting
The datasets were collected using different tasks (e.g., 2-AFC vs. 4-AFC, detection vs. discrimination between horizontal and vertical Gabors) and different participant samples. We assume that such methodological differences lead to differences in overall sensitivity. We compensate for them by introducing factors $f_d$ that scale the sensitivity of each dataset d. We want these scaling factors to be close to 1 to avoid degenerate fits. Therefore, the loss function is: where $\hat{S}_{i,d}$ is the predicted sensitivity for stimulus i in dataset d, and N is the total number of stimuli in all datasets. The constant α controls the influence of the regularization term (α = 0.005), and D is the number of datasets. We fix the scale factor for the first dataset to 1 ($f_1 = 1$). We report the fitting error in the intuitive units of decibels, computed by multiplying the square root of the first (data-loss) term of Eq. 15 by 20. Due to the scarcity of the data, we cannot partition them into training and testing sets. Random partitioning is likely to leave important degrees of freedom of the model unconstrained, and any structured partitioning approach is unlikely to provide a robust test for overfitting. We plan to collect a testing dataset in the future.
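The decibel error can be illustrated as follows. This sketch assumes that the data-loss term is the mean squared difference of log10 sensitivities over all N stimuli, and that $f_d$ multiplies the predictions. Both are assumptions, since the loss equation is not reproduced in this text. The regularization term is omitted.

```python
import math


def fitting_error_db(datasets):
    """Pooled RMS error in dB over all N stimuli of all datasets.

    datasets: list of (measured, predicted, f_d) tuples, where f_d is the
    per-dataset scale factor applied to the predictions (an assumption).
    The data-loss term is taken to be the mean squared log10 difference;
    20 * sqrt(.) converts it to decibels.
    """
    squared = []
    for measured, predicted, f_d in datasets:
        for s, s_hat in zip(measured, predicted):
            squared.append((math.log10(f_d * s_hat) - math.log10(s)) ** 2)
    return 20.0 * math.sqrt(sum(squared) / len(squared))


# Hypothetical numbers: every prediction is off by a factor of 2.
data = [([10.0, 100.0], [20.0, 200.0], 1.0),
        ([50.0], [25.0], 1.0)]
print(f"{fitting_error_db(data):.2f} dB")
```

On this scale, an error of about 6 dB corresponds to predictions that are off by a factor of 2 in sensitivity.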

Results
The two models resulted in equally good fits, with no clear winner. The fitting errors suggest that the cone contrast model is marginally better at predicting the thresholds (Table 2). However, the difference is small (0.13 dB) and is observed in only two datasets. Part of the reason is that experimentally separating the predictions of the cone contrast and postreceptoral contrast models is difficult, and even our combined dataset is insufficient to do so. Indeed, whether the cone contrast model or the postreceptoral model is more accurate remains an open question in vision science.
Further insights can be found by inspecting the model predictions and data, shown in Figs. 4-8. In all plots, circles denote the data points, continuous lines represent the predictions of the cone contrast model, and dashed lines represent those of the postreceptoral contrast model.
Despite the different structures of the two models, their predictions are very similar. The largest differences occur at the lower frequencies (Figs. 4, 5 and 8). Both models predict an asymmetric drop in sensitivity for achromatic stimuli in log-log space, with a shallower drop-off at low frequencies. An ablation study of both models (not shown) indicated that this effect is caused by one of the chromatic mechanisms (R or B) detecting achromatic contrast modulations. Because the chromatic tuning of the opponent-color responses (Eqs. 2 and 10) is optimized during fitting, some achromatic signal can leak into the chromatic responses and vice versa. This leakage lets the chromatic mechanisms detect low-frequency achromatic modulations, for which the chromatic mechanisms are more sensitive than the achromatic mechanism.
Comparing Figs. 6 and 8 shows that Green-Red and YellowGreen-Violet chromatic modulations produce a band-pass sensitivity shape for the Kim et al. 2013 dataset, but a low-pass shape for the Wuerger et al. 2020 dataset. This is because the stimuli in the Kim et al. 2013 dataset maintained the same size across frequencies, while the stimuli in Wuerger et al. 2020 varied in size with frequency (maintaining the same number of cycles). It is therefore essential to model the effect of size when explaining data from different datasets.
It is worth noting that the data in Fig. 8 show a drop in sensitivity for achromatic contrast above 200 cd/m². Most models predict, and most measurements show, constant sensitivity at high photopic luminance levels [4]. Our models predict a loss of sensitivity at high luminance, which is relevant for applications in high dynamic range imaging.
The detection ellipses shown in Fig. 7 are reasonably well predicted by both models. It is worth noting that the chromatic tuning of the opponent-color responses assumed in both models (Eqs. 2 and 10) differs between the cone contrast and postreceptoral models.
Table 2: Model fitting errors.
The fitted parameters (Table 3) show that the estimated mechanism matrices ($M_{LMS\rightarrow ARB}$) resemble the mechanisms derived from neurophysiology: L + M, L − M and S − (L + M). Lucassen et al. [9] also estimated the chromatic tuning of the mechanisms for both a cone contrast and a postreceptoral contrast model, but their best-fitting matrices did not correspond to known opponent-color mechanisms, likely because they did not constrain their matrices and their dataset did not span a large range of stimulus parameters.

Conclusions
We propose two contrast sensitivity models that can account for a wide range of contrast detection thresholds obtained for different mean luminance levels, spatial frequencies, stimulus sizes and directions in color space. Both models can predict data from five different datasets, confirming the ability of the models to generalize across a range of conditions. In this work, we want to emphasize the need to consolidate existing datasets to create robust visual models.
We hope that modeling datasets obtained under a wide range of experimental conditions can help us better understand low-level color vision. For example, a simple cone contrast model can account for detection thresholds obtained under steady-state adaptation to chromatic backgrounds as well as for backgrounds spanning a wide range of luminance levels. If a postreceptoral model is adopted, the only relevant postreceptoral adaptation is captured by adaptation to the mean luminance (Eq. 11), without any need to postulate an additional chromatic adaptation factor. However, more data need to be collected with colored backgrounds that would allow us to discriminate between these two models. We hope that the proposed models can be used to construct detection/discrimination models for more complex stimuli, such as banding artifacts [17] or differences between color patches.
We also need to acknowledge several limitations of the proposed models. At very low mesopic light levels, rods likely contributed to the detection performance, but our data are too limited to model rod intrusion. The models are intended to be functional and may not account for some aspects of low-level vision.