Perceptually motivated model for predicting banding artefacts in high-dynamic range images

Banding is a type of quantisation artefact that appears when a low-texture region of an image is coded with insufficient bit-depth. Banding artefacts are well-studied for standard dynamic range (SDR), but are not well-understood for high dynamic range (HDR). To address this issue, we conducted a psychophysical experiment to characterise how well human observers see banding artefacts across a wide range of luminances (0.1 cd/m² to 10,000 cd/m²). The stimuli were gradients modulated along three colour directions: black-white, red-green, and yellow-violet. The visibility threshold for banding artefacts was the highest at 0.1 cd/m², decreased with increasing luminance up to 100 cd/m², then remained at the same level up to 10,000 cd/m². We used the results to develop and validate a model of banding artefact detection. The model relies on the contrast sensitivity function (CSF) of the visual system, and hence predicts the visibility of banding artefacts in a perceptually accurate way.


Introduction
Digitally representing colour requires the conversion of continuous values into discrete ones. If an insufficient number of bits is used to represent the digital value, human observers may see edges in the quantised image (Fig. 1), called banding or contouring artefacts. Such artefacts occur in low-texture regions of the image where the pixel values vary smoothly, as in the sky or the ocean. Banding artefacts are aggravated by edge-enhancing and contrast-normalizing mechanisms in the early human visual system, which amplify the perceived brightness and colour difference at the banding edges and lead to illusions such as the Chevreul illusion [1].
However, there is a trade-off: minimising the appearance of banding artefacts requires encoding with higher bit-depth, but higher bit-depth necessarily results in more data. This is undesirable, as there is an immense amount of visual content that is created, stored, and streamed daily; indeed, high-definition streaming is only possible with lossy video coding [2]. On the other hand, encoding with insufficient bit-depth results in unattractive banding, and for some applications, such as medical imaging, illusory bands may even result in incorrect diagnosis [3].
Thus, it is useful to develop a model that accurately describes how banding artefacts are detected by a human observer, in order to better understand the trade-off between bit-depth and image quality. However, existing models have been designed for standard dynamic range (SDR) displays, with a typical peak luminance of 200 cd/m². With increasing adoption of high dynamic range (HDR) displays, it is important to develop a model that encompasses the wide range of luminance levels that can be shown in HDR, from mesopic (e.g., highway at night) to high photopic (e.g., sunny day).
In this paper, we develop and validate a perceptually motivated model of banding artefact visibility that spans a wide range of luminance levels. Our model improves upon Denes et al.'s [4] Fourier-based, analytic formulation of banding artefacts, incorporating a contrast sensitivity function (CSF) for stimuli at luminance levels between 0.0002 cd/m² and 10,000 cd/m² [5]. Our model also operates on physical units of luminance and contrast, rather than relative pixel values, and is therefore device- and content-independent.
Figure 1: Sample stimuli. The stimuli were 2D gradients modulated along the three axes of the DKL colour-opponency space [6]. Values in DKL are linear combinations of L-, M-, and S-cone responses.

Related Work
A Google Patents search shows that hundreds of patents have been filed for debanding or decontouring algorithms for printed media and digital displays. However, few studies have looked at building a model of banding detection, and even fewer have developed perceptually motivated ones. Below, we review some models of banding that rely on some intuition about what may be perceptually important.

Banding Detection in SDR
The majority of banding detection algorithms [7,2,8,9] rely on finding groups of pixels that share the same pixel value and identifying the boundaries between the grouped regions. If the RGB values of the grouped regions differ only slightly, the pixel groups may represent quantised zones of a gradient, meaning that the edge between the groups is a candidate banding artefact. Then, it is a matter of determining whether the candidate artefact is likely to be visible to the human observer.
Some of those models use banding artefact size as a measure of visibility. Bhagavathy et al.'s [7] multi-scale method is an early example that incorporated the role of size by checking for candidate artefacts at multiple neighbourhood sizes. Baugh et al.'s [2] method indirectly incorporates the role of size by finding a distribution of pixel groups, since heavily quantised images have an uneven distribution of groups due to large swaths of homogeneous regions. Baugh et al. also propose the Banding Index, where BI < 0.9 is reported to be a reasonably good threshold for identifying badly quantised images or frames. This method works with H.264/AVC video coding [10], and is thus appropriate for detecting artefacts in HD streamed video. Wang et al.'s [9] method emphasizes the length of the candidate banding artefact. In addition, this method considers the coherence of the candidate artefact. If the candidate artefact is sharp and clean (coherent), then it is more likely to be visible. Notably, Wang et al. validated their method with a subjective study, measuring mean opinion scores (MOS).
Other methods depend on edge detection. Tu et al. [11] proposed BBAND (Blind Banding Artefact Detector). This method does not rely on an initial pixel grouping step, but rather, edge detection with the Sobel filter. Then, the algorithm computes the visibility of that edge. The edge is less likely to be visible when surrounded by high-luminance regions and high-texture regions. The edge is more likely to be visible when it is long. The authors also validated their method using the MOS dataset from Wang et al. [9]. In another work, Lee et al. [8] first detect non-smooth regions by reducing the bit-depth. Edges are then detected using a directional contrast feature measuring how much the intensity of a pixel differs from its 8 neighboring pixels in four horizontal, vertical, diagonal and anti-diagonal directions. A content-based empirical threshold is used to categorize edges as natural (larger than the threshold) or visible banding artefacts (smaller than the threshold).
While the above studies have incorporated some intuitive knowledge of what contributes to banding artefacts, very few studies have incorporated a model of the early visual system. Daly and Feng [12] is an exception, relying on the spatiotemporal characteristics of the CSF. An important contribution is the Fourier analysis of the banding artefact by treating it as the error between the quantised and the continuous images. In particular, the authors note that a key determinant of banding visibility is the fundamental frequency of the error signal. Denes et al. [4] follow Daly and Feng's analysis, extending the work to chromatic components. Importantly, Denes et al. approximate the error signal with a saw-tooth function. The Fourier transform of the saw-tooth function has a simple closed-form solution and can be rapidly evaluated, making the method appropriate for the authors' intended application to Virtual Reality (VR).

Banding Detection in HDR
Compared to SDR, banding detection in HDR images and videos remains less investigated. In a series of psychophysical experiments, Boitard et al. [13] identified the minimum bit-depth that is required per colour component to represent HDR colour pixels without introducing any banding artefacts. However, a detection model was not provided and the maximum luminance level evaluated was 50 cd/m².
In another study, Song et al. [14] address the banding artefacts that are visible in HDR frames generated by inverse tone mapping SDR video frames. The frames are compressed using legacy video encoders such as HEVC [15]. Banding artefacts are detected using residual banding level ratios, which are the ratios between the highest quantisation step before and after filtering a picture region. The Mean Squared Error (MSE) between the 12-bit inverse tone mapped and reference HDR video frames is also used to detect banding artefacts, as the goal of the method is to make the quantisation steps of the inverse tone mapped 12-bit HDR video frames similar to those of the reference 12-bit HDR video frames. The MSE ensures that only banding artefacts are smoothed and that edges in the original image are preserved. The banding detection metric in [14] requires a quantised HDR signal for which quantisation steps do not yield visible banding.
Su et al. [16] proposed a banding detection metric referred to as False Contouring Detection (FCD), which returns the number of potentially visible banding artefacts in a picture or picture area. The metric detects the contouring edges using median filtering and then classifies each edge as visible if its contrast is higher than the visible contrast threshold calculated using a CSF, and as invisible otherwise. The FCD is incremented by one for each visible contouring edge detected in a picture.
Figure 2 (panels: Continuous, Quantized, Error; horizontal axis: Position; vertical axis: Contrast): We model banding detection as a question of detecting the error signal [12,4]. The error is well-approximated by a saw-tooth function. The fundamental frequency carries most of the energy of the Fourier spectrum, and therefore is the key determinant of artefact visibility. For error signals of the same amplitude, a shallow gradient results in a lower fundamental frequency than a steeper one.
Azimi et al. [17] treated visible banding artefacts in HDR colour pixels as visible colour differences between HDR pixels quantised using 10 bits per colour channel and continuous reference pixels. Such visible colour changes were measured using the CIE DE2000 [18] colour difference metric. It was shown in [17] that while colours closer to the white point of the Rec.2020 gamut require more than 1024 code-words (the 10-bit range) to be represented without visible changes, colours at the border of the gamut (more saturated colours) can be represented with fewer bits.

Our Work
We extend Daly and Feng [12] and Denes et al. [4] to HDR, using Fourier analysis to predict the visibility of the error signal. For a colour image I(x, y, c) and its quantised counterpart I_q(x, y, c), the quantisation error E(x, y, c) is the colour difference, or the contrast, between I(x, y, c) and I_q(x, y, c), where x and y are the location in the image and c is the colour channel (Fig. 2). Banding detection can be framed as a question of the sensitivity to this error: specifically, whether E(x, y, c) exceeds the contrast detection threshold of the human observer. By definition, an error signal below the contrast threshold is unlikely to be detected and can be safely ignored.
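To make the notion of the error signal concrete, the following minimal sketch quantises a one-dimensional ramp and computes the difference between the continuous and quantised versions. The rounding mode, the sign convention (continuous minus quantised), the step size, and all names are assumptions made for illustration and are not taken from the experiment.

```python
def quantise(value, step):
    """Quantise to the nearest multiple of `step`.

    Rounding to the nearest level is an assumption of this sketch.
    """
    return step * round(value / step)


# A smooth ramp standing in for one row of a low-texture image region.
n = 1000
step = 0.05  # hypothetical quantisation step
ramp = [i / (n - 1) for i in range(n)]

# Quantised ramp and the resulting error signal (continuous minus quantised).
ramp_q = [quantise(v, step) for v in ramp]
error = [v - q for v, q in zip(ramp, ramp_q)]

# The error is bounded by half a quantisation step and repeats once per level,
# which is why it resembles a saw-tooth.
max_abs_error = max(abs(e) for e in error)
num_levels = len(set(ramp_q))
```

With nearest-level rounding the error never exceeds half a step, so the amplitude of the error signal is set by the quantisation step alone, while its period depends on how quickly the ramp crosses successive levels.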
Contrast thresholds are a function of colour [19], luminance [20], and spatial frequency [21]. This dependence is modelled by the CSF, which describes the inverse of the amplitude needed for threshold detection; it shows the sensitivity of the human visual system to a particular spatial frequency at the given colour and luminance. Having an accurate CSF is therefore important for reliably modelling banding detection. Here, we use a CSF capable of predicting contrast thresholds for luminances between 0.0002 cd/m² and 10,000 cd/m², spatial frequencies between 0.125 and 32 cycles per degree (cpd), and any arbitrary colour direction [5], which was made possible by combining multiple datasets [22,23,24,25,26].

Experiment
We conducted an experiment to investigate banding detection thresholds across a wide range of luminances, from mesopic (0.1 cd/m²) to high photopic (10,000 cd/m²), for banding artefacts modulated along three opponent-colour directions: achromatic, red-green, and yellow-violet.

Methods

Apparatus
The experiment was conducted on a custom-built HDR display with a peak luminance of 37,000 cd/m². The display consisted of an LCD panel extracted from an iPad 3/4 retina display (9.7", 2048 × 1536 px; product code: LG LP097QX1) and a DLP projector (Optoma X600, 1024 × 768 px). The display had a maximum contrast ratio of 1,000,000:1 and an effective resolution of 2048 × 1536 px. Each channel and each display could reproduce 10 bits: 8 bits via the display and 2 additional bits via spatio-temporal dithering. More details on the display can be found in [24].
The viewing distance was 91 cm, such that the display occupied the central 12.4° × 9.3° of the visual field, with an angular resolution of 165 ppd (pixels per visual degree). The experiment room was completely dark, eliminating direct or reflected sources of light falling on the screen. The room conditions were compliant with the recommendations in BT.500.
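The reported field of view and angular resolution can be checked with a short calculation from the panel size and viewing distance. The sketch below assumes square pixels and an active-area diagonal of exactly 9.7 inches, and it reports the mean resolution across the panel width; these simplifications and the helper name are ours.

```python
import math


def visual_angle_deg(size_mm, distance_mm):
    """Visual angle subtended by an extent centred on the line of sight."""
    return math.degrees(2.0 * math.atan(size_mm / (2.0 * distance_mm)))


# Panel geometry (9.7" diagonal, 2048 x 1536 px, square pixels assumed).
diagonal_mm = 9.7 * 25.4
width_px, height_px = 2048, 1536
diagonal_px = math.hypot(width_px, height_px)
width_mm = diagonal_mm * width_px / diagonal_px
height_mm = diagonal_mm * height_px / diagonal_px

# Viewing distance of 91 cm.
distance_mm = 910.0
width_deg = visual_angle_deg(width_mm, distance_mm)
height_deg = visual_angle_deg(height_mm, distance_mm)

# Mean angular resolution across the panel width, in pixels per degree.
ppd = width_px / width_deg
```

Under these assumptions the result agrees with the stated 12.4° × 9.3° field and roughly 165 ppd to within rounding.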

Stimuli
The stimuli were coloured 2D gradients (Fig. 1) defined in the Derrington-Krauskopf-Lennie (DKL) colour-opponency space [6] with a D65 white point. The DKL colour space is a linear transformation of the LMS colour space that puts the origin at the white point and modulates along the achromatic, red-green, and yellow-violet directions. The transformation is parameterised by L_D65, M_D65, and S_D65, the D65 white point in LMS coordinates using the CIE 2006 cone fundamentals [27]. ∆L, ∆M, ∆S were the gradient modulations in LMS space, and ∆DKL = [∆A, ∆R, ∆V] were the modulations in DKL space, corresponding to modulations along the achromatic, red-green, and yellow-violet directions, respectively. Using the DKL colour space allowed us to define the gradients in a device-independent, physiologically accurate opponent colour space. The relative gradient was defined as a function of x, y, and c, where x and y were the pixel coordinates, and c was one of the three colour components (achromatic, red-green, or yellow-violet); the gradient was generated only for the selected colour component c_sel. r was the angular display resolution in pixels per degree, and l was the width and height of the stimulus in degrees. In our experiment, we used r = 165 px/° and l = 4.5° for all stimuli. We used slopes of s = 0.3556, 0.0889, and 0.4444 for the achromatic, red-green, and yellow-violet components, respectively.
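As an illustration of such a linear transformation, the sketch below uses one commonly cited form of the LMS-to-DKL matrix, parameterised by the white point. This particular matrix is an assumption and may differ in scaling from the one used in the experiment; the white-point values are placeholders rather than the CIE 2006 coordinates of D65, and the function names are hypothetical.

```python
def lms_to_dkl_matrix(l0, m0, s0):
    """One common form of the LMS-to-DKL matrix for white point (l0, m0, s0).

    Rows: achromatic (L+M), red-green (L-M), yellow-violet (S-(L+M)).
    Assumed form; the scaling in the original work may differ.
    """
    return [
        [1.0, 1.0, 0.0],
        [1.0, -l0 / m0, 0.0],
        [-1.0, -1.0, (l0 + m0) / s0],
    ]


def apply_matrix(matrix, vec):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


# Placeholder white point in LMS; NOT the real D65 coordinates.
white_lms = (0.7, 0.3, 0.02)
dkl_matrix = lms_to_dkl_matrix(*white_lms)

# The white point should fall on the achromatic axis: zero on both
# chromatic axes, with only the first component non-zero.
white_dkl = apply_matrix(dkl_matrix, white_lms)
```

With this form, the white point has no red-green or yellow-violet component, which matches the description of the white point as lying along [1, 0, 0] in DKL space.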
Then, the gradient was added to a D65 background of the desired luminance Y, where W_D65 = [1, 0, 0] was the chromaticity of the D65 white point in DKL space. The gradient was quantised directly in the DKL colour space, with t denoting the quantisation step.

Observers
Four observers (1 female, 3 males; mean age = 35.25) from the University of Cambridge participated in the experiment. All observers had optically corrected 20/20 vision. All had normal colour vision, tested using Ishihara colour plates. Two of the observers were authors; the others were naïve to the experiment procedure. All observers were familiar with the concept of quantisation and banding artefacts.

Procedure
The experiment consisted of three colour directions (achromatic, red-green, yellow-violet) presented at six luminances (0.1, 1, 10, 100, 1000, and 10,000 cd/m²), for a total of 18 conditions. Pilot experiments did not reveal an influence of condition order; thus, we presented the conditions in increasing order of luminance to reduce the time required for dark adaptation between conditions. Within a luminance level, the trials for different colour directions were presented in randomly interleaved order.
Each condition consisted of 25 to 35 4-alternative forced choice (4AFC) trials. In each trial, observers saw four randomly oriented gradients in a 2 × 2 arrangement. Three of the four gradients were displayed without quantisation; the fourth was quantised. The task was to identify the quantised gradient. The stimuli remained visible on the display until the observer made a response.
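In a 4AFC task an observer who sees no banding still picks the quantised gradient by chance on one trial in four, so a psychometric function for such data is usually lifted by a guess rate of 0.25. The following is a minimal sketch of one such function; the Weibull-style shape, the slope parameter beta = 3.5, and the convention that the threshold corresponds to 50% detection are assumptions made for illustration, not the fitting procedure used in the experiment.

```python
import math

GUESS_RATE = 0.25  # chance of a correct answer with four alternatives


def p_correct_4afc(step, threshold, beta=3.5):
    """Probability of a correct 4AFC response for a given quantisation step.

    `threshold` is the step detected on 50% of trials (assumed convention);
    `beta` is a hypothetical psychometric slope.
    """
    # Probability of genuinely detecting the banding.
    p_detect = 1.0 - math.exp(math.log(0.5) * (step / threshold) ** beta)
    # Undetected trials are still answered correctly at the guess rate.
    return GUESS_RATE + (1.0 - GUESS_RATE) * p_detect


p_at_threshold = p_correct_4afc(1.0, 1.0)
p_at_zero = p_correct_4afc(0.0, 1.0)
```

Under these assumptions, a quantisation step equal to the threshold produces 62.5% correct responses, halfway between chance (25%) and perfect performance.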
We used QUEST, an active sampling procedure for psychophysical experiments [28], to sample the quantisation levels. A psychometric function was fitted to the 4AFC responses to estimate the detection threshold. Each observer completed the experiment in 1.5 hours.

Results
Fig. 3 shows the results. The detection threshold for banding was a function of both luminance and colour direction of the gradient. A lower threshold means that the banding was easier to see. On average, the detection threshold was the lowest for red-green, followed by achromatic, and the highest for yellow-violet. This is consistent with what we know about the CSF: red-green contrast sensitivity is much higher than achromatic sensitivity, which is in turn higher than yellow-violet sensitivity.
In addition, there was an effect of stimulus luminance. At mesopic to low photopic levels (0.1-10 cd/m²), detection thresholds decreased as a function of increasing luminance. However, at medium to high photopic levels (≥10 cd/m²), the detection thresholds stayed constant as a function of luminance. This is interesting, because Wuerger et al. [24] found that the achromatic CSF has a noticeable U-shape as a function of luminance, with the threshold at 10,000 cd/m² being much higher than at 100 cd/m². However, CSFs are defined for detecting wavelet-like stimuli, which consist of a single spatial frequency.
Thus, our results suggest that the CSF alone is insufficient for predicting detection in images that contain multiple spatial frequencies, as in banding artefacts. In comparison, red-green and yellow-violet CSFs saturate with increasing luminance, which is qualitatively consistent with the detection thresholds for banding.
Using these results, we developed a model of banding detection that decomposes the error signal into its frequency components while also handling a wide range of luminances. Our findings significantly extend Denes et al.'s work [4], as they only tested luminance up to middle photopic levels (22 cd/m²).

Modelling
Our model imitates the physiological process of detection by simulating opponent colour channels and multiple spatial frequency channels. First, the model transforms the gradient into DKL opponent colour space. Then, in each colour channel, the model decomposes the quantisation error E(x, y, c) into its spatial frequency components. Rather than numerically transforming E(x, y, c) into the Fourier domain, we follow Denes et al. [4] and represent E(x, y, c) as a saw-tooth function (Fig. 2), whose Fourier transform has an analytical solution. For a single line y and a colour channel c, the Fourier transform of the quantisation error depends on w, the width, or the period, of the saw-tooth (visual degrees), and on h, the height of each saw-tooth step; h is determined by the quantisation step t (see also Eq. 5). The amplitude of the kth frequency component follows from h, and its frequency (in cycles per degree) follows from the period w expressed in pixels. We found that the accuracy of the prediction does not improve beyond the first five Fourier components. For a given slope s of the gradient, the period of the saw-tooth can be computed directly. It should be noted that the slope s changes across the 2D gradient stimulus (Fig. 1) and we do not know what slope triggers banding detection in human observers. This issue is discussed later.
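As a rough illustration of the decomposition described above, the sketch below lists the first five harmonics of an ideal saw-tooth. It assumes that the step height h equals the quantisation step, that the slope is expressed per visual degree, and that the k-th harmonic has magnitude h/(πk), which is the standard Fourier-series result for a saw-tooth of peak-to-peak height h. The function name and example values are hypothetical, and the normalisation may differ from that of the original formulation.

```python
import math


def sawtooth_components(step_height, slope_per_deg, ppd, k_max=5):
    """Return [(frequency_cpd, amplitude), ...] for the first k_max harmonics.

    Assumes an ideal saw-tooth: one period is the distance over which a
    gradient of the given slope rises by one step.
    """
    period_deg = step_height / slope_per_deg  # period in visual degrees
    period_px = period_deg * ppd              # same period in pixels
    components = []
    for k in range(1, k_max + 1):
        amplitude = step_height / (math.pi * k)  # magnitude of k-th harmonic
        frequency_cpd = k * ppd / period_px      # cycles per degree
        components.append((frequency_cpd, amplitude))
    return components


# Same (hypothetical) step height, two slopes that differ by a factor of two.
steep = sawtooth_components(0.01, 0.3556, 165.0)
shallow = sawtooth_components(0.01, 0.1778, 165.0)
```

For the same step height, halving the slope halves every harmonic frequency while leaving the amplitudes unchanged, which is the relationship between gradient slope and fundamental frequency noted in the caption of Fig. 2.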
The model then uses the CSF [5] to compute P_{c,k}, the probability of detecting the error signal at that frequency and colour channel. Here, CSF(·) returns the inverse of the detection threshold, 1 − exp(·) is the psychometric function for converting contrast thresholds into probability of detection, and the factor ln(0.5) sets P = 0.5 at the detection threshold.

Figure 4 (block labels: Error, P_{c,k}, P, Minimum Detectable Quantisation Step): Banding detection model. The error signal is transformed into the Fourier domain, resulting in the spatial frequencies (ω_{c,k}) and amplitudes (α_{c,k}) of the banding artefacts. We find the detection threshold using a CSF, which we convert into detection probability.
To find the overall probability of detection P, we combine the probabilities of detection P_{c,k} across all colour channels c and frequency components k = 1, ..., k_max, where k_max = 5 in our implementation. To find the banding detection threshold, we run a binary search on CSF(·) simultaneously across all colour and spatial frequency channels to find the quantisation step t that yields P = 0.5.
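The pipeline described in this section can be sketched end to end for a single colour channel. The sketch rests on several assumptions: a toy low-pass function stands in for the CSF of [5], the per-component probabilities are combined by probability summation, the psychometric slope beta = 3.5 is a placeholder, and the saw-tooth harmonics use the ideal h/(πk) amplitudes with the slope expressed per visual degree. None of the function names come from the original implementation.

```python
import math


def toy_csf(frequency_cpd):
    """Hypothetical stand-in for the CSF of [5]; NOT a real sensitivity model."""
    return 100.0 * math.exp(-frequency_cpd / 8.0)


def p_detect_component(amplitude, frequency_cpd, beta=3.5):
    """Detection probability of one harmonic; equals 0.5 at threshold."""
    normalised = amplitude * toy_csf(frequency_cpd)  # contrast / threshold
    return 1.0 - math.exp(math.log(0.5) * normalised ** beta)


def p_detect_banding(step, slope_per_deg, k_max=5):
    """Combine harmonics by probability summation (assumed combination rule)."""
    p_miss = 1.0
    for k in range(1, k_max + 1):
        amplitude = step / (math.pi * k)      # ideal saw-tooth harmonic
        frequency = k * slope_per_deg / step  # cycles per degree
        p_miss *= 1.0 - p_detect_component(amplitude, frequency)
    return 1.0 - p_miss


def threshold_step(slope_per_deg, lo=1e-6, hi=1.0, iterations=60):
    """Binary search for the quantisation step that yields P = 0.5."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if p_detect_banding(mid, slope_per_deg) < 0.5:
            lo = mid  # banding not yet visible: increase the step
        else:
            hi = mid  # banding visible: decrease the step
    return 0.5 * (lo + hi)


t_steep = threshold_step(0.3556)
t_shallow = threshold_step(0.1778)
```

The binary search works here because, with a low-pass sensitivity function, a larger quantisation step both raises the amplitude and lowers the frequency of every harmonic, so the detection probability increases monotonically with the step. A full model would also multiply the miss probabilities across the three colour channels.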

Results
In Fig. 5, we plot the model predictions with the data. The data are plotted in green, as are the model predictions that assume the same slopes that we used in the experiment. The model qualitatively reproduces the detection thresholds: the model predicts that banding artefacts in the red-green direction are more visible than in the achromatic direction, which are in turn more visible than in the yellow-violet direction. Within each colour direction, the model also reproduces the qualitative behaviour of thresholds decreasing between 0.1 cd/m² and 10 cd/m², then staying about the same from 10 cd/m² to 10,000 cd/m².
However, the data are consistently below the model predictions for the same slope, meaning that human observers are able to see banding artefacts better than predicted by the model. For the achromatic direction, the data align well with the model prediction for s = 0.1778. This is interesting: although the gradients in our experiment were defined to have a slope of s = 0.3556 on one edge, the slope becomes gradually shallower towards the other edge, where it reaches zero (Fig. 1), such that the slope was s = 0.1778 near the centre of the stimulus. Indeed, for quantisation errors of the same amplitude, shallower gradient slopes result in a lower fundamental frequency (Fig. 2). As the visual system is better at detecting lower frequencies than higher frequencies, this means that quantisation errors are easier to detect for shallower gradients. Similarly, the red-green and yellow-violet predictions are also more consistent with the data when we assume a shallower slope. However, whereas the red-green data are well-captured by assuming a shallower slope (s = 0.0222), the predicted detection threshold for yellow-violet is higher than the data at lower luminances.

Conclusions
Banding artefacts pose a greater problem in HDR than SDR, due to the wider luminance range. We investigated banding detection for a wide range of luminance (mesopic to high photopic), and for three opponent colour directions (achromatic, red-green, yellow-violet). This allowed us to develop a perceptually motivated model of banding detection for HDR images and for any arbitrary colour direction. While Wang et al. [9], Tu et al. [11], and Su et al. [16] also developed perceptually motivated models, ours has the advantage that it deals specifically with luminance coding, operating on physical units of luminance and contrast, rather than relative pixel values. Denes et al.'s [4] model also operates on physical units, but our model has the advantage of using a more recent, more perceptually accurate CSF capable of predicting banding detection in a much larger dynamic range, and therefore, is more reliable for HDR colour changes. While more work remains to be done, our model provides a first step to a more rigorous approach to banding detection in HDR.
Figure 5: The data (green dots) are qualitatively consistent with the model prediction based on the same gradient slope s that we used in the experiment (green lines). However, the data are even better predicted when we assume shallower slopes (other coloured lines). Bottom row: Visualisation of slopes depicted in top and middle rows.