Revisit Retinex Theory: Towards a Lightness-Aware Restorer for Underexposed Images

We investigate how to correct the exposure of underexposed images. The bottleneck of previous methods mainly lies in their naturalness and robustness when dealing with images with various exposure levels. When facing well-exposed or extremely underexposed images, they may produce over- or underenhanced outputs. In this paper, we propose a novel retinex-based approach, namely, LiAR (short for lightness-aware restorer). The term “lightness-aware” means that the estimated illumination is not only a component to be adjusted but also a measure that reflects the brightness of the scene and determines the degree of adjustment. In this way, underexposed images can be restored adaptively according to their own brightness. Given an image, LiAR first estimates its illumination map using a specially designed loss function that ensures the result’s color consistency and texture richness. Then, adaptive correction is performed to obtain a properly exposed output. LiAR is based on internal optimization of the single test image and does not need any prior training, implying that it can adapt itself to different settings per image. Additionally, LiAR can be easily extended to the video case due to its simplicity and stability. Experiments demonstrate that, for images and videos with various exposure levels, LiAR can achieve robust and real-time correction with high contrast and naturalness. The relevant code and collected data are publicly available at https://cslinzhang.github.io/LiAR-Homepage/.


Introduction
Poor lighting conditions can cause serious quality degradation of captured images and videos. For example, images taken under low-light conditions look dark overall, and back-lighting tends to make surface details illegible in backlit regions. Although the restoration of underexposed images is a long-standing problem on which great progress has been made over the past decade, developing a practical and effective restorer remains a challenge.
Various research studies have been done on exposure correction of underexposed images, and one of the most widely used paradigms is retinex theory [1], which assumes that the sensations of color have a strong correlation with reflectance and illumination. Each color area is composed of red, green, and blue primary colors of a given wavelength, and these three primary colors determine the color of each unit area. Specifically, according to retinex theory, an image $I$ can be decomposed into the pixelwise product of reflectance $R$ and illumination $S$ as
$$I = R \circ S, \tag{1}$$
where $\circ$ denotes pixelwise multiplication.
Unlike supervised learning-based schemes, in this paper, we introduce a "zero-shot" scheme to fulfill the task of illumination map estimation. By "zero-shot," we mean that our approach does not need any prior image examples or prior training. In our scheme, the illumination map of the given image is obtained by iteratively minimizing a specially designed loss function. This loss comprises two terms, which are devised to ensure color consistency and texture richness of the restored result, respectively. The illumination maps estimated in this way can endow the restored results with finer details and more natural appearances.
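The pixelwise retinex decomposition described above can be illustrated with a minimal Python sketch. It is not taken from the LiAR code base: the helper names `reflectance` and `compose`, the guard constant `EPS`, and the toy 2 × 2 single-channel values are all assumptions made for illustration, and the illumination is simply given rather than estimated.

```python
# Illustrative sketch only: a tiny single-channel "image" stored as nested lists.
EPS = 1e-6  # assumed guard against division by zero


def reflectance(image, illumination):
    """Recover R from I = R * S (pixelwise), i.e. R = I / S."""
    return [[i / max(s, EPS) for i, s in zip(irow, srow)]
            for irow, srow in zip(image, illumination)]


def compose(refl, illumination):
    """Recombine reflectance and illumination pixelwise: I = R * S."""
    return [[r * s for r, s in zip(rrow, srow)]
            for rrow, srow in zip(refl, illumination)]


# Hypothetical values: a dark image and an illumination map that is given here,
# not estimated.
image = [[0.10, 0.20], [0.05, 0.40]]
illum = [[0.25, 0.25], [0.50, 0.50]]

refl = reflectance(image, illum)   # the reflectance is brighter than the input
restored = compose(refl, illum)    # multiplying back reproduces the input
```

Replacing `illum` with a brighter map before calling `compose` is what exposure correction amounts to in this model; how that brighter map is chosen is the subject of the following sections.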
With the illumination map at hand, some pipelines simply remove it to restore the reflectance map [6,7]; others adjust it further with fixed predefined rules [8,9]. Neither of these two ways of processing illumination maps takes the brightness of the input image into full consideration. Consequently, when encountering well-exposed inputs, these methods may produce overenhanced results, while for extremely underexposed inputs, their outputs are inclined to be underenhanced. In this paper, we explicitly model the impact of the image's brightness and propose a simple yet effective strategy, relying on the mean brightness of the illumination map, to modify the estimated illumination map. This strategy can adaptively stretch the contrast of both bright and dark images. In this sense, we claim that our pipeline for underexposed image restoration is "lightness-aware."
The contributions of this work are summarized as follows:
(i) A lightness-aware restorer for underexposed images, namely, LiAR (short for lightness-aware restorer), is proposed. Its efficiency and efficacy have been quantitatively and qualitatively validated by experiments (refer to Section 4 for details).
(ii) LiAR does not require prior training; instead, it depends on internal optimization of the single input image. Hence, LiAR has a preeminent generalization capability and can be widely applicable to various shooting scenes and kinds of illumination conditions.
(iii) In LiAR, to optimize the illumination map of the input image, a novel loss is proposed. Such a loss can guarantee that the restored result has color consistency with the input and that it has rich texture details.
(iv) To modify the estimated illumination map adaptively to the input image's lightness, a strategy incorporating the mean of the illumination map is proposed and used in LiAR. This strategy allows the restored output to have appropriate brightness regardless of whether the input image is bright or dark.
(v) LiAR can be efficiently implemented with GPU. In addition, it has excellent scalability and adaptability. Hence, it can be easily extended to enhance underexposed videos. It is worth mentioning that because of LiAR's property of lightness awareness, compared with the outputs of other commonly used approaches, the videos enhanced by LiAR do not have the shortcoming of flickering.

Related Work
Conventional image enhancement methods such as histogram-based methods [10][11][12][13][14] can be used to enhance underexposed images, but in most cases, their efficacy is quite limited. To tackle this problem more effectively, various methods specializing in this task have been proposed, which fall roughly into two categories: heuristic ones and data-driven learning-based ones.

Heuristic Methods.
Early attempts [6,7] based on retinex theory remove the illumination and directly extract the reflectance as the enhanced result. Wang et al. [2] proposed a bright-pass filter to decompose an image into reflectance and illumination. Guo et al. [8] estimated the illumination map by imposing a structure prior on it to generate outputs with rich details. However, their method neglects color consistency, resulting in local lightness order errors. In [9], Zhang et al. derived an ADMM-based procedure [15] for solving the optimization problem of illumination estimation. Despite its effectiveness in contrast enhancement, it may produce overenhancement artifacts when the inputs are properly exposed images because of the fixed transformation rule used to adjust the illumination map. In addition to retinex theory, other commonly used techniques are fusion and the S-curve adjustment model. Liu and Zhang [16] proposed a detail-preserving underexposed image enhancement method based on a multiexposure fusion mechanism. Fusion mechanisms can also be used in video enhancement, as in [17,18]. Yuan and Sun [19] proposed an automatic exposure correction method using S-curve tone mapping. Later, the authors extended their work to correct ill-exposed videos [20]. However, the parameterized S-curve adopted in these methods may compress the midtones, and thus the output images look too flat and unnatural. Zhang et al. [21] designed a CNN (convolutional neural network) [22] to estimate the best-fitting S-curve of the input test image. To avoid loss of details in midtones, they resorted to guided filtering, but this might lead to edge distortion in the output.

Data-Driven Methods.
Recent studies on exposure correction are mostly based on machine learning. Dale et al. [23] first established a database comprising 1 million images and executed a visual search in the database. In [24], Bychkovsky et al. made a collection of 5,000 example input-output pairs that enables supervised learning. Yan et al. [25] trained deep neural networks to capture sophisticated photographic styles and modeled local adjustments that depend on image semantics. Shen et al. [26] proposed MSRnet based on multiscale retinex theory and trained it on synthesized pairwise images. In [27], Li and Wu proposed a learning-based technique for back-lit image restoration, including segmentation of back-lit and front regions and spatially adaptive tone mapping. Different from the above "black-box" models, Hu et al. [28] employed a deep reinforcement learning-based approach to provide users with an understandable solution. Based on retinex theory, Wang et al. [3] trained an illumination mapping estimation network on a new dataset they built, including underexposed images and expert-retouched references. The performance of these learning-based methods depends heavily on the training dataset, yet building such a dataset covering various types of illumination and contents is a challenging task in itself.
In this work, we take the brightness level of the input image into consideration when correcting its illumination map. Such a lightness-aware strategy can avoid overenhancement effectively. Unlike data-driven schemes, we introduce a "zero-shot" scheme to fulfill the task of illumination map estimation so that we can ensure that LiAR will perform consistently well for images spreading over a wide range of exposure levels.

General Pipeline of LiAR.
Our underexposed image restorer LiAR is established based on retinex theory (equation (1)), and accordingly, its pipeline comprises two stages, illumination estimation and exposure correction, as illustrated in Figure 1. Given an input image $I$, we first separate the illumination map $S$ from $I$ (details of illumination estimation are presented in Section 3.2) and then modify $S$ according to its own average brightness. Finally, the restored result $\hat{I}$ is obtained by applying the corrected illumination $\hat{S}$ to the scene reflectance as
$$\hat{I} = R \circ \hat{S}, \tag{2}$$
where $\hat{S}$ and $\hat{I}$ are the corrected illumination map and the restored result, respectively.
In existing retinex-based methods [8,9], $S$ is usually adjusted using a fixed predefined rule. However, real inputs may have various lightness levels, ranging from extremely dark to normally exposed, and they actually require different levels of illumination adjustment. To this end, $S$ has two roles in our approach: it is an illumination component that needs to be adjusted and also a measure that reflects the brightness of the scene, determining the degree of adjustment. The latter role accounts for the "lightness awareness" of LiAR. Inspired by gamma transformation in image-tone mapping, our lightness-aware illumination adjustment scheme is defined by equation (3), in which $\bar{S}$, the mean brightness of the illumination map $S$, serves as a measure that reflects the brightness of the scene. Using this transformation, the degree of adjustment is determined by the illumination brightness. For example, originally darker images with $\bar{S}$ close to 0 will be greatly enhanced, while well-exposed images with higher $\bar{S}$ will remain as they are. Several examples are shown in Figure 2 to demonstrate the capability of LiAR. In the first row of Figure 2, $I_1 \sim I_3$ are three input images and $S_1 \sim S_3$ are their estimated illumination maps. Using LiAR, the corresponding restoration results $\hat{I}_1 \sim \hat{I}_3$ along with their corrected illumination maps $\hat{S}_1 \sim \hat{S}_3$ are obtained and shown in the second row of Figure 2. It can be observed that with our lightness-aware strategy, the illumination maps can be adaptively adjusted.
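To make the idea of lightness-aware adjustment concrete, the sketch below uses one gamma-style rule that matches the behavior described above: the exponent is the mean brightness of the illumination map, so that the corrected map is S raised to the power of its own mean. This specific rule is an assumption chosen for illustration and should not be read as the exact form of equation (3); the function names and toy values are likewise hypothetical.

```python
EPS = 1e-6  # assumed guard against division by zero


def mean_brightness(illum):
    """Mean of the illumination map (the scene-brightness measure)."""
    values = [s for row in illum for s in row]
    return sum(values) / len(values)


def adjust_illumination(illum):
    """ASSUMED rule: S_hat = S ** mean(S); a dark map is pushed towards 1."""
    s_bar = mean_brightness(illum)
    return [[s ** s_bar for s in row] for row in illum]


def restore(image, illum):
    """I_hat = R * S_hat with R = I / S, clipped to [0, 1]."""
    s_hat = adjust_illumination(illum)
    return [[min(1.0, i * sh / max(s, EPS))
             for i, s, sh in zip(irow, srow, shrow)]
            for irow, srow, shrow in zip(image, illum, s_hat)]


# Hypothetical single-channel inputs; pixel values do not exceed the illumination.
dark_illum = [[0.05, 0.10], [0.10, 0.15]]
dark_image = [[0.04, 0.05], [0.10, 0.12]]
bright_illum = [[1.0, 1.0], [1.0, 1.0]]
bright_image = [[0.6, 0.7], [0.8, 0.9]]

dark_out = restore(dark_image, dark_illum)        # strongly brightened
bright_out = restore(bright_image, bright_illum)  # left unchanged
```

Under this assumed rule, a fixed exponent would treat both inputs identically, whereas the data-dependent exponent brightens the dark input and leaves the fully lit one untouched.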
Next, we will discuss how to estimate the illumination map from a given input image.

Illumination Estimation.
Given an image I, its illumination map is expected to be estimated in such a way that the final restored output should have the color consistency with the input and have rich textures. In LiAR, these two goals are achieved by imposing two constraints on illumination map optimization, one for color consistency and one for texture richness.

Color Consistency Loss.
When an image is processed, its pixel intensities are normalized to [0, 1]. For each color channel $c$, according to equations (2) and (3), the restored intensity at position $x$ can be written as
$$\hat{I}_c(x) = I_c(x) \, \frac{\hat{S}(x)}{S(x)}, \tag{4}$$
where $\hat{S}(x)$ is the corrected illumination given by equation (3). When the restored intensity in one channel overflows, that is, $\hat{I}_c(x) > 1$, it is cut off to $\hat{I}_c(x) = 1$ so that the restored intensities fall in [0, 1]. In this situation, the color consistency between the input image and the output is broken since
$$\left(\hat{I}_R(x), \hat{I}_G(x), \hat{I}_B(x)\right) \nparallel \left(I_R(x), I_G(x), I_B(x)\right), \tag{5}$$
where $a \nparallel b$ means that the vectors $a$ and $b$ are not parallel to each other. To avoid this, $S(x)$ should satisfy
$$S(x) \geq \max_{c \in \{R,G,B\}} I_c(x). \tag{6}$$
In order to consider the color constraint together with the other constraints in optimization, equation (6) is expressed as a loss term $L_c$ (short for $L_{color}$), defined in equation (7). From the definition of $L_c$ in equation (7), it follows that $L_c$ contributes to the loss only when $S(x)$ is smaller than $\max_{c \in \{R,G,B\}} I_c(x)$. In our implementation, $\max_{c \in \{R,G,B\}} I_c(x)$ is chosen as the initial estimate $S_0(x)$ of the illumination map.
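The effect of clipping on color consistency, and the way a penalty can enforce the constraint on S(x), can be sketched as follows. The per-pixel gain stands for the ratio of corrected to estimated illumination. The hinge form used in `color_loss` is an assumption that merely has the stated property of being nonzero only where S(x) falls below the channel maximum; it is not claimed to be the exact definition of the color consistency loss, and all names are hypothetical.

```python
def initial_illumination(rgb_image):
    """S0(x) = max over the R, G, B channels at each pixel."""
    return [[max(pixel) for pixel in row] for row in rgb_image]


def restore_pixel(pixel, gain):
    """Scale all channels by the same gain, then clip each to [0, 1]."""
    return tuple(min(1.0, c * gain) for c in pixel)


def is_parallel(a, b, tol=1e-9):
    """True if RGB vectors a and b are parallel (cross product close to 0)."""
    cross = (a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0])
    return all(abs(v) < tol for v in cross)


def color_loss(rgb_image, illum):
    """ASSUMED hinge penalty: sum over pixels of max(0, S0(x) - S(x))."""
    s0 = initial_illumination(rgb_image)
    return sum(max(0.0, a - b)
               for row0, row in zip(s0, illum)
               for a, b in zip(row0, row))


pixel = (0.2, 0.4, 0.8)
safe = restore_pixel(pixel, 1.25)    # largest channel just reaches 1, no clipping
clipped = restore_pixel(pixel, 2.0)  # blue overflows and is cut off, hue shifts
```

With a gain of 1.25 the output stays proportional to the input, while a gain of 2.0 clips the blue channel and changes the channel ratios, which is the failure that the constraint on S(x) is meant to prevent.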

Texture Richness Loss.
Mathematical Problems in Engineering
In an image, the illumination intensity of a surface is usually relatively flat, and the contrast of the surface should be enhanced to ensure texture richness. If the estimated illumination of a surface fluctuates spatially as the texture changes, the calculated reflectance of the scene will be flatter than the ground truth, resulting in smoothed texture in the output. Therefore, in order to ensure that the texture is enhanced, it is necessary to make the illumination as smooth as possible, which can be expressed as a loss term $L_t$ (short for $L_{texture}$) of illumination estimation. In this term, $w_x(x)$ and $w_y(x)$ are the weights at each pixel, where the subscripts $x$ and $y$ denote the horizontal and vertical directions, respectively, and $\lambda_t$ is a predefined parameter. The term $\lambda_t (S(x) - S_0(x))^2$ keeps the estimate from deviating too much from the initial estimate. The remaining key issue is how to design the weights $w_x(x)$ and $w_y(x)$. Note that a region with small gradients usually corresponds to a flat surface in the scene and needs to be smoothed. Inspired by the RTV loss [29], a simplified weight, inversely proportional to the gradient, is designed, in which $G$ is a Gaussian filter and $I_g$ is the greyscale map of the input image. $w_y$ can be computed in a similar way. The weight terms only need to be computed once at the beginning of processing.
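A weight that is inversely proportional to the gradient of a smoothed greyscale image can be sketched as below. Several details are assumptions made for illustration rather than the paper's definition: a 3-tap binomial blur along each row stands in for the Gaussian filter, forward differences approximate the gradient, and the constant `EPS` keeps the weight finite in flat regions.

```python
EPS = 1e-3  # assumed stabiliser that keeps the weight finite in flat regions


def blur_row(row):
    """3-tap binomial blur (a small stand-in for a Gaussian), edges replicated."""
    n = len(row)
    return [(row[max(i - 1, 0)] + 2 * row[i] + row[min(i + 1, n - 1)]) / 4
            for i in range(n)]


def horizontal_weights(grey):
    """ASSUMED form: w_x = 1 / (|d/dx of blurred grey| + EPS)."""
    weights = []
    for row in grey:
        smooth = blur_row(row)
        # Forward differences; the last column has no right neighbour.
        grads = [abs(smooth[i + 1] - smooth[i])
                 for i in range(len(smooth) - 1)] + [0.0]
        weights.append([1.0 / (g + EPS) for g in grads])
    return weights


# Hypothetical greyscale row: a flat dark area, an edge, a flat bright area.
grey = [[0.2, 0.2, 0.2, 0.9, 0.9, 0.9]]
w = horizontal_weights(grey)
```

The weights are large in the two flat areas, so the illumination is smoothed strongly there, and small across the edge, so the illumination is allowed to change where the scene structure changes. Vertical weights would follow the same pattern along columns.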
Combining the two loss terms $L_c$ and $L_t$ via a parameter $\lambda$, we get the loss function of the illumination estimation:
$$L = L_t + \lambda L_c.$$
At this point, given an image, its illumination map can be estimated by iteratively minimizing $L$.

Implementation Details.
LiAR is implemented with PyTorch [30]. Images are converted into tensors for parallel computing. All the operations involved in computing L are differentiable. With respect to the optimization algorithm for updating the illumination map, the SGD (stochastic gradient descent) algorithm [31] is used.
There are two hyperparameters in LiAR, $\lambda_t$ and $\lambda$. In order to keep the values of the three terms of equation (9) in the same order of magnitude, we set $\lambda_t = 0.1$. $\lambda$ controls the relative weight of the two losses, in particular that of the color consistency loss $L_c$. The value of $L_c$ reflects the color distortion of the corrected image. Thus, we need to set $\lambda$ high enough to make sure that there is no noticeable color distortion. In all experiments, we set $\lambda = 100$, which is high enough to make the color consistency term a strong constraint.
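The iterative minimization can be illustrated with a dependency-free sketch on a one-dimensional signal. It is a stand-in for the PyTorch implementation, not a copy of it: gradients are written by hand instead of using automatic differentiation, the smoothness term uses squared differences, the color term is an assumed squared hinge, and the step size, iteration count, toy weights, and function names are all hypothetical. Only the values 0.1 and 100 for the two hyperparameters come from the text.

```python
LAMBDA_T = 0.1  # from the text
LAMBDA = 100.0  # from the text
LR = 0.002      # assumed step size
STEPS = 200     # assumed number of iterations


def loss_and_grad(s, s0, w):
    """L = L_t + LAMBDA * L_c on a 1-D signal, plus its gradient w.r.t. s."""
    n = len(s)
    loss = 0.0
    grad = [0.0] * n
    for i in range(n - 1):              # weighted smoothness (assumed squared)
        d = s[i + 1] - s[i]
        loss += w[i] * d * d
        grad[i] -= 2 * w[i] * d
        grad[i + 1] += 2 * w[i] * d
    for i in range(n):
        dev = s[i] - s0[i]              # fidelity to the initial estimate
        loss += LAMBDA_T * dev * dev
        grad[i] += 2 * LAMBDA_T * dev
        short = max(0.0, s0[i] - s[i])  # assumed squared hinge for L_c
        loss += LAMBDA * short * short
        grad[i] -= 2 * LAMBDA * short
    return loss, grad


def estimate_illumination(s0, w, steps=STEPS, lr=LR):
    """Plain gradient descent starting from the initial estimate s0."""
    s = list(s0)
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(s, s0, w)
        history.append(loss)
        s = [v - lr * g for v, g in zip(s, grad)]
    return s, history


s0 = [0.30, 0.20, 0.35, 0.25, 0.80, 0.85]  # per-pixel channel maximum (toy values)
w = [5.0, 5.0, 5.0, 0.5, 5.0]              # small weight across the strong edge
s, history = estimate_illumination(s0, w)
```

In this toy run the textured dark region is flattened upwards, because the large weight on the color term keeps the estimate from dropping noticeably below the initial one, while the small weight at the edge preserves the jump between the dark and bright regions.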

Experiments
We conducted experiments on real-world images to compare the performance of LiAR with the state-of-the-art or representative approaches for underexposed image restoration. Furthermore, the ablation study is performed to evaluate the impact of each component of LiAR. Additionally, we applied LiAR to enhance underexposed videos and then compared its results with other competitors in this field.
All the experiments were carried out on a workstation with a 3.0 GHz Intel Core i7-5960X CPU and an Nvidia GeForce GTX 980Ti GPU.

Evaluation on Underexposed Images

Datasets.
Since our goal is to evaluate the restoration capability at different exposure levels, the dataset should contain images with various exposure levels. To this end, the experiments were performed on 1,500 real-world images taken from IEpsD [32], which was established for studying the problem of exposure level assessment. We partitioned these images into three groups of 500 images each according to their exposure settings. The three groups are "well exposed" (Group A), "slightly underexposed" (Group B), and "severely underexposed" (Group C).

Objective Evaluation.
The performance of underexposed image restoration methods was evaluated with two objective metrics, CDIQA (contrast-distorted image quality assessment) [33] and LOE (lightness order error) [2].
CDIQA is a no-reference quality assessment of contrast-distorted images, which can be considered a metric for the richness of image details. A higher CDIQA value roughly corresponds to higher contrast. LOE is a measure that objectively assesses the naturalness preservation between the input and the enhanced output. Ideally, if the enhancement approach does not violate the relative lightness order of pixel values in the input image, the associated LOE measure is zero. Thus, a lower LOE value roughly corresponds to fewer artifacts caused by restoration.
The results over 1,500 test images are reported in Table 1. It can be seen that in every case, LiAR obtains a high CDIQA value and a low LOE value, demonstrating its superiority in restoring the input image's details while keeping its naturalness. This also corroborates that LiAR has a strong generalization capability and can cope with images spreading over a wide range of exposure levels. By contrast, the performance and robustness of the competitors are apparently inferior to those of LiAR. For example, though Exposure [28] and DeepUPE [3] perform quite well when dealing with well-exposed images (Group A), their performance deteriorates significantly on obviously underexposed ones (Groups B and C). As for LIME [8] and ExCNet [21], in all cases they achieve high CDIQA values, indicating that their outputs are of high contrast. However, their LOE values are also quite large, implying that they suffer from the problem of overenhancement.
Figure 3 compares the restoration results of the competing methods on a severely underexposed input. It can be seen that, facing such an extremely dark image, the results of the learning-based methods Exposure [28] and DeepUPE [3] look quite dim and the details are invisible. Figure 4 shows the restoration results on a slightly ill-exposed image. For this case, S-curve-based approaches [19,21] tend to produce flat results, meaning that midtone textures are significantly compressed. In both Figures 3 and 4, the results of LIME [8] obviously suffer from unwanted artifacts. By contrast, the outputs of LiAR are natural and of high contrast. These observations are consistent with the quantitative evaluations reported in Table 1.
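The behavior of the LOE measure described above can be illustrated with a small sketch that follows the commonly used formulation from [2]: lightness is taken as the maximum of the three color channels, and the score is the average number of pixel pairs whose relative lightness order differs between input and output. The function names and toy images are hypothetical, and practical implementations usually downsample the images first because the pairwise comparison is quadratic in the number of pixels.

```python
def lightness(rgb_image):
    """L(x) = max over the R, G, B channels, flattened into one list."""
    return [max(pixel) for row in rgb_image for pixel in row]


def loe(original, enhanced):
    """Average number of pairwise lightness-order flips per pixel."""
    lo, le = lightness(original), lightness(enhanced)
    m = len(lo)
    flips = 0
    for x in range(m):
        for y in range(m):
            # A flip occurs when the order relation differs between the images.
            flips += (lo[x] >= lo[y]) != (le[x] >= le[y])
    return flips / m


def grey(v):
    """Helper for the toy data: a grey RGB pixel with value v."""
    return (v, v, v)


original = [[grey(0.1), grey(0.2), grey(0.3)]]
brighter = [[grey(0.4), grey(0.6), grey(0.8)]]  # order preserved
swapped = [[grey(0.5), grey(0.4), grey(0.9)]]   # first two pixels change order

loe_preserved = loe(original, brighter)
loe_violated = loe(original, swapped)
```

The order-preserving enhancement scores zero, whereas the enhancement that swaps the order of the first two pixels receives a positive score.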

Results on Different Exposure Levels.
In order to demonstrate the generalization capability of LiAR and the drawback of learning-based approaches, we conducted experiments on images with different exposure levels. We compare LiAR with a state-of-the-art learning-based method, DeepUPE [3]. As shown in Figure 5, DeepUPE [3] fails to enhance severely underexposed images. The underlying reason is that its training dataset does not cover extremely underexposed cases like Figures 5(b) and 5(c). This shows that learning-based approaches rely heavily on the training data, and their performance may deteriorate noticeably once the conditions they were trained on are no longer satisfied. By contrast, our proposed approach LiAR, as an image-specific method, performs consistently well for images spreading over a wide range of exposure levels.

Ablation Study.
We performed an ablation study to analyze the importance of each component of LiAR, and the results are summarized in Table 2.
The first three settings in the table correspond to removing the two loss terms and the lightness-aware design from LiAR, respectively. It can be seen that removing $L_c$ achieves high contrast while causing serious artifacts. On the contrary, removing $L_t$ and only using $L_c$ leads to low contrast while keeping the lightness order consistent. Therefore, it can be confirmed that combining $L_c$ and $L_t$ helps to balance the contrast and fidelity of the restored results. If the lightness-aware illumination correction strategy is replaced with a fixed gamma transformation like [8,9] where $\gamma = 0.4$, the performance on Group A is satisfactory while the performance on Groups B and C is much inferior to that of LiAR. The underlying reason is that the fixed rule cannot adaptively adjust the illumination maps.
[Figure caption fragment: ... [6]. (c) Yuan and Sun [19]. (d) LIME [8]. (e) Exposure [28].]

Evaluation on Underexposed Videos.
Though LiAR is initially designed for coping with a single image, it can be easily adapted to the video case. In this experiment, its performance for underexposed video restoration was evaluated.

Dataset and Compared Methods.
Since there is no publicly available dataset for the study of underexposed video restoration, we collected such a dataset ourselves, which includes 112 video clips with back-lighting or low-light illumination conditions. They were also classified into three groups: "well exposed" (Group A, 32 clips), "slightly underexposed" (Group B, 42 clips), and "severely underexposed" (Group C, 38 clips). We compared LiAR with four representative approaches in this field: (1) virtual exposure [34], (2) Dong et al.'s method [35], (3) the traditional image enhancement method HE [12], and (4) ExCNet [21].
[Figure caption fragment: ... [3], which is a learning-based approach. (g-i) Results of LiAR, the approach proposed in this paper. DeepUPE [3] can restore slightly underexposed images very well while performing quite poorly on the severely underexposed ones. By contrast, LiAR, as an image-specific approach, can deal with images with a wide range of exposure levels quite well.]

Pairwise Comparison User Study.
We conducted a user study with ten volunteers (5 males and 5 females) to make pairwise comparisons between the corrected results of our method and those of the compared methods. This comparison was made from three aspects: "details visibility," "visual naturalness," and "overall preference." For each pairwise comparison, the group of videos and the order of method pairs were randomized to avoid subjective bias.
There were three options for the user to choose from: "left is better," "right is better," or "no preference." The results of the user study are summarized in Figure 6.
There are three bars in each pairwise comparison corresponding to the subjects' preference, which are the number of votes for "our method," "competitor," and "no preference" from left to right. The number of videos from different groups is represented with different colors. The results in Figure 6 clearly demonstrate that no matter which criterion is used, the participants showed a strong preference for the correction results of LiAR. Figure 7(a) is the input frame while Figures 7(b) ∼ 7(f) are the restoration results of the competing methods. It can be observed that the result of LiAR has better color consistency, finer details, and fewer overenhancement artifacts.

Visual Quality.
Unlike the case of processing a single image, when restoring an underexposed video, in addition to ensuring the restoration quality of each frame, we must ensure the smoothness of the video content; that is, we cannot introduce flickering artifacts during the restoration process. Therefore, the restoration algorithm is expected to have the ability to maintain the brightness order of video frames. In this paper, to quantify an algorithm's ability to keep the brightness order of video frames, a metric "ave_SRCC" is designed as follows. Suppose that $v_i$ is a video clip having $n$ frames, that the vector of its average frame brightness values is denoted by $b_i$, and that the corresponding vector for the restored clip is denoted by $\hat{b}_i$. Then
$$\text{ave\_SRCC} = \frac{1}{m} \sum_{i=1}^{m} \mathrm{SRCC}(b_i, \hat{b}_i),$$
where $m$ is the number of video clips and SRCC computes the Spearman rank-order correlation coefficient of two vectors [36]. The ave_SRCC values of the competing methods are listed in Table 3. It can be seen that the ave_SRCC values of virtual exposure [34], Dong et al.'s method [35], and LiAR are much higher than those of HE [12] and ExCNet [21]. This indicates that the restored videos of the former three approaches have far fewer flickering artifacts than those of the latter two approaches. This conclusion is consistent with the intuitive observation when comparing the results subjectively.

Table 3. ave_SRCC values of the competing methods.
Methods                 ave_SRCC
Virtual exposure [34]   0.7157
Dong et al. [35]        0.7308
HE [12]                 0.0757
ExCNet [21]             0.3699
LiAR                    0.7198

[Figure caption fragment: ... [35], (d) HE [12], (e) ExCNet [21], and (f) LiAR, respectively.]
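A brightness-order metric of this kind can be sketched in a few lines of dependency-free Python. The sketch assumes that each clip is summarized by a list of per-frame mean brightness values, once for the input and once for the restored output, and that the score is the Spearman correlation between the two lists averaged over clips. The function names and toy brightness values are hypothetical, and a clip with constant brightness would need special handling because its rank variance is zero.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def srcc(a, b):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5


def ave_srcc(input_clips, output_clips):
    """Mean SRCC between per-frame brightness of input and output clips."""
    scores = [srcc(b_in, b_out) for b_in, b_out in zip(input_clips, output_clips)]
    return sum(scores) / len(scores)


# Hypothetical per-frame mean brightness of one 4-frame clip.
b_in = [0.10, 0.20, 0.30, 0.25]
b_keep = [0.40, 0.60, 0.80, 0.70]  # brighter, same frame ordering
b_flip = [0.80, 0.60, 0.40, 0.50]  # ordering reversed, which reads as flicker

score_keep = ave_srcc([b_in], [b_keep])
score_flip = ave_srcc([b_in], [b_flip])
score_mixed = ave_srcc([b_in, b_in], [b_keep, b_flip])
```

An output that preserves the frame ordering scores 1, an output that reverses it scores -1, and averaging the two clips gives 0, which matches the intuition that values closer to 1 indicate less flickering.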

Time Cost.
In this experiment, the running speeds of the evaluated approaches are analyzed. In Table 4, the time cost of each competing method for processing one frame is presented for reference. We tested three commonly encountered video resolutions, 1080P, 720P, and 480P. Whether the implementation was based on CPU or GPU is also reported in Table 4. Note that for competing methods by other authors, we used their own or official implementations, and thus for virtual exposure [34] and Dong et al.'s method [35], we did not have GPU-based implementations. LiAR's implementation is based on GPU and it takes about 30 ms to process one 640 × 480 video frame.
Figure 8 presents three examples where LiAR fails to produce visually compelling results. For extremely dark input images with noise, the noise in the dark regions is amplified when the images are greatly brightened. This is because images collected in a dim environment usually contain noise in dark regions, and the noise is regarded as texture information and then amplified when the illumination is brightened.

Conclusions
This paper proposes LiAR, a two-phase approach for underexposed image restoration. Given an input image, LiAR first estimates its illumination map by minimizing a loss which comprises two terms used to ensure color consistency and texture richness of the output, respectively. Then, it adjusts the illumination map in a lightness-aware way. Experimental results demonstrate that images enhanced by LiAR have high contrast while keeping naturalness. In addition, LiAR can be easily extended to the video case. Compared with other competitors for underexposed video restoration, LiAR can output frames with pleasing quality. More importantly, it keeps the brightness order among video frames quite well, which allows it to avoid the flickering artifacts usually present in the outputs of other evaluated approaches. Our future work is to design a denoising module to suppress noise in extremely dark regions. One direction is to perform denoising on the reflectance component obtained by retinex decomposition.

Data Availability
The source code and video dataset have been made publicly available at https://cslinzhang.github.io/LiAR-Homepage/.

Conflicts of Interest
The authors declare that they have no conflicts of interest.