An efficient framework for deep learning-based light-defect image enhancement

The enhancement of light-defect images such as extremely low-light, low-light and dim-light images has always been a research hotspot. Most existing methods excel under specific illuminations, and there is much room for improvement in processing light-defect images with different illuminations. Therefore, this study proposes an efficient deep learning-based framework to enhance various light-defect images. The proposed framework first estimates the reflectance component and the illumination component. Next, we propose a generator guided by an attention mechanism in the reflectance part to repair the light defects in the dark. In addition, we design a colour loss function to address colour distortion in the enhanced images. Finally, the illumination map of the light-defect images is adjusted adaptively. Extensive experiments demonstrate that our method can not only deal with images with different illuminations but also enhance images with clearer details and richer colours. At the same time, we prove its superiority by comparing it with state-of-the-art methods under both visual quality comparison and quantitative comparison on various datasets and real-world images.


INTRODUCTION
High-visibility images have vivid details and rich colours, which not only meet human sensory needs but also play a vital role in computer vision tasks such as image enhancement, feature matching [1], object detection [2], object recognition [3], object tracking [4], and so forth. However, due to poor shooting environments, the limited performance of shooting equipment and differences in shooting technique, the visibility of images can be reduced. For such low-light images, deep learning-based image enhancement has shown notably good performance in improving image quality [5]. The Retinex theory [6], a model of human brightness and colour perception, assumes that an image can be decomposed into an illumination component and a reflectance component. It concentrates on local enhancement, which to a certain degree solves the problem of insufficient brightness enhancement in local areas that global enhancement algorithms suffer from, and it can adaptively enhance different types of images. Early attempts based on Retinex, such as single-scale Retinex [7] and multi-scale Retinex [8], treat the reflectance as the final enhanced result, which often looks unnatural and frequently appears over-enhanced [9]. Guo et al. [9] estimated the illumination of each pixel individually and refined the initial illumination map by imposing a structure prior on it to obtain the final illumination map. However, due to the blindness of the light structure, their method might lose the realism of regions with rich textures. Wang et al. [10] proposed a global illumination-aware and detail-preserving network (GLADNet). GLADNet calculates a global illumination estimation for the low-light input, then adjusts the illumination under the guidance of the estimation and supplements the details using a concatenation with the original input. This method is prone to colour and background distortions in low-light image enhancement. Chen et al.
[11] learnt to decompose the observed image into a reflectance image and an illumination image in a data-driven way, without considering the ground truth of the reflectance and illumination decomposition, and then enhanced the obtained illumination image. This method maintains the colour saturation of the images well during enhancement, but its handling of noise and details needs to be improved. On the basis of the classic Retinex model, Li et al. [12] proposed a robust Retinex model and, for the first time, tried to predict image noise. At the same time, they estimated a structure-revealing reflectance map and a piecewise-smooth illumination map. Even though this method performs well for denoising, its handling of details needs to be strengthened. Zhang et al. [13] proposed Kindling the Darkness (KinD), which decomposes a low-light image and its corresponding high-light image into reflectance and illumination parts and uses the reflectance component of the high-light image as a denoising reference to repair the dark-light defect. This method is very effective for extreme low-light images, but the colours of its enhancement results are not bright enough, and there is still much room for improvement in detail restoration. Guo et al. [14] presented zero-reference deep curve estimation (Zero-DCE), a novel method that formulates light enhancement as a task of image-specific curve estimation with a deep network. It is trained through a set of carefully formulated non-reference loss functions, which implicitly measure the enhancement quality and drive the learning of the network. The attraction of Zero-DCE is that it does not need any paired or unpaired data during training. However, it is deficient in dealing with noise, and its results sometimes show serious colour distortion.
From the application point of view, image enhancement not only needs to recover light-defect images from the dark but also has to preserve the original colour, texture and other details of the images accurately during processing. In this study, a new image enhancement framework is designed to deal with such light-defect images.
We propose a new light-defect image enhancement framework based on Retinex in this study. The framework is divided into three parts: image decomposition, reflectance restoration and illumination adjustment. The major contributions of the proposed method are twofold: (1) This study proposes a UNet++ [15] generator guided by an attention mechanism [16] in the reflectance restoration part. The attention-guided UNet++ generator can allocate more attention to the places in the reflectance map that need it. By using UNet++ to extract the features of light-defect images, we can extract simple features of shallow structures such as image boundaries and colours, while for deep structures, UNet++ can better extract the abstract features of the reflectance images due to the increased receptive field and convolution operations. In addition, a colour loss function is designed in the reflectance restoration part to ensure that the colours of the enhanced images are as close as possible to those of the normal images while features are captured in depth. (2) In the illumination adjustment part, the parameters can be reduced and the receptive field enlarged by superimposing small filters to deepen the network. At the same time, we obtain efficient light regulation by deepening the layers to transfer information hierarchically.

FRAMEWORK
Taking into account the blindness of illumination, it is essential to obtain illumination information when enhancing light-defect images. If the illumination is well extracted from the input, the remainder hosts the details and possible degradations, on which the restoration (or degradation removal) can be executed [13]. Since the noise components are amplified during the contrast enhancement process, noise removal on the reflectance component is needed to provide a high-quality image [17]. The basic assumption of Retinex theory is that the image S observed by the human eye is determined by the incident light I and the reflectivity R, which can be modelled as

S = R ∘ I, (1)

where S and R are the light-defect image and the desired reflectance map, respectively. Furthermore, I represents the illumination map and the operator ∘ means element-wise multiplication. Inspired by Retinex theory, we build a deep network, denoted as RLDNet, to solve the problems of light-defect image enhancement. As schematically illustrated in Figure 1, RLDNet is composed of three modules: image decomposition, reflectance restoration and illumination adjustment. In this study, the light-defect images are decomposed into illumination and reflectance images. On the reflectance images, the UNet++ generator guided by attention is used to restore the reflectance images and remove noise to repair the dark defects. Furthermore, we add a colour loss function to ensure that the original colours of the images are not lost. The illumination part adopts adaptive adjustment, which is the core of the flexible adjustment of image brightness. Next, this study introduces the three modules of RLDNet in detail as shown in Figure 1.
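As a minimal illustration of the Retinex model above, the following NumPy sketch composes an observed image from a reflectance map and a dim illumination map, then recovers the reflectance by element-wise division. The array shapes and value ranges are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Retinex image model S = R ∘ I (element-wise product).
# Illustrative shapes: an H x W x 3 reflectance and a single-channel
# illumination map broadcast over the colour channels.
rng = np.random.default_rng(0)
H, W = 4, 4
R = rng.uniform(0.2, 1.0, size=(H, W, 3))   # reflectance in [0, 1]
I = rng.uniform(0.05, 0.3, size=(H, W, 1))  # dim illumination map

S = R * I  # observed light-defect image

# Given a (non-zero) illumination estimate, the reflectance is recovered
# by element-wise division, which is what makes enhancement possible:
R_rec = S / np.maximum(I, 1e-6)
assert np.allclose(R, R_rec)
```

In practice the decomposition network estimates R and I jointly from S, but the element-wise relationship above is exactly what the losses below constrain.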

Image decomposition
The image decomposition module contains two branches corresponding to the illumination and the reflectance, respectively. For the reflectance branch, we adopt UNet++ followed by a convolutional layer and a sigmoid layer, while the illumination branch is composed of two convolutional layers, each followed by a rectified linear unit (ReLU), then a convolutional layer on feature maps concatenated from the reflectance branch, followed finally by a sigmoid layer. We use UNet++ instead of the U-Net [18] used in KinD. UNet++ is equivalent to filling in the originally hollow U-Net, which can capture features at different levels and integrate them through feature superposition. Receptive fields of different sizes have different sensitivities to target objects of different sizes. Large objects are easy to recognise because of the large receptive field. In the actual decomposition process, the edge information of large objects and small objects themselves are easily lost due to the down-sampling and up-sampling operations of a deep network, so we need the features of small receptive fields to help. Using UNet++ can decompose more subtle features, which is helpful for subsequent operations such as restoration and denoising of the reflectance images. The specific details of image decomposition are shown in Figure 2.
In the image decomposition process, we obtain a pair of light-defect/normal images each time and learn the decomposition of the light-defect images and their corresponding normal images under the guidance of the light-defect and normal images. The input light-defect image is denoted as S_low, and the normal image is denoted as S_norm. In this study, the reflectance components of the light-defect/normal images obtained by the decomposition network are denoted as R_low and R_norm, respectively, and the illumination components as I_low and I_norm. Generally speaking, reflectivity is an inherent property of the object and is not affected by illumination.

FIGURE 1
The framework is composed of three modules, including image decomposition, reflectance restoration and illumination adjustment. Input_low refers to the input light-defect image. Input_norm refers to the normal image corresponding to the light-defect image. Ratio represents the enhancement ratio between I_norm and I_low, and is extended to a feature map. R_low is the reflectance of the low-light image obtained by the decomposition network. R_norm is the reflectance of the normal image. I_low represents the illumination map of the low-light image obtained by the decomposition network. I_norm represents the illumination map of the normal image. R_output represents the enhanced reflectance map, and I_output represents the enhanced illumination map. The enhanced image is obtained by multiplying the enhanced illumination map and the reflectance map

FIGURE 2
The specific details of image decomposition. Input_low refers to the input light-defect image. Input_norm refers to the normal image corresponding to the light-defect image. R_low is the reflectance of the low-light image obtained by the decomposition network. R_norm is the reflectance of the normal image. I_low represents the illumination map of the low-light image obtained by the decomposition network. I_norm represents the illumination map of the normal image
In the condition of no image degradation, the reflectivity should be the same. Furthermore, the illumination maps should be piecewise smooth and mutually consistent. Therefore, in the image decomposition part, we set two constraints: (1) Light-defect and normal images share the reflectivity. (2) The illumination mapping is smooth and consistent with each other.
The two decomposed layers should reproduce the input, which is constrained by the reconstruction error. The reconstruction loss function is given by formula (2), where ||·||_1 denotes the L1 norm:

L_rec = ||R_low ∘ I_low − S_low||_1 + ||R_norm ∘ I_norm − S_norm||_1. (2)

To verify the correctness of the image decomposition, we need to ensure that the product of the decomposed reflectance and illumination maps is consistent with the input image.
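The reconstruction constraint can be sketched in a few lines of NumPy. This is a hedged sketch: the two-term L1 form follows the KinD-style loss the paper builds on, and the mean-reduced L1 convention is our assumption, not the paper's exact normalisation.

```python
import numpy as np

def l1(x):
    # L1 norm averaged over all elements (a common convention; the paper's
    # exact normalisation is not specified).
    return np.mean(np.abs(x))

def reconstruction_loss(R_low, I_low, S_low, R_norm, I_norm, S_norm):
    # Each decomposed pair must reproduce its own input: R ∘ I ≈ S.
    return l1(R_low * I_low - S_low) + l1(R_norm * I_norm - S_norm)

R = np.full((2, 2, 3), 0.8)
I = np.full((2, 2, 1), 0.25)
S = R * I
# A perfect decomposition reconstructs the input exactly, so the loss is zero.
assert reconstruction_loss(R, I, S, R, I, S) == 0.0
```

Any deviation between R ∘ I and S contributes linearly to the loss, which is what anchors the otherwise under-constrained decomposition.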
Based on the principle of reflectance component consistency, we use the reflectance invariant loss function to regularise the reflectance similarity, as shown in formula (3):

L_rs = ||R_low − R_norm||_1. (3)

As for the mutual consistency, we adopt the consistency loss function L_mc [13]:

L_mc = ||m ∘ exp(−c · m)||_1, (4)

where c is the parameter controlling the shape of the function.
In this study, c is set to 10. In formula (4), m represents the sum of the gradient magnitudes of the illumination maps of the low-light and normal images. The specific expression of m is given in formula (5):

m = |∇I_low| + |∇I_norm|, (5)

in which ∇ is a first-order derivative operator with horizontal and vertical directions.
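The mutual consistency term can be sketched as follows. The m · exp(−c·m) form follows the KinD loss cited above; the finite-difference gradient and the mean reduction are our assumptions for illustration.

```python
import numpy as np

def grad_mag(I):
    # |∇I|: sum of absolute first-order differences in the vertical and
    # horizontal directions (last row/column padded so shapes match).
    gy = np.abs(np.diff(I, axis=0, append=I[-1:, :]))
    gx = np.abs(np.diff(I, axis=1, append=I[:, -1:]))
    return gx + gy

def mutual_consistency_loss(I_low, I_norm, c=10.0):
    # m is the summed gradient magnitude of the two illumination maps;
    # m * exp(-c * m) penalises weak, inconsistent edges while letting
    # strong, mutually shared edges pass almost for free.
    m = grad_mag(I_low) + grad_mag(I_norm)
    return np.mean(m * np.exp(-c * m))
```

Note that the penalty m · exp(−c·m) peaks at m = 1/c and decays for stronger edges, which is what makes shared strong edges cheap.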
In general, the illumination changes greatly in the strong edge areas of the input images, while in the weak edge areas the illumination distribution should be smooth. The illumination smoothness loss function is shown in formula (6):

L_is = ||∇I_low / max(|∇S_low|, t)||_1 + ||∇I_norm / max(|∇S_norm|, t)||_1, (6)

where t is a small positive constant. This study sets t to 0.01 in order to avoid the denominator being zero.
Using the smoothness loss function can reduce over-fitting, improve the generalisation ability of the network, and enhance image contrast. This smoothness term measures the relative structure of the illumination with respect to the input. The penalty on the illumination is small for a location on an edge in the light-defect images, while for a location in a flat region of the light-defect images, the penalty becomes large.
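A hedged NumPy sketch of this structure-aware smoothness term; the single-channel input, finite-difference gradients and mean reduction are illustrative assumptions, while the gradient-ratio structure follows formula (6).

```python
import numpy as np

def gradients(x):
    # First-order differences in the horizontal and vertical directions,
    # padded so output shapes match the input.
    gy = np.diff(x, axis=0, append=x[-1:, :])
    gx = np.diff(x, axis=1, append=x[:, -1:])
    return gx, gy

def illumination_smoothness_loss(I, S_gray, t=0.01):
    # Penalise illumination gradients relative to the input's structure:
    # on strong input edges the denominator is large (small penalty);
    # in flat regions the denominator is clamped at t (large penalty).
    Ix, Iy = gradients(I)
    Sx, Sy = gradients(S_gray)
    loss_x = np.abs(Ix) / np.maximum(np.abs(Sx), t)
    loss_y = np.abs(Iy) / np.maximum(np.abs(Sy), t)
    return np.mean(loss_x + loss_y)
```

A perfectly flat illumination map incurs zero penalty regardless of the input, while illumination edges that do not coincide with input edges are penalised by roughly 1/t.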
To sum up, the loss function L_de of the image decomposition part is shown in formula (7):

L_de = λ_rec L_rec + λ_rs L_rs + λ_mc L_mc + λ_is L_is. (7)

In this study, λ_rec is set to 1, λ_rs is set to 0.01, λ_mc is set to 0.15, and λ_is is set to 0.1.

Reflectance restoration
The degradation of a light-defect image is more serious than that of a normal image, and the distribution on the reflectance is more complicated. When the estimated illumination map is smooth, all details are preserved in the reflectance images, including the amplified noise. Therefore, we can denoise on the reflectance images. For light-defect images, the distribution of noise is often uneven, and most of the noise is concentrated in the low-light areas. An attention mechanism can make the network focus more on the parts that need attention; that is, we add attention to solve the problem of unevenly distributed noise. At the same time, the texture, colour and other details of the reflectance part can be reduced or even lost for various reasons, so the role of the attention mechanism is crucial. U-Net has achieved huge success in semantic segmentation, image restoration and enhancement [19, 20]. U-Net adopts a superposition operation, and its most classic idea is the encoder-decoder structure. U-Net retains rich texture information and synthesises high-quality images using multi-scale context information. UNet++ builds on U-Net by integrating long and short connections. UNet++ can capture features at different levels, integrate them by feature superposition, and add shallower U-Net structures to make the scale differences of the feature maps smaller during fusion, which is conducive to extracting more details. Therefore, this study proposes an attention-guided UNet++ generator in the reflectance part. Reflectance restoration cannot be processed uniformly over the whole image, so the illumination map can be used as a guide. The original input is considered to contain more details, so we concatenate the original image with the illumination map. Then, we use five 3 × 3 convolutional layers with ReLU as the activation function for feature extraction and one 1 × 1 convolutional layer to facilitate information fusion between the feature maps.
Finally, we obtain the attention feature map, denoted as A, with 32 channels. The value of A is given by formula (8), where σ denotes the sigmoid function and f_3×3 represents a convolution operation with filter size 3 × 3:

A = σ(f_3×3(F)), (8)

where F denotes the feature maps extracted by the preceding convolutional layers. Finally, the result of combining the attention map and the reflectance map is used as the input of the attention-guided UNet++ generator, which can be expressed as formula (9):

R_in = A ∘ R_low. (9)

The specific details of reflectance restoration are shown in Figure 3.
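The attention computation can be illustrated with a single-channel toy version. The paper's head uses five 3 × 3 convolutions, one 1 × 1 convolution and 32 channels; the one-convolution, one-channel sketch below only shows the σ(f_3×3(·)) structure, not the trained network.

```python
import numpy as np

def conv3x3(x, w):
    # Naive 'same' 3x3 convolution for a single-channel map (zero padding).
    H, W = x.shape
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * w)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_map(feat, w):
    # A = sigmoid(f_3x3(feat)): values in (0, 1) that weight each spatial
    # position of the reflectance map before it enters the generator.
    return sigmoid(conv3x3(feat, w))

feat = np.random.default_rng(1).normal(size=(8, 8))
A = attention_map(feat, np.full((3, 3), 1.0 / 9.0))
assert A.shape == feat.shape and np.all((A > 0) & (A < 1))
```

The sigmoid keeps the attention weights in (0, 1), so multiplying them into the reflectance map attenuates low-confidence regions rather than erasing them.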
Our principle is to use a clear reflectance as a reference for a cluttered reflectance. The reference used for denoising in this study is a reflectance image decomposed from a normal image. L_ssim is a structural similarity loss function, which takes into account brightness, contrast and structural indicators, and is therefore closer to human visual perception. Generally, the results produced by L_ssim are more detailed than those produced by the L_1 and L_2 loss functions and do not make the images too smooth. It can be expressed as formula (10), where R̂ corresponds to the restored reflectance:

L_ssim = 1 − SSIM(R̂, R_norm). (10)

The L_2 norm can better remove noise and ringing artefacts in the enhanced results [21]. L_2 and the gradient loss function L_grad are shown in formulas (11) and (12):

L_2 = ||R̂ − R_norm||_2^2, (11)

L_grad = ||∇R̂ − ∇R_norm||_2^2. (12)

In order to avoid insufficient colour saturation and distortion during the restoration process, we design a colour loss function L_rgb to ensure that the colour is as close as possible to that of the normal images. Its expression is shown in formula (13), where c is a small positive constant. This study sets c to 0.1 in order to avoid the denominator being zero.

FIGURE 3
The specific details of reflectance restoration. Input_low refers to the input light-defect image. R_low is the reflectance of the low-light image obtained by the decomposition network. R_norm is the reflectance of the normal image. I_low represents the illumination map of the low-light image obtained by the decomposition network. R_output represents the enhanced reflectance map
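The text does not reproduce formula (13), so the sketch below is only one plausible reading of the colour loss: a per-pixel cosine distance between RGB vectors with a small stabilising constant c in the denominator. This matches the stated role of c but is not necessarily the authors' exact expression.

```python
import numpy as np

def colour_loss(R_hat, R_norm, c=0.1):
    # Hypothetical colour loss: 1 minus the per-pixel cosine similarity of
    # the restored and reference RGB vectors, averaged over the image.
    # c = 0.1 keeps the denominator away from zero, as stated in the paper.
    dot = np.sum(R_hat * R_norm, axis=-1)
    norms = np.linalg.norm(R_hat, axis=-1) * np.linalg.norm(R_norm, axis=-1)
    return np.mean(1.0 - dot / (norms + c))
```

A cosine-style term compares the direction of each RGB vector (its hue) rather than its magnitude, which is why it complements the L_2 and SSIM terms instead of duplicating them.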
To sum up, the loss function L_re of the reflectance restoration network is shown in formula (14):

L_re = λ_ssim L_ssim + λ_2 L_2 + λ_grad L_grad + λ_rgb L_rgb. (14)

In this study, λ_ssim, λ_2 and λ_grad are set to 1, and λ_rgb is set to 4.2.

Illumination adjustment
In practical applications, we encounter light-defect images with different illuminations. In order to achieve various light-defect image enhancement tasks with high efficiency, we need a mechanism to flexibly convert one light condition to another. Although we do not know the exact relationship between the target illumination source I_norm and the incident illumination source I_low, we can know their enhancement ratio α, which is expressed as formula (15):

α = I_norm / I_low. (15)

The target illumination sources in this study are higher than the incident illumination sources of the light-defect images, so the ratio α should be greater than 1. α can be specified by users in the testing phase. α is extended to a feature map and, together with I_low, forms the input of the illumination adjustment network. The illumination adjustment module contains 11 convolutional layers (10 convolutional layers at 3 × 3 with ReLU and one convolutional layer at 1 × 1) and one sigmoid layer. The channels number 64. In addition, convolutional layers 2, 3, 5, 6, 8 and 9 are each added to the features obtained by the previous convolutional layer, which makes it possible to repair the dark-light defect while obtaining clearer details and richer colours. We call these 11 convolutional layers the merge module in this study. The advantages of this module compared to KinD are verified in the ablation experiment in the next section. The details of the illumination adjustment network are shown in Figure 4.
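A sketch of the ratio mechanism. Computing α as a mean-brightness ratio during training is our assumption (the exact form of formula (15) is not recoverable from the text); the user-specified α at test time and the broadcast to a feature map follow the description above.

```python
import numpy as np

def enhancement_ratio(I_low, I_norm=None, alpha=None):
    # Training: derive the ratio from the decomposed illumination pair
    # (taken here as the ratio of mean brightness -- an assumption).
    # Testing: alpha is specified directly by the user.
    if alpha is None:
        alpha = float(np.mean(I_norm) / np.mean(I_low))
    # Broadcast the scalar to a feature map the same size as I_low, which
    # is then fed alongside I_low into the adjustment network.
    return np.full_like(I_low, alpha)
```

Broadcasting α to a full feature map lets an ordinary convolutional network condition every spatial position on the desired enhancement strength.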
The loss for the illumination adjustment network is given by formula (16), where Î corresponds to the adjusted illumination:

L_I = ||Î − I_norm||_2^2 + ||∇Î − ∇I_norm||_2^2. (16)

FIGURE 4
The details of the illumination adjustment network. I_low represents the illumination map of the low-light image obtained by the decomposition network. Ratio represents the enhancement ratio between I_norm and I_low, and is extended to a feature map. I_output represents the enhanced illumination map

Datasets and experimental details
In this study, we train on the public paired LOL dataset [11]. The LOL dataset consists of low/normal-light image pairs taken from real scenes by changing the exposure time and ISO. It contains 500 light-defect images and the 500 corresponding normal images. The scenes in this dataset are diverse, most of the images are extreme low-light, and the image resolution is 600 × 400. This study also randomly selects 500 pairs of low-illumination images and their corresponding normal images from the public MIT-Adobe Fivek [22] dataset, which contains 5000 original images. Most of the images in this dataset are low-illumination, and the image resolution is 512 × 512. In practical applications, there are no corresponding normal images to serve as references for light-defect images, and in order to test that our method is applicable not only to extreme low-light and low-light images but also to dim-light images, we also choose the public datasets DICM [23] and MEF [24], which have no image pairs, for testing. An example from each of the four datasets is shown in Figure 5. For the image decomposition network and the illumination adjustment network, the batch size is set to 10 and the patch size to 48 × 48, while for the reflectance restoration network, the batch size is set to 4 and the patch size to 384 × 384. The entire network is trained using the TensorFlow framework on an NVIDIA GeForce GTX 1070 GPU with 64 GB of RAM and an AMD Ryzen 9 3900X 12-Core 3.80 GHz CPU.
It is worth noting that the KinD [13] authors have published a better pretrained model on GitHub. With it, KinD, one of the methods compared in this study, obtains better test results both visually and quantitatively than the results published in the original KinD paper. In other words, our method is compared with the improved version of KinD.

FIGURE 6
Visual comparison from the ablation study of RLDNet. Rows 1-6 display the input images, KinD [13], (a) results from RLDNet with attention, (b) results from RLDNet with attention+UNet++, (c) results from RLDNet with attention+UNet+++L_rgb, and results from the final version of RLDNet, respectively. The colour is not bright enough or the details are not prominent enough in KinD, (a), (b) and (c). The final version of RLDNet is able to mitigate the above issues and gains the most visually pleasing results

Ablation study
To demonstrate the effectiveness of each component proposed in our framework, we conduct several ablation experiments. Specifically, we design three experiments: (a) removing UNet++, L_rgb and the merge module from RLDNet; (b) removing L_rgb and the merge module from RLDNet; (c) removing the merge module from RLDNet. The full RLDNet uses a UNet++ generator guided by attention on the reflectance images, a colour loss function to adjust the colour distribution, and a merge module in the illumination part. As shown in Figure 6, the first row shows the input images. The second row shows the images produced by KinD. The third, fourth and fifth rows correspond to the three ablation experiments. The last row is produced by our proposed version of RLDNet.
Method (a) simply adds attention. It can be seen from the third column that details such as the seat become obvious, but in the red frame in the second column, the detailed texture of the wool is very poor. In comparison, due to the role of UNet++, the details are relatively better in (b) and (c). Compared with KinD, (a) and (b), a colour loss function is added in method (c). The purple blanket in the first column of Figure 6 can well prove the indispensability of the colour loss. In contrast, our method RLDNet combines the colour loss function with the UNet++ generator guided by attention. At the same time, the illumination adjustment part uses more convolutional layers for feature extraction. It is excellent in terms of detail clarity, contrast and colour distribution. In summary, the effectiveness of the components designed in the framework is verified.

FIGURE 8
The visual quality comparison of proposed and current state-of-the-art methods on the LOL dataset (a) input image, (b) LIME [9], (c) GLAD [10], (d) Retinex-Net [11], (e) Robust Retinex [12], (f) KinD [13], (g) Zero-DCE [14], (h) RLDNet

Comparison with state-of-the-art methods on the datasets
In this section, we compare the performance of RLDNet with current state-of-the-art methods on the public datasets mentioned in Section 3.1. We conduct a series of experiments, including visual quality comparison and image quality assessment, which are elaborated below.

Visual quality comparison
In order to evaluate the performance of our method, we compare it with low-light image enhancement via illumination map estimation (LIME) [9], the global illumination-aware and detail-preserving network (GLAD) [10], Retinex-Net [11], Robust Retinex [12], Kindling the Darkness (KinD) [13] and Zero-DCE [14] for visual quality comparison. The experimental comparison images are shown in Figures 7-11. To enhance the light-defect images, we must ensure that, during the enhancement process, the main bodies of the images are enhanced while the backgrounds are not distorted. We analyse the enhanced images in terms of colour, detail definition, denoising performance, and so forth.
In Figure 7, the colour distortion of GLAD (c) and KinD (f) can be seen clearly from the red frame. The colour of Retinex-Net (d) is very bright, but there is too much noise. The enhanced brightness of LIME (b), Robust Retinex (e) and Zero-DCE (g) is insufficient. We can see that our method outperforms the other methods in terms of noise and colour, which can be clearly seen from the place marked by the red frame.

FIGURE 10
The visual quality comparison of proposed and current state-of-the-art methods on the MEF dataset (a) input image, (b) LIME [9], (c) GLAD [10], (d) Retinex-Net [11], (e) Robust Retinex [12], (f) KinD [13], (g) Zero-DCE [14], (h) RLDNet
In Figure 8, the colour distortion of GLAD (c) can be seen clearly. The colours of Retinex-Net (d) on the stands, floors and other objects are very bright, but there is too much noise. The details of LIME (b), Robust Retinex (e), KinD (f) and Zero-DCE (g) are not as clear as those of our method. Finally, we can see that our method outperforms the other methods in terms of noise, colour and detail clarity, which can be clearly seen from the place marked by the red frame.
In Figure 9, from the point of view of the sky, the brightness of the sky in Robust Retinex (e) is too high, which leads to the distortion of the clouds compared with the input image (a), and the colour of the sky in Zero-DCE (g) differs too much from the input image (a). From the perspective of building colour and details, the colour of Retinex-Net (d) is so bright that it differs from the input image (a). The overall brightness of the enhancement results of LIME (b) and GLAD (c) is insufficient, which leads to a poor visual effect. KinD (f) is inferior to our method in terms of building colour and the details of the windows of the arched doors. On the whole, our method RLDNet is excellent in detail, colour and overall effect.
In Figure 10, the sky brightness of GLAD (c) is over-enhanced, resulting in the loss of details such as the white clouds. Zero-DCE (g) shows distortion in the sky colour. On the premise that the enhancement results are not distorted, the overall brightness of LIME (b) is not as good as that of the other methods. Robust Retinex (e) does not perform as well as KinD (f) and RLDNet on the mountain details. However, the colour of KinD on the green field is not as good as that of RLDNet. As a result, RLDNet performs best from every perspective.
In Figure 11, the colour distortion of GLAD (c), Retinex-Net (d), KinD (f) and Zero-DCE (g) can be seen clearly. Owing to the lack of constraint on the reflectance, LIME (b) can easily cause a lack of naturalness in the enhanced images [25]. Because Figure 11 comes from a dataset with image pairs, our method is closer to the normal image in the overall colour distribution. The overall brightness of Robust Retinex (e) is too high, causing the colour of the leaves to appear distorted.

FIGURE 11
The visual quality comparison of proposed and current state-of-the-art methods on the Fivek dataset (a) input image, (b) LIME [9], (c) GLAD [10], (d) Retinex-Net [11], (e) Robust Retinex [12], (f) KinD [13], (g) Zero-DCE [14], (h) RLDNet
Combined with a large number of experimental results, we can see that the proposed method is better than the other methods in colour, detail clarity and denoising performance. Furthermore, RLDNet can obtain good enhancement effects on extreme low-light, low-light and dim-light images. It can be seen that the framework in this study has a wide range of applications and strong practicability.

Quantitative comparison
The differences in user preferences make visual quality comparison alone not comprehensive enough. In order to further verify the advantages of the framework, this study uses the structural similarity (SSIM) [26] and the peak signal-to-noise ratio (PSNR) to evaluate image quality against reference images. In this study, 15 images from the LOL dataset are selected for testing. In order to ensure the fairness of the comparison, test images whose SSIM and PSNR are extremely high or low are excluded. We use the average over the 15 test images as the test values of SSIM and PSNR. Quantitative comparisons on the LOL dataset in terms of SSIM and PSNR are summarised in Table 1.
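For reference, PSNR and a single-window simplification of SSIM can be computed as follows. Standard SSIM [26] evaluates these statistics in local sliding windows; the global version below only illustrates the structure of the formula.

```python
import numpy as np

def psnr(x, y, peak=1.0):
    # Peak signal-to-noise ratio in dB for images scaled to [0, peak].
    mse = np.mean((x - y) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def ssim_global(x, y, peak=1.0):
    # Single-window (global) SSIM: luminance and contrast/structure terms
    # computed once over the whole image instead of per local window.
    C1, C2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + C1) * (2 * cov + C2)) / (
        (mx ** 2 + my ** 2 + C1) * (vx + vy + C2))
```

Both metrics increase as the enhanced image approaches the reference, which is the convention used in Table 1.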
Considering that datasets such as DICM [23] and MEF [24], as well as enhanced images obtained in the real world, have no corresponding reference images, we adopt a well-known non-reference image quality evaluation method, the natural image quality evaluator (NIQE) [27], to comprehensively evaluate various images. We select some images from each of the four datasets separately to calculate the average value of their NIQE. Quantitative comparisons on the low-light dataset (LOL) [11], Fivek [22], DICM [23] and multi-exposure image fusion (MEF) [24] datasets in terms of NIQE are summarised in Table 2.

FIGURE 13
The visual quality comparison of proposed and current state-of-the-art methods on the real-world images (a) input image, (b) LIME [9], (c) GLAD [10], (d) Retinex-Net [11], (e) Robust Retinex [12], (f) KinD [13], (g) Zero-DCE [14], (h) RLDNet

The higher the values of SSIM and PSNR, the closer the enhanced images are to the normal images. On the contrary, a lower NIQE value indicates better visual quality. As can be seen from Tables 1 and 2, although Zero-DCE [14] is currently one of the latest methods for low-light image enhancement, it is mainly designed for non-reference images. For the reference datasets LOL and Fivek, it is at a disadvantage from an objective perspective. In addition, due to excessive noise and colour distortion, Zero-DCE [14] produces different enhancement effects on different non-reference datasets. Even though the method in this study uses a reference dataset for training in the training phase, its handling of noise, detailed texture and colour makes RLDNet still perform well on the non-reference datasets DICM and MEF.

Comparison with state-of-the-art methods on the real-world images
From the perspective of application, we design an effective framework for light-defect images, which not only performs excellently on various datasets but can also adapt to real-life scenes. We use a Nikon D5200 camera and a Huawei Mate 9 as the shooting equipment and select surrounding areas, urban areas and schools as our shooting sites. We select three representative images as the objects of the following comparison.

3.4.1
Visual quality comparison
Figures 12-14 show several comparisons of our method RLDNet with LIME [9], GLAD [10], Retinex-Net [11], Robust Retinex [12], KinD [13] and Zero-DCE [14] on real-world images. LIME [9] adds a structure-aware prior to the illumination map to adjust the illumination of light-defect images. This leads to insufficient enhancement of the low-light areas of light-defect images, such as the curtains and pink plush phone case in Figure 12(b), the vase in Figure 13(b) and the texture of the tree in Figure 14(b).

FIGURE 14 The visual quality comparison of the proposed method and current state-of-the-art methods on real-world images: (a) input image, (b) LIME [9], (c) GLAD [10], (d) Retinex-Net [11], (e) Robust Retinex [12], (f) KinD [13], (g) Zero-DCE [14], (h) RLDNet.

GLAD [10] falls short in enhancing brightness in extremely low-light areas, and many textures are not recovered, such as the curtains and pink plush phone case in Figure 12(c) and the vase in Figure 13(c). In addition, distortion tends to appear in regions that are already bright. For example, in the ground area marked by the red frame in Figure 14(c), the brightness is raised so much that both the colour and the texture of the ground are distorted.
Retinex-Net [11] preserves colour very well during the enhancement process, but it introduces heavy noise, as can be seen in Figures 12-14(d).
Compared with Retinex-Net, Robust Retinex [12] greatly improves noise handling, but there is still considerable room for improvement in detail preservation. For example, the texture of the vase in Figure 13(e) and the texture of the tree in Figure 14(e) are severely lost. In addition, the text in the middle of the pink plush phone case in Figure 12(e) and the surrounding border are blurred.
KinD [13] is effective in noise processing, but the colours of the enhanced objects show a certain degree of distortion, such as the colours of the curtains and pink plush phone case in Figure 12(f). It can be seen from Figure 13(f) that the blue of the vase itself and the details of the veins in the middle of the vase are slightly worse than those produced by RLDNet. Because the overall brightness is raised, details of the enhanced objects are lost; for example, the ground in Figure 14(f) looks smooth and textureless, and the rendering of the leaves is not as good as with our method.
Zero-DCE [14] is trained end-to-end with zero reference images by devising a set of differentiable non-reference losses. However, this method is prone to noise, such as on the curtains and pink plush phone case in Figure 12(g). Owing to the lack of constraint on the reflectance map, the brightness of the enhancement result tends to be excessive, leading to loss of texture and colour distortion of the object itself, such as the colour of the vase in Figure 13(g) and the texture and colour of the ground in Figure 14(g).
As can be seen from the above, our method RLDNet enhances the brightness of the low-light areas as much as possible while keeping the background areas free of distortion. At the same time, our method is superior in denoising and in preserving colour and texture.

3.4.2
Quantitative comparison
The real-world images we select represent different scenes and brightness levels: an indoor extremely low-light image (Figure 12), an indoor dim-light image (Figure 13), and a night-time street image with extremely low light (Figure 14). We use NIQE [27] to evaluate the quality of Figures 12-14, and the quantitative results are shown in Table 3. It can be seen from Table 3 that the enhancement effect of our method RLDNet is consistent with the visual comparison results and is the best. Although Zero-DCE [14] is one of the most advanced methods in the field of low-light image enhancement, it also yields low-quality enhanced images owing to excessive noise and to texture loss caused by excessive brightness. Although our method is trained on a reference dataset, it benefits from the attention-guided generator for the reflectance map and from the design of the colour loss function; for both reference and non-reference images it achieves a better enhancement effect and can handle light-defect images with different degrees of illumination.
Combining the visual quality comparisons and quantitative comparisons on various datasets and real-world images, the framework in this study is both feasible and general.

CONCLUSION
In this study, we design a deep learning-based framework, RLDNet, for light-defect image enhancement. RLDNet is composed of three modules: image decomposition, reflectance restoration and illumination adjustment. Inspired by the Retinex theory, we decompose the light-defect image into reflectance and illumination components. To remove the degradations previously hidden in the darkness, we perform denoising and related operations on the reflectance component. We design an attention-guided UNet++ generator in the reflectance restoration part. In addition, a colour loss function is proposed in the reflectance restoration part to keep the colours of the enhanced images as close as possible to those of the normal-light images. In the illumination part, we design an adaptive adjustment network to convert one illumination condition into another. The ablation experiments together with the visual and quantitative comparisons in this study all demonstrate the effectiveness of the framework. Compared with the state-of-the-art methods, our method handles low-light images well under various illumination conditions. RLDNet allows light-defect images to retain their original features in the background areas during enhancement, and the enhanced low-light areas have clearer details and richer colours. RLDNet can be used both for visual appreciation and in computer vision technology, and thus has high application value. In future work, we plan to build a dataset that combines low-light, normal-light and over-exposed images to expand the application range of the light-defect image enhancement network. In addition, we intend to make the method more lightweight or add temporal processing so that it can be better applied to video post-processing.
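The Retinex decomposition that underpins the framework can be sketched in a few lines. The following is a generic illustration of the element-wise model I = R · L with a synthetic example; the epsilon guard and the random test images are our own illustrative choices, not the paper's trained decomposition network:

```python
import numpy as np

def retinex_split(image: np.ndarray, illumination: np.ndarray, eps: float = 1e-4) -> np.ndarray:
    """Recover the reflectance R from I = R * L, guarding against division by zero."""
    return image / (illumination + eps)

# Synthetic check: compose an image from a known reflectance and a dim
# illumination map, then verify the decomposition recovers the reflectance.
rng = np.random.default_rng(0)
reflectance = rng.uniform(0.2, 1.0, size=(8, 8))    # intrinsic surface appearance
illumination = rng.uniform(0.05, 0.3, size=(8, 8))  # dim lighting map
image = reflectance * illumination
recovered = retinex_split(image, illumination)
print(float(np.max(np.abs(recovered - reflectance))))  # small residual, limited by eps
```

In the actual framework both components are estimated by the decomposition network rather than given, and the recovered reflectance is then restored by the attention-guided generator while the illumination map is adjusted adaptively.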