MSIGEN: Multi-Scale Illumination-Guided Low-Light Image Enhancement Network

Images captured with insufficient light often exhibit low visibility, low contrast and high noise, owing to the limitations of imaging equipment. To address this problem, and inspired by Retinex theory, we propose a two-stage method called Multi-Scale Illumination-Guided Low-Light Image Enhancement Network (denoted as MSIGEN). In the first stage, an enhancement module performs low-light image enhancement; it comprises a Decom-Net that decomposes the input into illumination and reflectance, an Adjust-Net for illumination adjustment, and a Restore-Net for reflectance restoration guided by the fused illumination map. In the second stage, a reinforce module refines the initial images to remove amplified visual defects and ensure the naturalness of the image. Extensive experimental results demonstrate the advantages of our method and show that it reaches state-of-the-art performance.


Introduction
Images shot in low-light conditions usually suffer from low visibility, low contrast, artifacts and unexpected noise. Low-light images also lose details, which leads to many problems, such as an unpleasant viewing experience and poor performance of many computer vision systems. Low-light image enhancement is therefore an important and arduous task that must handle many factors simultaneously and effectively.
In the past few years, some equipment that can capture images in low-light conditions has been developed, but the technology is not yet mature and cannot be widely used. It is also worth noting that the core technology of these devices is still a low-light image enhancement algorithm.
The pioneering approaches are histogram equalization [1] and Retinex theory [2]. The former increases the image contrast but does not consider the global contrast of the entire image. Built on the latter, Single-Scale Retinex (SSR) [3] and Multi-Scale Retinex with Color Restoration (MSRCR) [4] constrain the illumination map to be smooth through a Gaussian filter, but the outputs usually look unnatural. To solve the problem of unnaturalness, Wang et al. [5] proposed NPE, which enhances the contrast while keeping the illumination natural. Fu et al. proposed the SRIE method [6], which uses a weighted variational model to estimate the illumination and reflectance simultaneously. Lore et al. [7] designed LLNet, a low-light image enhancement method based on a deep autoencoder that achieves contrast enhancement and denoising simultaneously. A deep network denoted as Retinex-Net [8] [9] was then proposed, but it does not handle the varying effects of noise well. Zhang et al. [10] proposed a network (KinD) that can convert between different light/exposure conditions.
Therefore, in this article we propose a two-stage enhancement method with adaptive illumination compensation that achieves low-light image enhancement, color naturalness and denoising within the enhancement process. Following Retinex theory, the decomposition step decomposes the input image into reflectance and illumination. The illumination adjustment step flexibly converts one light condition to another. In the reflectance restoration step, multi-scale illumination is introduced to adjust the reflectance. On this basis, a reinforce module is proposed to fine-tune the initially enhanced image, which simultaneously denoises and effectively restores the image color.
In general, our contribution is reflected in three aspects:
1) We propose a two-stage method and a multi-scale illumination fusion network architecture to achieve reflectance restoration. Guided by the illumination map, the proposed approach denoises and effectively restores the image color at the same time.
2) Our network also includes a reinforce module, which fine-tunes the image to ensure its naturalness and effectively removes amplified visual defects.
3) Extensive experimental results demonstrate the advantages of our method and show that it reaches state-of-the-art performance.
Figure 1. Demonstration of our framework (MSIGEN). The proposed MSIGEN consists of two stages: (a) enhancement and (b) reinforce. The enhancement module includes three main steps: decomposition, illumination adjustment and reflectance restoration. The reinforce module then fine-tunes the enhanced images to reduce visual unnaturalness.

Proposed Method
Based on the Retinex model, an image can be expressed as the composition of a reflectance map and an illumination map as follows:
$$S = R \circ I \qquad (1)$$
where S represents the captured image, R represents the reflectance, I represents the illumination map, and ∘ represents element-wise multiplication. Inspired by Retinex theory, we design a two-stage method called Multi-Scale Illumination-Guided Low-Light Image Enhancement Network (denoted as MSIGEN). As illustrated in Figure 1, our MSIGEN approach is made up of two components: enhancement and reinforce.
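As a small illustration of the Retinex composition, the following NumPy sketch recombines a reflectance map and an illumination map by element-wise multiplication. The function name `compose`, the single-channel illumination layout, and the toy values are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def compose(reflectance, illumination):
    """Recombine S = R * I element-wise (Retinex model)."""
    # Assumption: illumination is single-channel (H, W) and is
    # broadcast across the colour channels of the reflectance (H, W, 3).
    return reflectance * illumination[..., np.newaxis]

# Toy 2x2 RGB example; the values are made up for illustration.
R = np.full((2, 2, 3), 0.8)
I = np.array([[0.1, 0.2],
              [0.5, 1.0]])
S = compose(R, I)
```

With a constant reflectance, the darkest pixel of `S` is the one with the lowest illumination, which is why adjusting the illumination alone can brighten the image.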

Stage-I: Enhancement
The enhancement module performs low-light image enhancement. As shown in Figure 1 (a), it includes three main parts: decomposition, illumination adjustment and reflectance restoration. In this first stage, a decomposition module decomposes the input into illumination and reflectance, an adjustment module flexibly converts one light condition to another, and a restoration module addresses noise through attention map guided reflectance restoration. The adjusted illumination and the restored reflectance are then recombined into the initial enhanced image.
Decomposition. Decomposing an image into reflectance and illumination is an ill-posed problem whose solution requires additional priors. The decomposition loss is the sum of four distance terms, given in formula (2): the reconstruction loss, the reflectance loss, the illumination loss and the mutual consistency loss. The decomposed illumination and reflectance should recombine to reproduce the input, which is enforced by the reconstruction loss; we constrain it with a norm of the reconstruction error, as given in formula (3). Based on Retinex theory, we regularize the decomposed reflectance pair to be close, which gives the reflectance similarity term. In these formulas, ∇ denotes the gradient, comprising the horizontal and vertical directions. The mutual consistency loss preserves strongly correlated edge information and suppresses weakly correlated edge information.
Illumination adjustment. In this step, the enhancement module computes the strength ratio α of the paired illuminations, i.e., the ratio of the target illumination to the low-light source illumination. The ratio α specifies how to convert the illumination condition from the low-light one to the target one. The adjusted illumination should be similar to the target illumination, both in intensity and in its edges, which gives the loss in formula (7).
Reflectance restoration. First, each image is processed by a U-Net branch with the same architecture. We add skip connections in the U-Net to aid the reconstruction of details at different scales. Second, we propose fusion blocks that combine the illumination map with the features so as to exploit valuable information in a complementary way (see Figure 2). The fusion blocks are built on a permutation-invariant technique with more aggregation operations between features. The input to each fusion block is the output of the corresponding layer together with the illumination map. Each fusion block uses a different kernel size to combine illumination features into its layer. Because the reflectance from the strong-light image is less disturbed than that from the low-light image, we use the clearer strong-light reflectance as the ground truth. In the restoration loss, SSIM(·,·) is the structural similarity measurement and R corresponds to the output of the module (the restored reflectance).
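The strength ratio α used in the illumination adjustment step can be illustrated with a short NumPy sketch. It assumes that α is the mean per-pixel ratio of the target illumination to the source illumination; the function name `strength_ratio`, the `eps` guard and the direct scaling at the end are illustrative assumptions, and the scaling is only a naive stand-in for the learned Adjust-Net.

```python
import numpy as np

def strength_ratio(l_source, l_target, eps=1e-6):
    """Scalar ratio alpha that maps the source illumination level to the target."""
    # Assumption: alpha is the mean of the per-pixel ratio; eps guards
    # against division by zero in completely dark pixels.
    return float(np.mean(l_target / np.maximum(l_source, eps)))

# Toy illumination maps (values made up for illustration).
l_low = np.full((4, 4), 0.1)
l_high = np.full((4, 4), 0.5)
alpha = strength_ratio(l_low, l_high)

# Naive stand-in for the learned adjustment: scale and clip to [0, 1].
l_adjusted = np.clip(alpha * l_low, 0.0, 1.0)
```

Because α is a single scalar, a user could also set it by hand at inference time to request a brighter or darker result, which is what makes the adjustment flexible.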

Stage-II: Reinforce
The motivation is to refine the initial images so as to reduce visual unnaturalness and improve the details. As shown in Figure 1 (b), the reinforce module includes two subnets: an Estimation-Net and a Reinforce-Net. The Estimation-Net estimates the noise map σ(x), which guides the refinement. The Reinforce-Net performs contrast re-enhancement to remove amplified visual defects and reduce visual unnaturalness. We use the output of reflectance restoration, x, as the input. In our implementation, the Estimation-Net directly adopts a fully convolutional network (FCN), in which each convolutional layer except the last uses the ReLU nonlinearity [12]. For the Reinforce-Net, we adopt a U-Net [11] architecture, which takes both x and σ(x) as input and predicts the normal-light image. We use skip connections to link features, and convolutions with different strides together with transposed convolutions to exploit multi-scale information.
In this module, we propose a loss function to fine-tune the image, in which ‖x̂ − x‖ denotes the reconstruction loss, λ weights the TV regularizer, and the gradient terms ‖∇σ(x)‖ along the horizontal and vertical directions encourage σ(x) to be smooth.
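To make the role of the TV regularizer concrete, here is a minimal NumPy sketch of a reconstruction term plus a total-variation penalty on the noise map. The squared L2 form of both terms, the function names and the default weight `lam` are assumptions for illustration; the paper's exact formula is not reproduced here.

```python
import numpy as np

def total_variation(noise_map):
    """Sum of squared horizontal and vertical finite differences."""
    dh = noise_map[:, 1:] - noise_map[:, :-1]   # horizontal gradient
    dv = noise_map[1:, :] - noise_map[:-1, :]   # vertical gradient
    return float(np.sum(dh ** 2) + np.sum(dv ** 2))

def reinforce_loss(pred, target, noise_map, lam=0.05):
    """Reconstruction error plus a TV penalty on the noise map."""
    # Assumption: squared L2 reconstruction; lam is a hypothetical weight.
    rec = float(np.sum((pred - target) ** 2))
    return rec + lam * total_variation(noise_map)

# A flat map has zero TV; a map with one vertical edge does not.
flat = np.ones((3, 3))
edge = np.array([[0.0, 1.0],
                 [0.0, 1.0]])
```

A larger weight on the TV term pushes the estimated noise map toward piecewise-constant values, at the cost of following the reconstruction target less closely.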

Implementation Details
The whole network is trained with TensorFlow [13] on an Nvidia GTX 1060 GPU and an Intel Core i5-8400 2.80 GHz CPU. We train our network on the LOL dataset and optimize it with SGD. For the Decom-Net, we set the batch size to 10 and the patch size to 48×48. For the Restore-Net and Adjust-Net, we set the batch size to 4 and the patch size to 384×384. In our experiments, we pretrain the reinforce network using the LOL dataset as input and then fine-tune it using images generated by the enhancement module. For the Reinforce-Net, we set the batch size to 32 and the patch size to 128×128.
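The batch and patch sizes above can be collected in a small configuration, together with a paired random crop of the kind commonly used to draw training patches. The dictionary keys, the function `random_paired_crop` and the dummy image size are hypothetical; the paper states only the batch and patch sizes, not how patches are sampled.

```python
import numpy as np

# Batch and patch sizes as stated in the text; key names are illustrative.
TRAIN_CONFIG = {
    "decom_net": {"batch_size": 10, "patch_size": 48},
    "adjust_net": {"batch_size": 4, "patch_size": 384},
    "restore_net": {"batch_size": 4, "patch_size": 384},
    "reinforce_net": {"batch_size": 32, "patch_size": 128},
}

def random_paired_crop(low, high, patch_size, rng):
    """Cut the same random square window from a low/normal-light pair."""
    h, w = low.shape[:2]
    top = int(rng.integers(0, h - patch_size + 1))
    left = int(rng.integers(0, w - patch_size + 1))
    window = (slice(top, top + patch_size), slice(left, left + patch_size))
    return low[window], high[window]

# Dummy image pair (size chosen only so that every patch size fits).
rng = np.random.default_rng(0)
low = np.zeros((400, 600, 3))
high = np.ones((400, 600, 3))
size = TRAIN_CONFIG["decom_net"]["patch_size"]
low_patch, high_patch = random_paired_crop(low, high, size, rng)
```

Cropping both images with the same window keeps the low-light and normal-light patches pixel-aligned, which paired losses require.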

Quantitative Evaluation
As shown in Table 1, we compare our approach with different methods on the LOL dataset. We also evaluate our method on the LIME [8], NPE [5] and MEF [19] datasets, as shown in Figure 4. We adopt three metrics to evaluate image quality: PSNR, SSIM and NIQE [20]. For PSNR and SSIM, a higher value means better image quality, whereas for NIQE a lower value means better image quality. From the results, we see that our MSIGEN achieves the lowest NIQE among all compared methods on the LIME and NPE datasets.
Figure 4. The results of NIQE on different datasets.
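For reference, PSNR follows the standard definition 10·log10(MAX² / MSE). The sketch below computes it in NumPy; the function name `psnr` and the assumption that images are scaled to [0, 1] are illustrative and do not describe the evaluation code used for Table 1.

```python
import numpy as np

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

# Toy check: a constant error of 0.1 on a [0, 1] image.
ref = np.zeros((8, 8))
est = np.full((8, 8), 0.1)
score = psnr(ref, est)
```

In this toy case the mean squared error is 0.01, so the score is 20 dB.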

Conclusions
In this work, we propose a novel two-stage network for low-light image enhancement, named MSIGEN. The enhancement module recovers well-exposed image details and reduces noise variance and color bias by regression. We also introduce a reinforce module that refines the initial images to effectively remove amplified visual defects and improve the details. Our experimental results show that our model performs better than the other approaches in PSNR and SSIM. The proposed method generates high-quality, natural-looking images with abundant textures. As a next step, we plan to apply our model to real-time processing, design a more powerful network for that setting, and extend the model to other enhancement tasks (e.g., low-light video enhancement).