AMBCR: Low-light image enhancement via attention guided multi-branch construction and Retinex theory

Due to different lighting environments and equipment limitations, low-light images suffer from high noise, low contrast and washed-out colours. The main purpose of low-light image enhancement is to preserve details and suppress noise as much as possible while improving the contrast of the image. Here, different networks are first combined to construct a multi-branch module for feature extraction, and this module together with Retinex theory is used to extract the reflection map of the image. Then an attention mechanism is introduced into the multi-branch construction to balance the feature weight of each branch, and the final result is obtained by the reconstruction module. The Retinex theory is used to calculate the L1 loss and the gradient loss on the intermediate feature maps of the entire model to train our framework. The entire process is completed in an end-to-end way, which avoids hand-crafted reconstruction rules and reduces the workload. Moreover, a large number of experiments demonstrate that the proposed framework achieves better results than state-of-the-art algorithms in both quantitative and qualitative evaluations of image enhancement.


INTRODUCTION
Image enhancement [1][2][3][4] is a major research hotspot within various computer vision tasks [5,6]. Very often, partial or even global darkness appears in photos taken by a camera for reasons such as weak light, low exposure and backlight. It is difficult to take high-quality images in an environment with insufficient light; although we can adjust the flash and exposure time of the camera to increase brightness, the disadvantages of these operations are unacceptable. For instance, long exposure causes blurred edges, and the flash introduces uneven shadows and overexposure. Obtaining clear images is crucial for artificial intelligence tasks such as face recognition [7,8], target detection [9][10][11], image fusion [12] and autonomous driving [11,13].
In addition, denoising is an important step in low-light image enhancement [14,15], since many general enhancement methods inevitably amplify noise [16][17][18][19][20]. In recent years, a large number of models have focused on improving the contrast and brightness of images [19][20][21][22][23][24], which also benefits colour restoration. The existing methods are mainly divided into histogram equalization-based methods [19,20,24] and Retinex theory-based methods [18,25-27]. The main goal of the former is to enhance the contrast of the image, while the latter focuses on estimating the illumination component in order to obtain the reflection component of the image. Although significant progress has been made, many weak points remain. In contrast enhancement, for example, details are still not well distinguished because noise is amplified even as brightness is increased. With the improvement of camera equipment, it is increasingly important to obtain a clear image at an appropriate level of contrast for our visual experience. However, that level of contrast is not easy to estimate.
Methods based on deep learning have made great achievements in computer vision tasks [18,27-29]. Especially in semantic segmentation [30], super-resolution reconstruction [31], target recognition [9] and image denoising [32], more and more excellent models have shown top-level performance. In the field of low-light image enhancement, existing methods often use a single model to extract feature information, and models based on Retinex theory [33] are usually designed as non-end-to-end models.
Most methods based on Retinex theory use an independent network to decompose the low-light image, and such deep learning algorithms are often non-end-to-end. DeepUPE focuses on finding the mapping relationship between low-light images and enhanced images, and obtains the enhanced image by multiplying the low-light image by the inverse of the illumination map, so as to remove the low-level illumination information from the low-illumination image. In contrast, considering the complexity of image content, we propose an end-to-end network via attention guided multi-branch construction and Retinex theory (AMBCR) to enhance brightness and preserve the details of a low-light image. In particular, the end-to-end structure makes training more convenient.
In this paper, the network sends features from different layers of the image to different branches. Meanwhile, we introduce an attention mechanism to guide the fusion strategy of each branch, and obtain the illumination map containing low-frequency information and the reflection map with detailed information. Next, we fuse the feature information to enhance the image. In this manner, the AMBCR is capable of improving the contrast and details of the image and producing an excellent result.
This study proposes a feasible framework for low-light image enhancement. Our code will be made open source.1 Our main contributions can be summarized as follows.
1. We designed a novel network for low-light image enhancement, constructed as an end-to-end model. We use hybrid datasets to train a more robust model, and the performance of the model is improved by calculating the loss of the middle layers.
2. We proposed a multi-branch network combined with an attention mechanism, in which different features are extracted by different sub-networks. The model then reconstructs the features from the different branches according to different weights. Thus, the proposed method achieves superior performance compared to other algorithms.
3. Combining the Retinex theory with the multi-branch network, the proposed method is compared with many state-of-the-art methods via comprehensive experiments, which prove the effectiveness of the method.

RELATED WORKS
Low-light image enhancement has long been a focus in the field of computer vision. In this section, we mainly introduce related methods.

1 The link of the codes is https://github.com/limiaoair/AMBCR.

Conventional methods
The traditional methods are mainly based on two aspects. The first is histogram equalization (HE), which mainly improves image contrast. Ibrahim and Kong [19] enhanced image contrast with dynamic histogram equalization; Pizer et al. [20] optimized the local contrast of the low-light image by dividing it into sub-regions and performing histogram equalization on each region; Pisano et al. [24] effectively reduced noise by limiting contrast. Other traditional methods are based on Retinex theory, which is mainly used to estimate the illumination map of the image. Single-Scale Retinex (SSR) [34] estimated the illumination map by convolving the source image with a Gaussian function, and made a breakthrough in the field of low-light image enhancement; Multi-Scale Retinex (MSR) [25] extended the single-scale algorithm to a multi-scale setting based on SSR; Multi-Scale Retinex with Colour Restoration (MSRCR) [25] introduced a colour restoration factor to solve the colour distortion caused by local contrast enhancement. Wang et al. [35] balanced the two tasks of detail restoration and naturalness preservation, preventing over-exposure; Fu et al. [36] blended the strengths of different techniques and designed a multi-scale fusion model for image enhancement; Fu et al. [37] decomposed the image into a reflection map and an illumination map via a weighted variational network. In addition, Guo et al. [17] proposed a model that estimated the illumination map according to the maximum value of the corresponding pixel in each channel of the image; Ying et al. [22] designed a multi-exposure fusion framework for low-light image enhancement. In general, traditional methods enhanced the contrast of the image and restored colour, but the details of the image were not well restored, and they had obvious shortcomings in eliminating noise.
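To make the two traditional families above concrete, the following is a minimal, simplified sketch (not any specific published implementation) of global histogram equalization and Single-Scale Retinex; the Gaussian blur here is a plain separable convolution with zero-padded borders:

```python
import numpy as np

def histogram_equalize(gray):
    """Global histogram equalization for an 8-bit grayscale image.

    Maps each intensity through the normalized cumulative histogram,
    which flattens the histogram and stretches contrast.
    """
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf /= cdf[-1]                                 # normalized CDF in [0, 1]
    lut = np.round(cdf * 255).astype(np.uint8)     # intensity mapping table
    return lut[gray]

def gaussian_blur(img, sigma):
    """Separable Gaussian blur (zero-padded borders)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()
    rows = np.apply_along_axis(np.convolve, 1, img.astype(np.float64), k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

def single_scale_retinex(img, sigma=2.0):
    """SSR: log image minus log of its Gaussian-smoothed illumination estimate."""
    img = img.astype(np.float64) + 1.0             # avoid log(0)
    return np.log(img) - np.log(gaussian_blur(img, sigma) + 1e-12)

rng = np.random.default_rng(0)
dark = rng.integers(0, 60, (64, 64)).astype(np.uint8)   # low-light-like image
he = histogram_equalize(dark)
ssr = single_scale_retinex(dark)
```

Note that SSR output is a log-domain residual and must be rescaled to a display range before viewing; production SSR implementations also use much larger Gaussian scales than this toy sigma.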

Deep learning-based methods
With the advent of neural networks, a large number of tasks in the field of computer vision have developed greatly, such as Refs. [31,38] for super-resolution, Tian et al. [32] for image denoising and Hou et al. [39] for infrared and visible image fusion. These methods have achieved excellent results. LLNet [40] designed a novel stacked-sparse denoising autoencoder for low-light image enhancement, achieving denoising and contrast enhancement in a single network; S-LLNet [40] divided the two tasks into two modules, namely a denoising module and a contrast enhancement module. LLCNN [41] proposed a novel deep convolution module containing an inception part [42] and a residual part [43] to avoid the vanishing gradient problem. MSR-Net [26] combined a convolutional neural network (CNN) and Multi-Scale Retinex (MSR) [25] to achieve image enhancement. Based on Retinex theory [33], Retinex-Net [18] separated a low-light image into a reflectance map and an illumination map, and then enhanced the illumination map.
LightenNet [44] estimated an illumination part with a non-end-to-end network. In recent years, Lv et al. [45] established a multi-branch framework named MBLLEN for image enhancement. Chen et al. [46] introduced a large number of raw short-exposure low-light images to design an effective deep CNN framework. SICE [23] designed a model that trained on multi-exposure images to enhance a single low-light image. Based on Retinex-Net, Zhang et al. [27] further designed a low-illumination image enhancement network named KinD that can artificially adjust the brightness; KinD consists of three modules rather than an end-to-end model. Ren et al. [47] proposed a deep hybrid network to restore edge information. Wang et al. [48] designed an encoder network that extracts local and global features; their method, named DeepUPE, mainly estimates the illumination map in the middle layer of the network. Jiang et al. [29] established a colour-based attention model for low-light image enhancement. In the field of unsupervised learning, EnlightenGAN [28] incorporated a generative adversarial network (GAN) [49] into low-light image enhancement. Lee et al. [50] proposed an unsupervised approach using the bright channel prior. Zero-DCE [21] estimated pixel-wise high-order curves to restore low-light images. Zhang et al. [51] proposed a self-supervised model that uses only one low-light image to complete the training process.

Low-light image datasets
Since datasets are essential for deep learning, many famous datasets have been established. For example, PASCAL VOC [52] contains a large number of image objects, and ImageNet [53] is widely used for image classification and recognition. However, low-light image datasets are relatively scarce, so it is very important to construct corresponding datasets for low-light image enhancement. There are two main ways to obtain a low-light image dataset. The first simulates low-light images by processing source images [26,40,41]. The other is to build an additional dataset: the VV-datasets [54,55] and LIME [17] have dozens of low-light images; Wei et al. [18] established the LOL-dataset, a large-scale set of paired low-light/normal-light images, and the first to be shot in real scenes; ExDark [56] is the largest low-light image dataset for object-focused work and contains annotations; Chen et al. [46] collected a large number of raw images with cameras. The limitation of a single dataset is that its scenarios are too simple. In this paper, we mix different datasets to improve the robustness of the model. Experiments show that our model generalizes well.

METHODOLOGY
In this section, we introduce the proposed network with all the required details: we establish a novel end-to-end network architecture to solve the problem of low-light image enhancement and implement the decomposition module using the Retinex theory. In addition, our reconstruction module is also part of the AMBCR.

Network architecture
We design the AMBCR with the Retinex theory and a multi-branch network. It separates the image into sub-features related to different feature layers. The input of the network is a low-illumination colour image, and the output is a clean image of the same size. The AMBCR architecture is illustrated in Figure 1 and consists of two parts: the decomposition net and the reconstruction net. In the next subsections, we describe them in detail.

Decomposition net
The decomposition method is mainly based on Retinex theory. In Figure 1, the low-light image is denoted as L-input, and the image is fed into two channels simultaneously: the reflection map extraction module (REM) and the illumination map extraction module (IEM). Inspired by the multi-branch network of MBLLEN [45], we design the REM as an attention-guided module.

REM
In the first part of the REM, there is a single-stream network with six convolutional layers; except for the first two, the input and output features of the remaining four convolutional layers have the same size. Each convolutional layer uses 3×3 kernels and ReLU [57] nonlinearity. The sub-features of the last four layers are used as the input of the multi-branch network. The second part of the REM includes a multi-branch network and two attention modules. The multi-branch network contains four sub-nets corresponding to the above-mentioned four sub-features: two U-nets [30], a residual network and a dilated convolutional network. At the front and rear ends of the multi-branch network, there are two attention modules, each consisting of four SE-nets corresponding to the four sub-nets.
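The SE-style attention used here can be illustrated with a minimal channel-attention sketch. This is a simplified numpy mock-up with hypothetical random weights (not the trained module): squeeze a feature map by global average pooling, excite it through a two-layer bottleneck, and rescale each channel by the resulting weight:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_attention(feat, w1, w2):
    """Squeeze-and-excitation channel attention for one feature map.

    feat: (H, W, C) feature map; w1: (C, C//r) and w2: (C//r, C) are
    the bottleneck weights (r is the reduction ratio).
    Returns the feature map rescaled per channel by weights in (0, 1).
    """
    squeeze = feat.mean(axis=(0, 1))            # global average pool -> (C,)
    hidden = np.maximum(squeeze @ w1, 0.0)      # FC + ReLU bottleneck
    scale = sigmoid(hidden @ w2)                # FC + sigmoid -> (C,)
    return feat * scale                         # reweight each channel

rng = np.random.default_rng(1)
C, r = 8, 4
feat = rng.standard_normal((16, 16, C))
out = se_attention(feat,
                   rng.standard_normal((C, C // r)),
                   rng.standard_normal((C // r, C)))
```

Because the per-channel scale lies in (0, 1), the module can only attenuate branches, which is how the attention balances the contribution of each sub-net before and after the multi-branch stage.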

IEM
The IEM is used to extract the illumination map and is composed of a series of convolutional layers. The first three convolutional layers map the input image into features with 32 channels. After concatenating the output feature of the REM, a single-channel output is obtained by another three convolutional layers, of which the last uses a sigmoid nonlinearity.

Reconstruction net
The reconstruction part is mainly composed of convolutional layers. It accepts the outputs of the REM and IEM to obtain the final enhanced image.

Loss function
In this section, we design the loss function, which aims to accomplish low-light image enhancement with suitable coefficients for each term. We intend the loss to be robust, but a simple mean square error (MSE) is insufficient for our network. Therefore, we establish a novel compound loss function.
The total loss L_sum consists of two parts, the decomposition loss and the reconstruction loss:

L_sum = L_recon + λ · L_decom,

where λ denotes the coefficient that adjusts the impact of the decomposition loss on the result. L_decom uses the intermediate feature maps of the model to calculate the loss and pushes the network to obtain a better reflection map based on the Retinex theory. On the other hand, L_recon drives the model towards better reconstruction and image enhancement.

Reconstruction loss
We build the reconstruction loss function as the main loss of the model. It contains three parts: gradient loss, SSIM [16] loss and MSE loss. The objective is as follows:

L_recon = L_MSE + L_SSIM + L_grad,

where the MSE loss is the l2 norm and the SSIM loss expresses the difference in structure. The first two parts can be expressed as:

L_MSE = Σ_k || L_r^k − R_high^k ||_2^2,   L_SSIM = 1 − SSIM(L_r, R_high),

where the superscript k represents the different colour channels of the image, L_r^k represents channel k of the reflection map of the enhanced image, and R_high^k denotes channel k of the reflection map of the high-light image. The SSIM is a structural loss function, which compares the brightness, contrast and structure of two images to obtain an objective evaluation standard. The formula is as follows:

SSIM(x, y) = (2 μ_x μ_y + C_1)(2 σ_xy + C_2) / ((μ_x^2 + μ_y^2 + C_1)(σ_x^2 + σ_y^2 + C_2)),

where μ, σ and σ_xy denote the mean, standard deviation and covariance of x and y, respectively. The third part is the gradient loss, which is mainly used to restore details of the image. The loss function of the gradient is as follows:

L_grad = || ∇L_x − ∇Y_x ||_1 + || ∇L_y − ∇Y_y ||_1,
where ∇L_x and ∇L_y are the gradients in the x and y dimensions of the grayscale of the enhanced image; similarly, ∇Y_x and ∇Y_y are the gradients of the label. ∇ is the first-order derivative operator, and the subscripts represent the horizontal and vertical directions, respectively.
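The SSIM and gradient terms above can be sketched in numpy. This is a simplified illustration, not the paper's exact implementation: the SSIM uses global image statistics in a single window (real SSIM averages over local windows), and the gradients are forward differences:

```python
import numpy as np

def ssim_global(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """SSIM computed from global statistics (single-window simplification)."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def gradient_l1_loss(pred, label):
    """L1 distance between first-order image gradients (forward differences)."""
    loss_x = np.abs(np.diff(pred, axis=1) - np.diff(label, axis=1)).mean()
    loss_y = np.abs(np.diff(pred, axis=0) - np.diff(label, axis=0)).mean()
    return loss_x + loss_y

rng = np.random.default_rng(1)
img = rng.uniform(0, 255, (32, 32))
```

One design property worth noting: the gradient loss is invariant to a global brightness offset (a constant shift cancels in the differences), so it penalizes only edge and detail mismatches, complementing the MSE term which penalizes absolute intensity error.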

Decomposition loss
Since the purpose of the decomposition loss is to estimate the illumination map via Retinex theory, the focus of this loss function is to obtain a better illumination map. Therefore, we built a composite loss function designed mainly for the content of the illumination map. It is also composed of three parts:

L_decom = L_Retinex + L_grad_I + L_grad_R,

where the purpose of L_Retinex is to make the obtained illumination and reflection maps conform to the Retinex theory as closely as possible. The mathematical model of the Retinex theory is formulated as:

L(x, y) = R(x, y) ∘ I(x, y),

where R(x, y) is the reflection map and I(x, y) is the illumination map. As a result, the loss function turns out to be L_Retinex = || R ∘ I − L_input ||_1, where || · ||_1 denotes the L1 loss. In addition, in order to improve the quality of the illumination map, we use the last two parts of the loss function to penalize it. The first gradient term is the illumination smoothness loss: the gradient loss between the illumination maps obtained by the IEM from the low-light image and from the high-light image, respectively (see Figure 2).
Subsequently, the gradient loss of the second part compares the relationship between the reflection map components of different inputs, which are the low-light image and high-light image. The specific relationship is shown in Figure 2.
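The Retinex reconstruction term can be sketched directly from the model L = R ∘ I. The following is a minimal numpy illustration (the illumination map is assumed single-channel and broadcast over the colour channels, matching the IEM's single-channel output):

```python
import numpy as np

def retinex_l1_loss(reflectance, illumination, low_input):
    """L1 between R * I and the low-light input, per the Retinex model.

    reflectance: (H, W, 3); illumination: (H, W), broadcast over
    colour channels; low_input: (H, W, 3) low-light image.
    """
    recon = reflectance * illumination[..., None]
    return np.abs(recon - low_input).mean()

rng = np.random.default_rng(2)
R = rng.uniform(0, 1, (8, 8, 3))
I = rng.uniform(0, 1, (8, 8))
L_input = R * I[..., None]          # a perfectly consistent decomposition
```

For a decomposition that exactly satisfies the Retinex model the loss is zero; any inconsistent illumination estimate is penalized in proportion to its pixel-wise deviation.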

EXPERIMENTS
In this part, a large number of experiments are used to verify the effectiveness of the AMBCR. Moreover, we use the gradient loss between the illumination map and the input image to validate the model. We then present comparisons between the proposed network and state-of-the-art methods.

Implementation details
We use hybrid datasets for training, including the LOL-dataset and the SICE-dataset. We employ 480 image pairs from the LOL-dataset and 20 image pairs from the SICE-dataset; the training images are only cropped to a uniform size without additional artificial operations, giving a total of 500 pairs. For the proposed network, the batch size is set to 2 and the patch size to 256×256. We set the coefficient of the intermediate loss to λ = 0.1 and the remaining loss weights to 0.2, 1 and 1, respectively. Our network was trained for 350 epochs and minimized by the Adam optimizer with a learning rate of 10^-4. The AMBCR was implemented in the TensorFlow 1.15 framework [58], and the entire model was trained on an NVIDIA GTX 2080Ti GPU with an Intel Core i7-9700KF 3.6 GHz CPU and 32 GB RAM.

Qualitative results
In this section, a range of experiments is used to verify the effectiveness of our model. Our enhancement framework is compared with both traditional and deep learning methods. The traditional methods include Dong [59], Illumination Map Estimation (LIME) [17], the multi-scale fusion method (MF) [36], the Exposure Fusion Framework (BIMEF) [22] and the Naturalness Preserved Enhancement Algorithm (NPE) [35]. The deep learning methods include RetinexNet [18], KinD [27], MBLLEN [45] and a zero-reference curve estimation model (Zero-DCE) [21]. Moreover, we use two public datasets, the LOL-dataset [18] and the SICE-dataset [23], and the codes of the baseline algorithms are from the original authors. We first show the results of the experiments and then present subjective and objective analyses. The enhanced images are shown in Figure 3. We concentrate on the contrast, the clarity of detail and the degree of colour restoration of each result. Because brightness depends strongly on the perception of the observer, the brightness of some labels of the LOL-dataset is still low. Our method enhances the brightness of the low-light image well and achieves a large improvement in visualization over the low-light input. Figure 4 shows the histograms corresponding to Figure 3.
In addition to contrast enhancement, the results are shown in Figure 5. Figure 5(j) shows the normal-light image used as the reference for detail comparison; the detail regions used for comparison are marked with white and red dotted boxes. The method of Dong [59] loses some details because it only processes the inverted image with a dehazing method; the result is shown in Figure 5(a).
The result of MF [36] is better than that of Dong [59], but the details are still fuzzy. Although they optimize the bright channel, the results of LIME [17] and NPE [35] still tend to be dark and blurry; for instance, the contour details of the plant in the white box are somewhat unclear. In contrast, the result of BIMEF [22] has better details thanks to its multi-exposure fusion framework, but the brightness enhancement is still insufficient. We note that the results of KinD [27] and MBLLEN [45] reduce the impact of noise, mainly because these two methods use neural networks for supervised learning. However, the details of MBLLEN [45] are still unclear, and our results are brighter than those of KinD [27].
In the "Toy" image in Figure 6, we found that in the magnified image marked by the white frame, our result has a clearer texture than other methods. The texture of LIME is also relatively clear, but there is a lot of noise in the result of LIME. Compared with other deep learning methods, our method can improve visibility and brightness while preserving the colour of the image.
What's more, a more detailed comparison is shown in Figure 7, for which we selected five images. The low-light and high-light images are displayed in the first and last rows of Figure 7. The results in Figure 7 show that our model is not only better than the classic algorithms, but also has excellent visual and contrast effects compared with advanced deep learning models such as KinD [27] and MBLLEN [45].
We also conducted user studies (US) to evaluate the visual quality of the enhanced images obtained by different methods. We selected 20 participants to rate the enhanced images. 80% of the participants are engaged in computer vision research, and the remaining 4 participants are engaged in computer science, automation science and other related fields. We grade mainly according to the following three points: (1) whether there exists obvious colour deviation; (2) whether the results lead to overexposure or underexposure; (3) whether new noise is generated. The scores of the user study range from 1 to 5; the higher the score, the better the subjective quality of the enhanced image. We scored all the test images in the LOL-dataset, and the average user-study scores for the LOL-dataset are shown in Table 5.

FIGURE 4 Grayscale histograms corresponding to the images in Figure 3. The first row shows the grayscale histogram of "Swimming pool", the second row shows the grayscale histogram of "Toy"

FIGURE 5 Visual comparison with state-of-the-art image enhancement methods on "Swimming pool"

FIGURE 6 Visual comparison with state-of-the-art image enhancement methods on "Toy"

Quantitative results
Subjective visual evaluation is easily affected by personal emotion, and for individuals, identifying the specific details and potential noise of an enhanced image is extremely difficult. Therefore, quantitative evaluation is indispensable. We utilize six reliable quantitative metrics: peak signal-to-noise ratio (PSNR), signal-to-noise ratio (SNR), structural similarity (SSIM), mean square error (MSE), visual information fidelity (VIF) [60] and the natural image quality evaluator (NIQE) [61]. Most of these are full-reference indicators, which require the high-light image as the label, while NIQE is a no-reference index that only needs a single test image. We perform quantitative analysis on the test images, as shown in Tables 1-3. Table 1 reports the performance of the enhanced results of nine methods, including our algorithm, on the five images in Figure 7. From the numerical results, we can see that most of the indexes of our AMBCR are significantly better than those of the other methods; the average results are shown in Table 4, with the best results marked in bold. Table 4 compares the quantitative assessment of the different methods using six indexes. In PSNR and MSE, the deep learning methods KinD [27] and MBLLEN [45] also perform well, mainly because the feature-extraction ability of neural networks is greatly improved compared to traditional methods. For SSIM, a value closer to one means better enhancement. In addition, we have a clear advantage in SNR, indicating a smaller noise proportion, and we also achieve good results in the other listed indexes.
For further analysis, the colour restoration of our result is closer to the high-light image in Figure 7. As for contrast, the results of KinD [27] and MBLLEN [45] are somewhat similar, but our results are more effective in the extremely dark parts of the images. The comprehensive comparison results are shown in Figures 8-9. As shown in Figure 8(c), our method has a higher signal-to-noise ratio (SNR), a clear advantage over the other methods. In addition, the NIQE [61] values are on par with the advanced algorithms. Figure 9 illustrates the average results over the test images. It can be seen that the deep learning methods have obvious advantages in the quantitative indicators PSNR and SSIM, and our enhanced images achieve better results. The right side of Figure 9 shows the radar chart of the PSNR for the five test images, where our results hold a clear advantage. Table 5 shows the average performance indicators of all test images on the LOL-dataset for our method and the others. For the user study (US) score, our method is slightly ahead, while KinD and LIME also perform well in visual quality. In addition, we compared the running time: our method takes 0.04 s per image, slightly more than Zero-DCE, and meets the requirements of real-time processing.
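For reference, the PSNR and MSE metrics used above can be computed as follows; this is a generic sketch of the standard definitions, assuming 8-bit-range images with a peak value of 255:

```python
import numpy as np

def psnr(pred, label, peak=255.0):
    """Peak signal-to-noise ratio in dB (higher is better).

    PSNR = 10 * log10(peak^2 / MSE); identical images give infinity.
    """
    mse = np.mean((pred.astype(np.float64) - label.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

label = np.full((8, 8), 100.0)    # toy reference image
```

A uniform error of 10 intensity levels gives an MSE of 100 and therefore a PSNR of about 28.1 dB, which helps calibrate the magnitudes reported in the tables.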

Extended experiment
We also used the dataset provided by SICE [23] to verify the effectiveness of our model. The SICE-dataset is a multi-exposure image dataset containing 229 sets of multi-exposure images, each with labels. We selected the lowest-exposure image in each group, and the comparison results are shown in Figure 10.
The results are shown in the first row of Figure 10. The contrast of the enhanced image is significantly improved, and our results are closer to the label images. In the results of the second and third rows, the enhanced images of BIMEF [22] and Dong [59] have lower contrast. Focusing on the part marked by the red box in the third row, the details in the results of MF [36], Dong [59], BIMEF [22] and LIME [17] are too blurry to distinguish. The results of KinD [27] and MBLLEN [45] are closest to ours, but our enhanced image has better quantitative indicators, as shown in Table 6. We then calculated the difference between the illumination maps obtained by the middle layers of our model under different exposure rates.
As shown in Figure 11, we listed the colour-map to show the distribution of the illumination map. It can be seen that the AMBCR is able to effectively extract illumination information under various exposure conditions. With the increase of the exposure intensity, the brightness of the extracted illumination map also increased, which can demonstrate the effectiveness of the intermediate loss L decom .
In the colour-map and illumination map, we can see that the definition of detail gradually increases with the exposure rate. In the problem of low-light image enhancement, the illumination map we obtain is more accurate and contains less information than the reflection map. To better show the performance of the different methods, Figure 12 shows the visualization results of the enhanced images obtained by the different methods. Image details in most regions are revealed, but in order to determine the saturation of each region, we use a binary mask visualization based on an adjustable threshold. First, we convert the enhanced image into a grayscale image, and then threshold it to obtain a binary image. Our purpose is to distinguish between over-saturated and under-saturated regions. For over-saturated regions, pixel values higher than 200 are assigned white, otherwise black; for under-saturated regions, pixel values less than 50 are assigned black, otherwise white. The results are shown in Figures 13 and 14.
In Figures 13 and 14, we focus on the high-light and low-light parts of the enhanced images. By setting the specified pixels of the enhanced images to 1 or 0 in binary form, Figures 13 and 14 clearly show the local highlight and dark areas in the image, and the edges of the enhanced images can be compared well.
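The thresholding procedure above can be sketched directly; this is a minimal numpy version of the described rule (the function name and its return convention are ours, the 200/50 thresholds are from the text):

```python
import numpy as np

def saturation_masks(gray, high_thresh=200, low_thresh=50):
    """Binary masks for over- and under-saturated regions of a grayscale image.

    over:  1 (white) where pixel value > high_thresh, else 0 (black).
    under: 0 (black) where pixel value < low_thresh, else 1 (white).
    """
    over = (gray > high_thresh).astype(np.uint8)
    under = (gray >= low_thresh).astype(np.uint8)
    return over, under

gray = np.array([[10, 120],
                 [210, 255]], dtype=np.uint8)
over, under = saturation_masks(gray)
```

Comparing the two masks against those of the reference image makes over- and under-exposed regions of each method immediately visible without relying on subjective perception.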

Video enhancement
In order to verify the validity of our model, we added a video enhancement experiment. We chose the GOT-10K [62] dataset and used software to reduce the contrast and brightness of the videos. We tested the model on the low-light videos, and the results are shown in Figure 15. We selected key frames for subjective analysis; our results enhance brightness and restore details well.

Parameter selection
In order to ensure optimal performance of the enhanced model, we further compare the performance under different parameters. We changed the proportion of the intermediate loss, denoted by λ, setting it to 0, 0.1, 0.2 and 0.3. In addition, we discuss the role of the attention module in the REM, comparing our model with the same model without the attention module. We choose the LOL-dataset [18] for the analysis in Tables 7-8 and Figure 16. As shown in Tables 7-8, our method obtains better results when λ = 0.1; only one image is slightly lower than the best in PSNR (25.4260 vs 26.6463), and although the SSIM of that image is not the best, the gap to the best results is small (0.8985 vs 0.8957 and 0.9002 vs 0.8992). Figure 16 clearly depicts the comparison results.

FIGURE The enhanced results of multi-exposure images. The first row shows the source images. The second, third and fourth rows are the reflection map, the illumination map and the colour-map of the illumination map, respectively

FIGURE 14 Binary mask visualization of Figure 13 (intensity less than 50 would be assigned 1, otherwise 0)

FIGURE 15 Visual results on the GOT-10K video dataset

Structure ablation
The results of the AMBCR are clearly better than those of the model without the attention module. Figure 16(a,b,c) shows the quantitative indicators under different parameters; the comprehensive performance at λ = 0.1 is the best in Figure 16(b), and the variation across images is relatively small. In addition, we discussed the impact of the different branches on the results: we removed each of the four branches in turn and retrained the model. Furthermore, to show that more branches do not bring better results, we doubled the original number of branches to an eight-branch network. The final comparison results and spatial structures of the different experiments are shown in Figure 17, where we can see that the model with four branches is closest to the high-light images in spatial structure. Table 9 shows the average indicators over all test images of the LOL-dataset [18] for the different structures.

CONCLUSION
In this paper, we design a novel end-to-end network for low-light image enhancement. We combine Retinex theory and a multi-branch network for training. Our network consists of three modules, namely the REM, the IEM and the reconstruction module. Our method uses a mixed loss function; in particular, we use the intermediate loss to improve the model. A large number of experiments prove that we can extract illumination maps to enhance low-light images well. On this basis, the enhanced images can be further processed for tasks such as face recognition, target detection and target tracking. According to the Retinex theory, this model is also helpful for multi-exposure image fusion.