An Efficient Extreme-Exposure Image Fusion Method

Since existing commercial imaging equipment cannot meet the requirements of high dynamic range (HDR) imaging, multi-exposure image fusion is an economical and fast way to realize HDR. However, existing multi-exposure image fusion algorithms suffer from long fusion times and large data storage. We propose an extreme-exposure image fusion method based on deep learning. In this method, two extreme-exposure images are fed into the network, channel and spatial attention mechanisms are introduced to automatically learn and optimize the weights, and the optimal fusion weights are output. In addition, the model is trained on real values, and a new custom loss function pushes the output closer to the ground truth. Experimental results show that the proposed method is superior to existing methods in both objective and subjective terms.


Introduction
Dynamic range (DR) is defined as the ratio of maximum to minimum light intensity, which can reach about 2^24 in natural scenes. However, current commercial digital single-lens reflex (DSLR) cameras can only capture scenes with a low dynamic range of 2^8 to 2^12. In other words, it is very difficult to record a good high dynamic range (HDR) image with current imaging devices. This limitation leads to underexposed or overexposed areas in the captured image. Therefore, in order to better record objects of different brightness, multi-exposure image fusion (MEF) is a fast and economical approach in the field of HDR imaging [1].
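To make the ratios above concrete, dynamic range is often expressed in stops, i.e. the base-2 logarithm of the intensity ratio. A minimal sketch (the specific ratios are the ones quoted above):

```python
import math

def dynamic_range_stops(max_intensity: float, min_intensity: float) -> float:
    """Dynamic range in stops (powers of two) of an intensity ratio."""
    return math.log2(max_intensity / min_intensity)

# A natural scene spanning a ratio of 2^24 covers 24 stops,
# while a sensor limited to a 2^8 ratio covers only 8 stops.
print(dynamic_range_stops(2**24, 1))  # 24.0
print(dynamic_range_stops(2**8, 1))   # 8.0
```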
An MEF algorithm operates on a sequence of images of the same scene captured at different exposure levels. A good MEF algorithm is expected to extract important perceptual information from the differently exposed images and fuse it into an HDR image with the best visual effect and the best objective evaluation scores [2].
Burt et al. [3] proposed the classical image-pyramid fusion algorithm, which computes weights from local energy and the correlation between pyramid levels. Goshtasby et al. [4] divided the exposed images into blocks and selected the block with the highest information content, an approach that is easily affected by blocking artifacts. Meanwhile, with the explosion of deep learning, more and more researchers have applied deep learning to the multi-exposure image fusion task; for example, Prabhakar et al. [5] first proposed an unsupervised MEF learning framework that uses an SSIM (structural similarity) loss function to measure image quality. These methods have greatly advanced multi-exposure image fusion and provide methodological guidance for the extreme-exposure image fusion method proposed in this paper.
The main contributions of this paper are as follows: (i) a new extreme-exposure image fusion model is designed, which greatly reduces storage space and running time.

Related Work
The purpose of multi-exposure image fusion is to generate a well-exposed image y_c = f(x_{1,c}, ..., x_{N,c}) from a group of images x_{i,c} (i = 1, ..., N) with different exposure levels, where c represents the channel of the color images. Different MEF algorithms differ mainly in how the fusion weights are calculated.
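A generic weighted-sum MEF rule consistent with the formulation above can be sketched as follows (NumPy, with per-pixel weights normalized to sum to one over the exposures; the uniform weight map here is an illustrative placeholder, not the learned weights of this paper):

```python
import numpy as np

def fuse_weighted(images: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Fuse N exposures per channel: y_c = sum_i w_i * x_{i,c}.

    images:  (N, H, W, C) stack of exposures
    weights: (N, H, W)    per-pixel fusion weights
    """
    w = weights / (weights.sum(axis=0, keepdims=True) + 1e-8)  # normalize over exposures
    return (w[..., None] * images).sum(axis=0)

# Two toy 2x2 RGB exposures fused with uniform weights -> a simple average
imgs = np.stack([np.zeros((2, 2, 3)), np.ones((2, 2, 3))])
fused = fuse_weighted(imgs, np.ones((2, 2, 2)))
print(fused[0, 0])  # approximately [0.5 0.5 0.5]
```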
In recent years, deep learning has made great leaps in various machine vision tasks such as image classification [6], face recognition [7], semantic segmentation [8], and object detection [9]. Li et al. [10] proposed the DenseFuse architecture, a fusion network for infrared and visible images. Ma et al. [11] proposed a fast MEF network for static image sequences with arbitrary spatial resolution and number of exposures; it is trained end-to-end by optimizing the perceptually calibrated MEF structural similarity (MEF-SSIM) index on a database, and a guided filter is used to further improve performance. Zhang et al. [12] proposed IFCNN, a general image fusion framework based on convolutional neural networks. Inspired by transform-domain image fusion algorithms, two convolution layers extract salient features from the multiple input images; different fusion rules (elementwise max, elementwise min, or elementwise mean) are then selected according to the type of input image to fuse the convolutional features.
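The three IFCNN-style fusion rules just mentioned (elementwise max, min, and mean over the extracted feature maps) reduce to simple array reductions; a minimal sketch:

```python
import numpy as np

def fuse_features(feats: np.ndarray, rule: str = "max") -> np.ndarray:
    """Apply an elementwise fusion rule over N feature maps of shape (N, ...)."""
    if rule == "max":
        return feats.max(axis=0)   # keep the strongest response at each position
    if rule == "min":
        return feats.min(axis=0)   # keep the weakest response
    if rule == "mean":
        return feats.mean(axis=0)  # average the responses
    raise ValueError(f"unknown rule: {rule}")

f = np.array([[1.0, 4.0], [3.0, 2.0]])  # two toy 1-D "feature maps"
print(fuse_features(f, "max"))   # [3. 4.]
print(fuse_features(f, "mean"))  # [2. 3.]
```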
Although these deep learning based methods outperform traditional ones, they often require multiple image sequences with different exposure levels, which greatly increases memory usage and runtime, and their fusion results are not ideal when only two extreme-exposure images are available.

Proposed Method
To solve these problems, we propose an extreme-exposure image fusion method. From only two extreme-exposure images, it outputs the optimal fusion weights and produces a well-exposed fused image, which is more suitable for practical applications.

Network Architecture
The model of this work is shown in Fig. 1. First, the two extremely exposed images are input into the network and their RGB channels are separated and reorganized; the result is fed into the encoder's DenseNet block to fully extract the pixel features of the images. The features are then weighted by perceived importance in the SimAM attention module and sent to the decoder network for further extraction and reconstruction, yielding a well-exposed HDR image.
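The channel separation and reorganization step can be illustrated as follows: the R, G, and B planes of the under- and over-exposed images are split and restacked into a single multi-channel input. This is a sketch of one plausible layout; the paper does not specify the exact channel ordering:

```python
import numpy as np

def reorganize_channels(under: np.ndarray, over: np.ndarray) -> np.ndarray:
    """Interleave the RGB planes of two (H, W, 3) exposures into one (H, W, 6) input."""
    planes = []
    for c in range(3):                 # R, G, B
        planes.append(under[..., c])   # channel c of the under-exposed image
        planes.append(over[..., c])    # channel c of the over-exposed image
    return np.stack(planes, axis=-1)

u = np.zeros((4, 4, 3))  # toy under-exposed image (all dark)
o = np.ones((4, 4, 3))   # toy over-exposed image (all bright)
x = reorganize_channels(u, o)
print(x.shape)   # (4, 4, 6)
print(x[0, 0])   # [0. 1. 0. 1. 0. 1.]
```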

Loss Function
In this model, we propose a new custom loss function for the extreme-exposure image fusion task to improve performance. The multi-scale weighted loss combines a multi-scale term with a second term weighted by a factor of 1000, i.e. Loss = Loss_ms + 1000 * Loss_1, where phi(x) = max(x, 0.0001) is a selection correction function used to increase the numerical stability of the network.
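Only the correction function phi(x) = max(x, 0.0001) and the weighting factor of 1000 are fully recoverable from the text; the combined form below is therefore an illustrative assumption of how such a clamped, weighted loss might be assembled, not the paper's exact formula:

```python
import numpy as np

def phi(x: np.ndarray) -> np.ndarray:
    """Selection correction: floor values at 1e-4 to avoid division by ~0."""
    return np.maximum(x, 0.0001)

def combined_loss(loss_ms: float, loss_1: float, weight: float = 1000.0) -> float:
    """Illustrative weighted sum of two loss terms (1000x weighting per the text)."""
    return loss_ms + weight * loss_1

denom = phi(np.array([0.0, 0.5]))     # 0.0 is clamped to 1e-4
print(denom)                          # [1.e-04 5.e-01]
print(combined_loss(0.2, 0.001))      # 1.2
```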

SimAM Attention
We add a simple but effective attention module for convolutional neural networks, shown in Fig. 2, which differs from the existing channel and spatial attention module CBAM [13]. Without adding parameters to the original network, the module infers three-dimensional attention weights for the feature maps in a layer, improving attention accuracy [14].
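SimAM's parameter-free 3-D attention weights follow from a closed-form energy function: neurons farther from their channel mean are considered more distinctive and receive higher weights. A minimal NumPy sketch for one feature map (lambda is the module's regularization constant; the value here is a typical choice, assumed rather than taken from the paper):

```python
import numpy as np

def simam(x: np.ndarray, lam: float = 1e-4) -> np.ndarray:
    """Parameter-free SimAM attention for one (H, W) feature map."""
    n = x.size - 1
    d = (x - x.mean()) ** 2            # squared distance to the channel mean
    v = d.sum() / n                    # channel variance estimate
    e_inv = d / (4.0 * (v + lam)) + 0.5  # inverse of the minimal energy
    return x / (1.0 + np.exp(-e_inv))    # x * sigmoid(inverse energy)

feat = np.array([[0.0, 0.0], [0.0, 4.0]])  # one strongly activated neuron
out = simam(feat)
print(out.shape)  # (2, 2)
```

The distinctive neuron (value 4.0) keeps most of its activation, while the uniform background is suppressed toward zero.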

Experimental Details
In this part, we first introduce the details of the data processing method and network training, which are necessary preparation for the experiments. Then, given the same two extreme-exposure input images, the model is compared with classical and state-of-the-art MEF methods in terms of visual quality and objective indicators.

Preparation
In this work, two extreme-exposure images are selected from the public SICE [15] data set, which contains images at different exposure levels, to form the SICE-ex data set. The Adam optimizer is used, the number of epochs is set to 12, the batch size to 4, and the initial learning rate to 1e-4; the learning rate is corrected by a factor of 50% or 105% according to the training cycle. The model is compared with [18], Lee18 [19], and Li20 [20] in terms of subjective visual quality (Fig. 3 and Fig. 4), fusion time, average gradient, image gradient, and the Natural Image Quality Evaluator [21], as shown in Table 1.
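The learning-rate correction described above (multiplying by 50% or 105% depending on the training cycle) can be sketched as a simple per-epoch schedule; the switch point below is a hypothetical assumption, since the paper does not state when each factor applies:

```python
def scheduled_lr(initial_lr: float, epoch: int, warm_epochs: int = 6) -> float:
    """Per-epoch learning rate: grow by 5% early on, then halve each later epoch."""
    lr = initial_lr
    for e in range(epoch):
        lr *= 1.05 if e < warm_epochs else 0.5  # 105% early, 50% late (assumed split)
    return lr

lr0 = 1e-4  # initial learning rate from the text
print(scheduled_lr(lr0, 1))   # 1.05e-4 after one warm epoch
```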

Conclusion
This paper proposes an extreme-exposure image fusion network that is completely different from conventional MEF algorithms. The performance of the network is improved by introducing a dual spatial and channel attention mechanism and a custom loss function. While ensuring a clear image structure and good exposure, the method greatly reduces image storage space and fusion time, improving the embeddability of the algorithm on mobile terminals. Extensive experiments show that the fusion method is superior to existing methods in subjective vision and objective evaluation. However, because only two extreme-exposure images are used, the color, depth of field, and contrast of the fused image are slightly inferior to those of other multi-exposure image fusion algorithms; improving these aspects will receive more attention in future work.