Single Sea Surface Image Dehazing via Multi-scale Concatenated Attention Network

Haze degrades the color and content information of sea surface images, which can negatively impact the navigation safety of intelligent ships in real-time systems. It is therefore imperative to develop algorithms that effectively remove haze and give intelligent ships higher environmental adaptability. In this paper, we propose a novel Multi-scale Concatenated Attention Network (MCA-Net) to address this problem. Specifically, we use the concatenation operation to connect multi-scale residual blocks (MRB), while introducing a channel attention mechanism to guide feature extraction in hazy regions. Extensive experiments on both synthetic datasets and real sea surface images demonstrate that our proposed method outperforms other recent dehazing approaches.


INTRODUCTION
Intelligent perception and analysis of the navigational environment is widely used in intelligent navigation technologies, enabling intelligent ships to achieve autonomous navigation and obstacle avoidance through target detection, target identification, and target tracking. The intelligent ship monitors the nearby scene through surrounding environment information obtained from its visual imaging system, which mainly consists of visible-light cameras. Owing to bad weather, sea fog often accompanies sea voyages. Light from the scenes captured by the camera is scattered by suspended particles, which greatly reduces visibility and contrast and seriously degrades the performance of the visual system of intelligent ships. Improving the clarity of images obtained in foggy environments is an important prerequisite for intelligent ships to realize technical means such as intelligent perception of surrounding targets and obstacles [1]. Thus, research on single image dehazing for intelligent ships in sea fog environments is of great practical significance.
Mathematically, existing works [2,3] usually model a hazy image by
$$I(x) = J(x)t(x) + A\left(1 - t(x)\right), \tag{1}$$
$$t(x) = e^{-\beta d(x)}, \tag{2}$$
where $I(x)$ is the observed hazy image, $J(x)$ is the haze-free scene, $A$ is the global atmospheric light, $t(x)$ is the medium transmission, $d(x)$ is the scene depth, and $\beta$ indicates the scattering coefficient of the atmosphere. As only the hazy image $I(x)$ is available, recovering the haze-free scene $J(x)$ is an ill-posed problem. To make the problem well posed, early prior-based methods attempted to use statistical characteristics of images, such as the dark channel prior (DCP), to estimate the transmission map and atmospheric light [4]. Although these physical methods work well in some cases, they are still unable to precisely estimate the parameters, particularly when the model does not suit the scene.
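As a minimal NumPy sketch of the atmospheric scattering model above; the values of the atmospheric light $A$ and scattering coefficient $\beta$ are illustrative, not taken from the paper:

```python
import numpy as np

def synthesize_haze(J, d, A=1.0, beta=1.0):
    """Apply the atmospheric scattering model: I(x) = J(x)t(x) + A(1 - t(x)),
    with transmission t(x) = exp(-beta * d(x)).

    J: haze-free scene (values in [0, 1]); d: scene depth map.
    Returns the hazy image I and the transmission map t.
    """
    t = np.exp(-beta * d)
    I = J * t + A * (1.0 - t)
    return I, t
```

Note that at zero depth the transmission is 1 and the hazy image equals the clear scene, while at large depth the pixel converges to the atmospheric light, matching the model's intuition.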
In recent decades, learning-based methods have been introduced, employing convolutional neural networks (CNNs) to estimate transmissions or predict clear images directly. Cai et al. [5] proposed a simple but powerful network, named DehazeNet, which adds feature extraction and non-linear regression layers to recover the image. Ren et al. [6] trained a multi-scale CNN, consisting of coarse- and fine-scale networks, to reconstruct the transmission map. AOD-Net [7] developed a K-estimation module that combines the variables of the transmission map and the atmospheric scattering model through a single convolutional network. Afterwards, the gated fusion network (GFN) [8] was presented, leveraging hand-selected pre-processing strategies and multi-scale estimation. Even though these methods achieve good performance, they still face various limitations in real cases.
In this paper, we propose an end-to-end Multi-scale Concatenated Attention Network (MCA-Net) for single image dehazing, which directly outputs the dehazed image without estimating intermediate parameters. Specifically, we use the concatenation operation to connect multi-scale residual blocks, while integrating a channel attention mechanism into the learning procedure to guide feature extraction in hazy regions. Compared with the related methods above, our method produces better dehazing results on synthetic datasets as well as real-world images.

PROPOSED METHOD
In this section, we first introduce the concatenation framework, and then describe each component as well as the loss function.

Concatenation Framework
The overall architecture of MCA-Net is shown in Fig. 1.

Multi-scale Residual Block
Multi-scale features are widely applied in all kinds of computer vision tasks, and combining features at multiple scales yields better representations of objects and their surroundings [9]. Therefore, we propose a multi-scale residual block (MRB), which connects feature maps at different scales with residual blocks, as shown in Fig. 2.
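A possible PyTorch sketch of such a block; the exact kernel sizes and channel widths are not specified in this excerpt, so the parallel 3×3/5×5 branches below are illustrative assumptions rather than the paper's configuration:

```python
import torch
import torch.nn as nn

class MultiScaleResidualBlock(nn.Module):
    """Sketch of an MRB: parallel convolutions at two receptive-field
    scales, concatenated, fused by a 1x1 convolution, with a residual
    connection back to the input."""

    def __init__(self, channels):
        super().__init__()
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1)  # fine scale
        self.branch5 = nn.Conv2d(channels, channels, 5, padding=2)  # coarse scale
        self.fuse = nn.Conv2d(2 * channels, channels, 1)            # merge scales
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        y3 = self.relu(self.branch3(x))
        y5 = self.relu(self.branch5(x))
        y = self.fuse(torch.cat([y3, y5], dim=1))  # concatenation across scales
        return x + y  # residual connection
```

The residual addition keeps the block shape-preserving, so several MRBs can be chained and their outputs concatenated as the framework describes.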

Channel-wise Attention
Attention improves the focus on salient parts in a visual system, which is significant for capturing visual information. Since different channel features contain differently weighted information, channel-wise attention [10] is introduced after the MRBs to obtain useful local features. First, feature maps are fed into a channel descriptor based on global average pooling. Let $x_c(i, j)$ be the value of the $c$-th channel at location $(i, j)$; the global average pooling operator collects the channel-wise global information as
$$z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} x_c(i, j). \tag{3}$$
To obtain the weights of the various channels, the Sigmoid and ReLU activation functions are selected following two convolution layers, which is given by:
$$w = \sigma\left(\mathrm{Conv}\left(\delta\left(\mathrm{Conv}(z)\right)\right)\right), \tag{4}$$
where $\delta$ denotes the ReLU function and $\sigma$ denotes the Sigmoid function.
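A squeeze-and-excitation style sketch of this attention step in PyTorch; the reduction ratio of 8 and the use of 1×1 convolutions are assumptions not stated in the text:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel-wise attention: global average pooling produces a channel
    descriptor, two 1x1 convolutions with ReLU/Sigmoid produce per-channel
    weights in (0, 1), which rescale the input feature map."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(self.gap(x))  # per-channel weights, shape (N, C, 1, 1)
        return x * w              # reweight each channel
```

Because the Sigmoid keeps every weight in (0, 1), the module can only attenuate channels, letting the network emphasize hazy-region features by suppressing less informative ones.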

Loss Function
For our network optimization, we choose the simple L1 loss by default:
$$L(\theta) = \left\| F\left(I_{haze}; \theta\right) - I_{gt} \right\|_{1},$$
where $F$ denotes the network, $\theta$ denotes the network parameters, and $I_{gt}$ and $I_{haze}$ stand for the ground truth and the original input image, respectively.

EXPERIMENTAL RESULTS
In this section, we first introduce datasets and experimental details. Then we quantitatively and qualitatively evaluate our method against other dehazing algorithms.

Datasets
We selected 1,000 images from the VOC 2007 and BSD 500 datasets [11]. Both sea surface images and images of other scenes are included, so that the network can better learn the features of hazy images. In our experiment, the blue channel hazing algorithm is used to generate the synthetic dataset. There is a logarithmic relationship between the image depth and the normalized blue channel, which can be defined by
$$d(x) = -\frac{1}{\beta} \ln\left(a B_1(x) + b\right), \tag{5}$$
where $B_1$ is formed by removing the details of the blue channel $B(x)$ with guided filtering, and $a$ and $b$ are scaling coefficients. According to formulas (2) and (5), the transmission map is linear in the blue channel and can be expressed as
$$t(x) = a B_1(x) + b. \tag{6}$$
When the clear image and the transmission map are known, the hazy image can be synthesized using formula (1). These image pairs are randomly cut into 64×64 crops. In total, 500,000 pairs of patches were selected for training.
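The paired 64×64 cropping step can be sketched as follows; the helper name `random_paired_crop` is ours, and the key point is that the clear and hazy images must be cropped at the same location so the pairs stay aligned:

```python
import numpy as np

def random_paired_crop(clear, hazy, size=64, rng=None):
    """Cut an aligned size x size patch from a clear/hazy image pair.

    clear, hazy: H x W x C arrays of identical spatial size.
    Returns (clear_patch, hazy_patch) taken at the same random location.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = clear.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return (clear[top:top + size, left:left + size],
            hazy[top:top + size, left:left + size])
```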

Training Settings
We perform training using PyTorch on an NVIDIA GTX 1080 GPU. To accelerate training, the Adam optimizer is used with a batch size of 8, where β1 and β2 take the default values of 0.9 and 0.999, respectively. The network is trained for 100 epochs; the learning rate is initially set to 0.001 and is then divided by 10 every 25 epochs.
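A minimal, runnable sketch of these settings; the single `Conv2d` stands in for the real MCA-Net only to keep the example self-contained, and the loop is shortened from the 100 epochs used in the paper:

```python
import torch
import torch.nn as nn

# Stand-in for MCA-Net; hyperparameters follow the text:
# Adam, lr 1e-3, betas (0.9, 0.999), batch size 8, lr / 10 every 25 epochs.
model = nn.Conv2d(3, 3, 3, padding=1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=25, gamma=0.1)
criterion = nn.L1Loss()  # the paper's L1 objective

hazy = torch.rand(8, 3, 64, 64)  # one dummy batch of 64x64 crops
gt = torch.rand(8, 3, 64, 64)

for epoch in range(2):  # 100 in the paper; 2 here for illustration
    optimizer.zero_grad()
    loss = criterion(model(hazy), gt)
    loss.backward()
    optimizer.step()
    scheduler.step()  # decays lr by 10x every 25 epochs
```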

Result Evaluation
The proposed method is compared with three state-of-the-art dehazing methods: DCP [4], AOD-Net [7], and GFN [8]. Given the availability of ground truth for the synthetic data, we evaluate the results using the PSNR and SSIM metrics [12]. Table 1 reports the quantitative comparison on the synthetic dataset, and Fig. 3 shows the corresponding qualitative results. Furthermore, the generalization ability of our model on real images is assessed, as shown in Fig. 4. Overall, our proposed method restores more details and produces visually pleasing images, making it more suitable for marine applications.
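PSNR, one of the two reported metrics, can be computed directly from the mean squared error; a NumPy sketch for images scaled to [0, 1] is shown below (SSIM is omitted for brevity):

```python
import numpy as np

def psnr(ref, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB between a reference and a test
    image; higher values indicate a closer match to the ground truth."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)
```

For example, a uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01 and hence a PSNR of 20 dB.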