Deep Learning-Enabled Variational Optimization Method for Image Dehazing in Maritime Intelligent Transportation Systems

Image dehazing has become a fundamental problem of common concern in computer vision-driven maritime intelligent transportation systems (ITS). The purpose of image dehazing is to reconstruct the latent haze-free image from its observed hazy version. It is well known that accurate estimation of the transmission map plays a vital role in image dehazing. In this work, the coarse transmission map is first estimated using a robust fusion-based strategy. A unified optimization framework is then proposed to estimate the refined transmission map and the latent sharp image simultaneously. The resulting constrained minimization model is solved using a two-step optimization algorithm. To further enhance dehazing performance, the subproblems arising in this optimization algorithm are treated as equivalent to image denoising and solved with deep learning-based denoisers. Owing to the powerful representation ability of deep learning, the proposed method can accurately and robustly estimate the transmission map and the latent sharp image. Numerous experiments on both synthetic and realistic datasets have been performed to compare our method with several state-of-the-art dehazing methods. The dehazing results demonstrate the proposed method's superior imaging performance in terms of both quantitative and qualitative evaluations. The enhanced imaging quality is beneficial for practical applications in maritime ITS, for example, vessel detection, recognition, and tracking.


Introduction
Maritime video surveillance has always been an indispensable part of maritime supervision. Affected by small droplets in the air, the images captured by the imaging equipment of a maritime surveillance system tend to be of low quality, with poor visibility, edge degradation, color distortion, and texture distortion.
This degradation directly affects the implementation of advanced vision tasks, such as ship detection and tracking [1,2]. With the vigorous development of computer vision, many enhancement methods for maritime images have been proposed [3,4]. Since haze is easily generated in the maritime environment and seriously affects visual quality, it is also necessary to study the dehazing of maritime images. Starting from the correlations in hazy images, Tang et al. [5] investigated several features other than the dark channel prior (DCP), including local max contrast, local max saturation, and hue disparity. In [5], the features that significantly affect the dehazing result are obtained by comparison and selection. The model is constructed using these combined features and trained on a synthetic dataset to optimize its parameters. However, it easily leads to noise amplification and distortion when dealing with hazy images of relatively high haze concentration. By proposing a new prior named the color attenuation prior, Zhu et al. [6] created a linear model to estimate the scene depth. The atmospheric scattering model is then solved to obtain the latent clear image. The defect of this algorithm is that the scene depth estimate is biased for hazy images with white regions, which affects the dehazing result. Subsequently, many variation-based image dehazing methods have been proposed [7][8][9][10]. Although these methods can perform well in some situations, they cannot robustly process maritime images because of the large differences in texture structure.
Building on Tang et al. [5] and Zhu et al. [6], Cai et al. [11] constructed a convolutional neural network (CNN) called DehazeNet to learn the mapping between hazy images and their medium transmission maps. As the first successful use of deep learning for image dehazing, DehazeNet is trained on a dataset of synthesized hazy images. The trained DehazeNet takes a hazy image as input and outputs its medium transmission map, from which a haze-free result is obtained by a simple traditional method. Similar to DehazeNet, Ren et al. [12] developed a multiscale CNN (i.e., MSCNN) to learn the relations between hazy images and their transmission maps and synthesized an indoor dataset with different hazy images based on the NYU Depth dataset [13]. Zhao et al. [14] proposed a deep fully convolutional network for more accurate transmission estimation and developed a new outdoor synthetic training set. Compared with DehazeNet and MSCNN, the dehazing effect in [14] is improved, but the network needs a large number of parameters, and the resulting computational cost makes dehazing slow. Li et al. [15] reformulated the atmospheric scattering model and proposed a light-weight CNN called AOD-NET. Unlike most previous indirect CNN-based works, which first estimate medium transmission maps with a CNN and then recover haze-free images by traditional physical methods, AOD-NET directly generates dehazed images from hazy ones. Motivated by image denoising, Du et al. [16] converted the dehazing problem into a denoising one and proposed a deep residual learning network to remove haze from hazy images. Chen et al. [17] proposed a gated context aggregation network for image dehazing and deraining and applied smoothed dilated convolution to avoid gridding artifacts. Most CNN-based methods leverage haze-free images to synthesize hazy datasets [17,18]. However, some researchers have argued that such synthesized data cannot correctly represent the distribution of real hazy images, so models trained on synthesized datasets have deficiencies.
Recently, the generative adversarial network (GAN), proposed by Goodfellow et al. [19], has proven potent in image-to-image translation. Typically, a GAN includes two subnetworks, a generator and a discriminator, which are trained adversarially at the same time to achieve the expected results. Moreover, GAN-based methods need not rely on synthetic pairs of hazy and clear images. Yang et al. [20] proposed an end-to-end disentangled dehazing network that is trained on unpaired hazy and clean images and generates haze-free images. Zhang et al. [21] proposed a new dehazing architecture, called the densely connected pyramid dehazing network (DCPDN), that jointly learns to estimate the transmission map, the atmospheric light, and the dehazed image. Furthermore, a joint discriminator within DCPDN was designed to optimize the dehazed images and transmission maps. Suarez et al. [22] proposed a stacked conditional GAN that dehazes each color channel of an RGB image independently and applied multiple loss functions to optimize the network over a conditional probabilistic model. Generally, GAN-based algorithms need extensive training data, and training is difficult and demands more capable hardware. Although many dehazing methods based on physical models and deep learning have been proposed, they do not consider the characteristics of maritime images and therefore cannot be applied well to maritime supervision tasks. Hence, it is necessary to propose a method for enhancing hazy maritime images.
In this work, our contributions can be described as follows. We propose a dehazing method based on the atmospheric scattering model. Specifically, we use a fusion strategy to estimate the transmission. Then, a two-step optimization method based on deep learning is designed to refine the transmission. The subproblems arising in the proposed two-step optimization algorithm are equivalent to image denoising and are solved with deep learning-based denoisers. Compared with the other methods, our method achieves the best performance in both synthetic and real dehazing experiments. The remainder of this paper is organized as follows. Section 2 briefly introduces the problem formulation related to image dehazing. The optimal estimation of the transmission map is presented in Section 3. Section 4 proposes the CNN-enabled variational optimization method and its numerical optimization algorithm. Numerous experiments on synthetic and realistic datasets are performed in Section 5. Finally, we end this paper by summarizing the main contributions in Section 6.

DCP.
Based on a statistical analysis of a large number of images, He et al. [23] discovered the dark channel phenomenon and proposed the DCP. The DCP states that, in the nonsky local regions of a clear image, some pixels always have very low values in at least one color channel, even approaching zero. Figure 1 shows various maritime images and the corresponding dark channel images. It is evident that the dark channel value tends to zero in most regions. In this work, the mathematical expression of the DCP can be written as
$$J^{\mathrm{dark}}(x) = \min_{y \in \Omega(x)} \Big( \min_{c \in \{r,g,b\}} J^{c}(y) \Big) \to 0, \tag{1}$$
where $J$ and $J^{\mathrm{dark}}$ indicate the outdoor haze-free image and the corresponding dark channel image, respectively, $J^{c}$ represents the single-channel image of $J$ in the color channel $c \in \{r, g, b\}$, and $\Omega(x)$ is the local region centred on the pixel $x$.
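The dark channel defined above can be computed with two nested minimum operations. The following is a minimal NumPy sketch, not the authors' implementation; the function name `dark_channel`, the edge padding, and the default patch size of 21 (the window size quoted later for the initial transmission map) are assumptions made for illustration.

```python
import numpy as np

def dark_channel(image, patch=21):
    """Dark channel of an H x W x 3 image with values in [0, 1].

    `patch` is the side length of the square region Omega(x); an odd value
    keeps the window centred on each pixel.
    """
    # Minimum over the three colour channels at every pixel.
    min_rgb = image.min(axis=2)
    # Minimum over the local window; edge padding keeps the output size.
    r = patch // 2
    padded = np.pad(min_rgb, r, mode="edge")
    h, w = min_rgb.shape
    dark = np.full((h, w), np.inf)
    for dy in range(patch):
        for dx in range(patch):
            dark = np.minimum(dark, padded[dy:dy + h, dx:dx + w])
    return dark
```

The explicit double loop over window offsets is chosen for clarity; a sliding-window minimum filter from an image-processing library would normally be faster.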

Image Dehazing.
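To achieve image dehazing, we first describe the formation of hazy images. Narasimhan et al. [24] explained the imaging process of hazy images by establishing a mathematical model named the atmospheric scattering model. This model describes how a hazy image is produced; that is,
$$I(x) = J(x)\,t(x) + A\,\big(1 - t(x)\big), \tag{2}$$
where $I$ denotes the hazy image, $t$ represents the transmission map, and $A$ is the global atmospheric light value. Assuming that $A$ is known and that the transmission is constant within each local region $\Omega(x)$, we can divide equation (2) by $A^{c}$ and take the minimum over the local region and the color channels:
$$\min_{c}\Big(\min_{y \in \Omega(x)} \frac{I^{c}(y)}{A^{c}}\Big) = t(x)\,\min_{c}\Big(\min_{y \in \Omega(x)} \frac{J^{c}(y)}{A^{c}}\Big) + 1 - t(x). \tag{3}$$
According to equation (1), we can approximate $\min_{c}\big(\min_{y \in \Omega(x)} J^{c}(y)/A^{c}\big) \approx 0$. Therefore, equation (3) can be rewritten as follows:
$$t(x) = 1 - \omega\,\min_{c}\Big(\min_{y \in \Omega(x)} \frac{I^{c}(y)}{A^{c}}\Big), \tag{4}$$
where $\omega$ is an adjustment parameter indicating the degree of dehazing. Introducing this parameter preserves a small amount of haze in the sky region, which makes the dehazed image look more natural. The value of $\omega$ is determined by the actual situation. In general, a better result can be obtained with $\omega = 0.95$.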
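The DCP-based transmission estimate described above amounts to a dark channel of the hazy image normalized by the atmospheric light, scaled by $\omega$. The sketch below illustrates this under stated assumptions: the helper `min_filter`, the function name `estimate_transmission`, and the 21 × 21 default window are illustrative choices, and `A` may be a scalar or a length-3 array.

```python
import numpy as np

def min_filter(channel, patch):
    """Sliding-window minimum over a square patch (edge padded)."""
    r = patch // 2
    padded = np.pad(channel, r, mode="edge")
    h, w = channel.shape
    out = np.full((h, w), np.inf)
    for dy in range(patch):
        for dx in range(patch):
            out = np.minimum(out, padded[dy:dy + h, dx:dx + w])
    return out

def estimate_transmission(hazy, A, omega=0.95, patch=21):
    """Coarse transmission: 1 - omega * dark_channel(I / A)."""
    # Normalise every colour channel by the atmospheric light.
    normalized = hazy / np.asarray(A, dtype=float)
    # Dark channel of the normalised hazy image.
    dark = min_filter(normalized.min(axis=2), patch)
    return 1.0 - omega * dark
```

A region whose color equals the atmospheric light gets the smallest possible transmission, $1 - \omega$, while a completely dark region gets a transmission of 1.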
For the estimation of the atmospheric light, the traditional approach selects the pixel with the highest brightness in the sky region as the value of $A$. However, the brightest pixels do not always lie in the sky region, in which case the estimation of the atmospheric light fails. Therefore, He et al. [23] selected the top 0.1% brightest pixels in the dark channel image of the hazy image and, among the corresponding pixels of the hazy image, used the maximum value as the estimate of the atmospheric light.
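The selection rule attributed to He et al. [23] can be sketched as follows. This is an illustrative reading of the text rather than a reference implementation: the function name is hypothetical, and picking the candidate with the largest channel sum is an assumption about how "maximum value" is measured for a color pixel.

```python
import numpy as np

def estimate_atmospheric_light(hazy, dark, fraction=0.001):
    """Pick A from the hazy pixels whose dark-channel values are in the top `fraction`."""
    h, w = dark.shape
    n = max(1, int(round(h * w * fraction)))
    # Indices of the n largest dark-channel values.
    idx = np.argpartition(dark.ravel(), -n)[-n:]
    candidates = hazy.reshape(-1, 3)[idx]
    # ASSUMPTION: "maximum value" is taken as the candidate with the largest channel sum.
    return candidates[candidates.sum(axis=1).argmax()]
```

Restricting the search to pixels with a high dark-channel value avoids picking bright but haze-free objects, such as a white hull, as the atmospheric light.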
It can be seen from equation (3) that the transmission map obtained by the DCP has the same value within each local region, and that the transmission changes significantly where the brightness changes suddenly. Therefore, an image restored with this transmission map will exhibit blocking effects. To obtain a more refined transmission map, He et al. adopted the soft matting algorithm to optimize the initial transmission map. After obtaining the atmospheric light $A$ and the transmission map $t$, the latent clear image can be restored by inverting the atmospheric scattering model; that is,
$$J(x) = \frac{I(x) - A}{\max\big(t(x),\, t_0\big)} + A, \tag{5}$$
where $t_0$ represents the lower bound of the transmission map. In our experiments, we generally set $t_0 = 0.1$. However, the above dehazing method based on transmission map estimation requires the DCP to be valid. When the DCP fails, the inaccurately estimated transmission map leads to poor visual quality in the restored image. In the dark channel image of a clear maritime image, the pixel values in the sky and water regions are not close to 0. Therefore, it is unreasonable to estimate the transmission map of a hazy maritime image directly with the DCP.
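Inverting the atmospheric scattering model with a lower bound on the transmission can be illustrated with a few lines of NumPy. The function name `recover_scene` and the final clipping to [0, 1] are assumptions made for this sketch.

```python
import numpy as np

def recover_scene(hazy, t, A, t0=0.1):
    """Invert I = J * t + A * (1 - t) with the transmission clamped at t0."""
    # Clamp the transmission so that the division cannot blow up in dense haze.
    t_clamped = np.maximum(t, t0)[..., None]
    J = (hazy - A) / t_clamped + A
    # ASSUMPTION: intensities are kept in [0, 1] after the inversion.
    return np.clip(J, 0.0, 1.0)
```

When the transmission used for recovery equals the one that generated the haze and lies above $t_0$, the inversion reproduces the clear image exactly, which is a convenient sanity check.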

Optimal Estimation of Transmission Map
Images captured on the water usually contain large sky regions that generally do not satisfy the DCP. The conventional algorithm introduced in Section 2 therefore tends to produce an inaccurate transmission estimate. To improve the dehazing effect of the proposed method on hazy on-water images with sky regions, we also use the soft segmentation method [15] to correct the initial transmission map in this section.
Firstly, we estimate the initial transmission map of the hazy on-water image based on the DCP. Secondly, we use the soft segmentation method to process the initial transmission map and obtain the transmission weight map. Thirdly, we use the brightness model proposed in [25] to estimate the transmission map of the sky region. Finally, we use the transmission weight map to merge the transmission maps of the nonsky region and the sky region into the corrected initial transmission map. The corrected initial transmission map obtained in this section serves as the input to the joint optimization model.
Journal of Advanced Transportation

Weight Function of Transmission Map.
For a hazy on-water image $I$, we can calculate the initial transmission map based on the DCP described in Section 2. The initial transmission map $t_d(x)$ is given by
$$t_d(x) = 1 - \omega\,\min_{c}\Big(\min_{y \in \Omega(x)} \frac{I^{c}(y)}{A^{c}}\Big), \tag{6}$$
where $\omega = 0.95$ and $\Omega(x)$ represents a 21 × 21 region centred on $x$. The atmospheric light $A$ can be estimated directly by the DCP. As shown in Figure 1, in the transmission map estimated by the DCP, the values in the sky region are clearly smaller than those in other regions. Therefore, we can roughly distinguish the sky region from the other regions of a hazy image based on the values of the initial transmission map. The method proposed in [25] uses this transmission map to obtain, for each pixel of a hazy image, the likelihood that it belongs to the sky region or to other regions, that is, the transmission weight map. In the transmission weight map, pixels close to 0 are considered sky, while pixels close to 1 are considered other regions. To distinguish the sky region from the other regions more accurately, we use a sigmoid function to stretch the transmission. Specifically, the transmission weight function of the initial transmission is obtained as follows:
$$w(x) = \frac{1}{1 + e^{-\theta_1\,(t_d(x) - \theta_2)}}, \tag{7}$$
where $t_d(x)$ is the initial transmission obtained using the DCP, $\theta_1$ is the parameter that adjusts the slope of the sigmoid curve, $\theta_2$ is the centre of the horizontal coordinate, set according to the range of $t_d(x)$, and $\theta_1$ and $\theta_2$ are given, respectively, as follows:
This section uses the sigmoid function to stretch the initial transmission and obtain the transmission weight map shown in Figure 2. The soft segmentation method can easily distinguish the sky region from other regions. Because conditions on the water surface are complicated, the transmission weight map obtained by the soft segmentation method may deviate for some hazy on-water images, which results in an inaccurate merged transmission. Therefore, the merged transmission is further optimized to obtain the optimal value.
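The sigmoid stretching of the initial transmission can be sketched as below. The expressions for $\theta_1$ and $\theta_2$ are not reproduced in this text, so both are left as explicit arguments; the values used in the usage note are placeholders chosen purely for illustration, not the paper's settings.

```python
import numpy as np

def transmission_weight(t_d, theta1, theta2):
    """Soft sky/non-sky weight: near 0 for sky (low t_d), near 1 elsewhere.

    theta1 controls the slope of the sigmoid and theta2 its centre; both are
    hypothetical inputs here because their formulas are not given in the text.
    """
    return 1.0 / (1.0 + np.exp(-theta1 * (t_d - theta2)))
```

For example, `transmission_weight(t_d, theta1=10.0, theta2=0.5)` maps a transmission of 0.5 to a weight of exactly 0.5 and pushes lower values toward 0 (sky) and higher values toward 1 (other regions).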

Transmission Estimation with Sky Regions.
The transmission of the sky region estimated by the DCP is not accurate because white regions (such as the sky) do not conform to the DCP, which causes dehazing to fail for hazy on-water images with a large sky region. To estimate the transmission of the sky region more accurately, we estimate it based on the brightness model.
According to the hazy image degradation model described in Section 2, the relationship between the transmission of the scene and the depth of the object is as follows:
$$t(x) = e^{-\beta\,d(x)}, \tag{10}$$
where $d$ represents the distance between the object and the imaging device and $\beta$ is the dissipation coefficient of the medium. The above formula shows that the scene transmission can be estimated if the depth information of the image is available. In [25], Zhu et al. found, through statistics over a large number of hazy images, that the brightness of a hazy image in HSL color space is usually related to depth, and that the brightness of the sky region is much larger than that of other regions. Therefore, we can simulate the scene depth with the brightness of the hazy image and then estimate the scene transmission as follows:
$$t_L(x) = e^{-\beta\,\tilde{L}(x)}, \tag{11}$$
where $t_L(x)$ is the transmission estimated from the brightness model, $\tilde{L}(x)$ is the corrected brightness, and $\beta$ is the dissipation coefficient of the medium. Different wavelengths of light have different dissipation coefficients in the same medium. According to the Mie scattering model, the dissipation coefficients of red, green, and blue light are taken as 0.3324, 0.34333, and 0.3502, respectively. To better simulate the real scene depth, the brightness is stretched in equation (12), in which $L(x)$ is the brightness of the hazy image, $L^{*}$ is the 95% quantile of the brightness, and $\tau$ represents the depth range of the real scene. The value of $\tau$ can be selected according to the haze density in the hazy image: the denser the haze, the greater the value of $\tau$, and vice versa. In this paper, $\tau = 3.4$. Combining equations (11) and (12) gives the transmission obtained from the brightness model.
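A brightness-based transmission estimate along these lines could look like the sketch below. The stretching formula is not reproduced in this text, so the linear stretch `tau * L / L_star` is an assumption built only from the quantities the paragraph names (brightness, its 95% quantile, and the depth range τ); the function name is likewise hypothetical.

```python
import numpy as np

# Dissipation coefficients for red, green and blue light as quoted in the text.
BETA_RGB = np.array([0.3324, 0.34333, 0.3502])

def brightness_transmission(hazy, tau=3.4, quantile=0.95):
    """Per-channel transmission exp(-beta * stretched lightness) for an H x W x 3 image."""
    # HSL lightness: mean of the largest and smallest channel value per pixel.
    lightness = 0.5 * (hazy.max(axis=2) + hazy.min(axis=2))
    l_star = np.quantile(lightness, quantile)
    # ASSUMPTION: linear stretch so that the 95% quantile maps to depth tau.
    stretched = tau * lightness / max(l_star, 1e-6)
    return np.exp(-BETA_RGB.reshape(1, 1, 3) * stretched[..., None])
```

Brighter pixels receive a larger simulated depth and therefore a smaller transmission, which matches the observation that the sky is the brightest and most distant region.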

Combination of Transmission Map.
The brightness model can simulate the depth of the sky region well but cannot accurately simulate the depth of other regions, so the transmission it estimates for other regions is inaccurate. In contrast, the initial transmission estimated with the DCP is more accurate in those other regions. Therefore, in this section, we use the transmission weight map to merge the transmissions estimated by the two methods and obtain a more accurate result. The corrected initial transmission $t_0$ is expressed as follows:
$$t_0(x) = w(x)\,t_d(x) + \big(1 - w(x)\big)\,t_L(x), \tag{14}$$
where $t_d$ represents the initial transmission of other regions, $t_L$ represents the initial transmission of the sky region, $w(x)$ represents the weight of the transmission of other regions, and $1 - w(x)$ represents the weight of the sky region transmission.
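The merging step is a per-pixel convex combination of the two estimates. A minimal sketch, with a hypothetical function name and single-channel inputs assumed for simplicity:

```python
import numpy as np

def fuse_transmission(t_d, t_l, w):
    """Blend the DCP transmission (weight w) with the brightness-model transmission (weight 1 - w)."""
    return w * t_d + (1.0 - w) * t_l
```

Because the weights sum to one at every pixel, the fused value always lies between the two input estimates.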

The Unified Transmission Estimation and Image Dehazing Framework.
In the previous section, the weight map of the transmission is obtained by soft segmentation and is used to fuse the initial transmissions. Because the soft segmentation method only uses the sigmoid function to stretch the initial transmission, the transmission obtained by fusion with this weight map is still inaccurate. Accurate estimation of the transmission is a crucial step for image dehazing to obtain satisfactory results. To estimate the transmission as accurately as possible and restore the latent clear image, we propose a joint optimization model that simultaneously optimizes the transmission and the latent clear image within a unified framework. The joint optimization model is given by
$$\min_{J,\,t}\ \big\| I - Jt - A(1 - t) \big\|_2^2 + \alpha\,\big\| t - t_0 \big\|_2^2 + \lambda_1\,\varphi(J) + \lambda_2\,\psi(t), \tag{15}$$
where $J$ is the haze-free image to be restored, $I$ is the captured hazy image, $A$ is the atmospheric light, $t$ is the transmission of the hazy image, and $t_0$ represents the corrected initial transmission. $\| I - Jt - A(1 - t) \|_2^2$ and $\| t - t_0 \|_2^2$ are data fidelity terms that constrain the haze-free image and its transmission. $\varphi(J)$ and $\psi(t)$ are the regularization terms representing the prior information of $J$ and $t$, respectively. $\alpha$, $\lambda_1$, and $\lambda_2$ are positive parameters.

Numerical Optimization.
Since the haze-free image $J$ and the transmission $t$ are treated as independent variables, we adopt a two-step optimization method that decomposes equation (15) into the following two subtasks.

Estimation of Haze-Free Image J.
If $t^{k}$ is fixed, the optimization solution of the haze-free image is as follows:

Estimation of Transmission t.
In a similar way, if $J^{k}$ is fixed, the transmission $t$ is obtained by solving the corresponding optimization subtask. Solving the above two subtasks requires appropriate regularization terms $\varphi(J)$ and $\psi(t)$; in other words, the quality of the solutions is limited by the choice of these regularization terms. Considering that deep learning has a strong ability to learn priors, a deep learning method is used in this paper to solve the two subtasks. It can be seen that the two subtasks have the same form. From a Bayesian perspective, both subtasks can be treated as Gaussian denoising tasks [26,27].
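The link between the two subtasks and denoising can be made concrete with a short derivation. It is a sketch that assumes the joint objective consists of the two fidelity terms and two regularizers named in the text, with unit weight on the first fidelity term; the symbol $\hat{J}$ is introduced here for illustration, and the paper's own intermediate equations may differ.

```latex
% J-subproblem (t^k fixed): the residual factorises through the transmission.
I - J t^{k} - A\,(1 - t^{k}) = t^{k} \odot \big(\hat{J} - J\big),
\qquad \hat{J} = \frac{I - A}{t^{k}} + A ,
\\[4pt]
J^{k+1} = \arg\min_{J}\ \big\| t^{k} \odot (\hat{J} - J) \big\|_2^2 + \lambda_1\,\varphi(J).
\\[8pt]
% t-subproblem (J^k fixed): a quadratic in t plus a prior.
t^{k+1} = \arg\min_{t}\ \big\| (I - A) - t\,(J^{k} - A) \big\|_2^2
          + \alpha\,\| t - t_0 \|_2^2 + \lambda_2\,\psi(t).
```

Both problems combine a quadratic distance to an observed quantity with a prior term, which is the same structure as a Gaussian denoising objective. In the first, $\hat{J}$ plays the role of the noisy observation of $J$; in the second, the quadratic terms define a noisy estimate of $t$.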

CNN-Based Blind Denoising.
In Section 4.2, a dehazing model is proposed to jointly optimize the transmission and restore the haze-free image, and the joint optimization model is converted into two subtasks. In this section, we introduce a CNN-based blind denoising model to solve these two subtasks.

Image Denoising Model.
The image noise model can be expressed by
$$y = x + v, \tag{22}$$
where $y$ is the observed noisy image, $x$ is the noise-free image to be restored, and $v$ is additive white Gaussian noise (AWGN) with standard deviation $\sigma$. Because image denoising is an ill-posed inverse problem, previous works generally adopt prior knowledge or regularization terms to constrain the solution. In the Bayesian framework, equation (22) can be solved through the following maximum a posteriori problem:
$$\hat{x} = \arg\max_{x}\ \log p(y \mid x) + \log p(x), \tag{23}$$
where $\log p(y \mid x)$ is the log-likelihood of the observed noisy image and $\log p(x)$ is the prior information of $x$. According to equation (23), recovering a high-quality noise-free image from a noisy image can be regarded as minimizing the following energy function:
$$\hat{x} = \arg\min_{x}\ \frac{1}{2}\,\| y - x \|^{2} + \lambda\,\Phi(x), \tag{24}$$
where $\frac{1}{2}\| y - x \|^{2}$ is the data fidelity term, $\Phi(x)$ is the regularization term related to the image prior, and $\lambda$ is a positive parameter.
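The step from the maximum a posteriori problem to the energy function can be spelled out as follows. This is a standard derivation sketched under two assumptions that are not stated explicitly in the text: the noise is i.i.d. Gaussian with standard deviation $\sigma$, and the prior has the Gibbs form $p(x) \propto e^{-\Phi(x)}$.

```latex
p(y \mid x) \propto \exp\!\Big(-\frac{\|y - x\|^{2}}{2\sigma^{2}}\Big)
\;\Longrightarrow\;
-\log p(y \mid x) = \frac{\|y - x\|^{2}}{2\sigma^{2}} + \text{const},
\\[6pt]
\hat{x} = \arg\max_{x}\ \log p(y \mid x) + \log p(x)
        = \arg\min_{x}\ \frac{\|y - x\|^{2}}{2\sigma^{2}} + \Phi(x)
        = \arg\min_{x}\ \frac{1}{2}\,\|y - x\|^{2} + \sigma^{2}\,\Phi(x).
```

Under these assumptions the trade-off parameter $\lambda$ absorbs the noise variance $\sigma^{2}$, which is why a stronger regularization corresponds to a higher assumed noise level.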

CBDNet Structure.
After decades of research, many image denoising methods have been proposed. These methods can be divided into two categories: model optimization-based methods and supervised learning-based methods. Model optimization-based methods mainly include total variation, the Gaussian mixture model, and BM3D. Most of these algorithms are computationally complex and time-consuming. Supervised learning-based methods mainly comprise denoising algorithms based on deep learning, such as DnCNN and FFDNet. These methods have the advantages of fast speed, excellent performance, and strong robustness [28]. Because deep learning has a strong ability to learn priors and deep learning-based methods achieve better denoising performance, we adopt the convolutional blind denoising network (CBDNet) proposed by Guo et al. [29] to solve equations (20) and (21). CBDNet benefits from its two subnetworks and provides a blind solution for image denoising. Therefore, we use CBDNet within the optimization algorithm to eliminate haze of different concentrations. As shown in Figure 3, CBDNet consists of two parts: the noise estimation subnetwork ($\mathrm{CNN}_E$) and the nonblind denoising subnetwork ($\mathrm{CNN}_D$). The noise estimation subnetwork first estimates the noise level map from the noisy image. The nonblind denoising subnetwork then produces the final denoised image from the noisy image and the estimated noise level map. The noise estimation subnetwork consists of five fully convolutional layers. Each convolutional layer comprises 32 convolution kernels of size 3 × 3, followed by the ReLU nonlinear activation function. The nonblind denoising subnetwork is a 16-layer U-Net structure, and the input layer and output layer of the network are linked by a skip connection to obtain the final denoised image. Equation (20) is equivalent to denoising color images, while equation (21) is equivalent to denoising grayscale images.
Therefore, two CBDNet models, denoted as $F_{\Theta_1}(\cdot)$ and $F_{\Theta_2}(\cdot)$, need to be trained with a color dataset and a grayscale dataset, respectively. $\Theta_1$ and $\Theta_2$ are the parameters of the two networks. The two CBDNet models share the same structure, except that their numbers of input and output channels differ.
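To illustrate how the color and grayscale models differ only in their input and output channels, the sketch below counts the weights of a five-layer 3 × 3 convolutional stack with 32 feature channels, as described for the noise estimation subnetwork. It assumes bias terms in every layer and a last layer that maps back to the number of image channels; both are assumptions for illustration, not details confirmed by the text.

```python
def conv_params(in_ch, out_ch, kernel=3):
    """Number of weights plus biases in one 2-D convolution layer."""
    return in_ch * out_ch * kernel * kernel + out_ch

def noise_estimator_params(image_channels, width=32, layers=5):
    """Parameter count of a plain `layers`-deep 3x3 convolutional stack.

    ASSUMPTION: the first layer maps image_channels -> width, the middle
    layers map width -> width, and the last layer maps width -> image_channels.
    """
    total = conv_params(image_channels, width)
    total += (layers - 2) * conv_params(width, width)
    total += conv_params(width, image_channels)
    return total

color_params = noise_estimator_params(3)  # colour model (haze-free image)
gray_params = noise_estimator_params(1)   # grayscale model (transmission map)
```

Under these assumptions only the first and last layers change size between the two models, so the parameter counts differ by a small fraction.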

Restoration of Latent Haze-Free Images.
In summary, the process of the joint optimization dehazing model is shown in Algorithm 1.
Although the haze-free image can be restored directly according to the proposed model, experiments show that the haze-free image restored with the optimized transmission has better visual quality in structure and texture. Therefore, according to the model, we restore the haze-free image after obtaining the optimized transmission. In summary, the flow of this algorithm is shown in Figure 4.
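The overall alternation can be sketched in a plug-and-play style. This is not the authors' Algorithm 1: the 3 × 3 mean filter `box_denoise` is only a stand-in for the trained CBDNet models, the closed-form transmission update comes from minimizing the two quadratic data terms pointwise, and all function names are hypothetical. The defaults `maxiter=6` and `alpha=0.5` follow the settings reported in the experiments.

```python
import numpy as np

def box_denoise(x):
    """Stand-in denoiser (3x3 mean filter); the paper uses trained CBDNet models here."""
    pad = [(1, 1), (1, 1)] + [(0, 0)] * (x.ndim - 2)
    p = np.pad(x, pad, mode="edge")
    h, w = x.shape[:2]
    return sum(p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)) / 9.0

def joint_dehaze(I, A, t0, alpha=0.5, maxiter=6, t_min=0.1, denoise=box_denoise):
    """Alternate between a haze-free image step and a transmission step."""
    t = t0.copy()
    for _ in range(maxiter):
        # J-step: invert the scattering model, then denoise the result.
        J_hat = (I - A) / np.maximum(t, t_min)[..., None] + A
        J = np.clip(denoise(J_hat), 0.0, 1.0)
        # t-step: pointwise minimiser of the two quadratic data terms, then denoise.
        num = ((I - A) * (J - A)).sum(axis=2) + alpha * t0
        den = ((J - A) ** 2).sum(axis=2) + alpha
        t = np.clip(denoise(num / den), t_min, 1.0)
    # Final restoration with the optimised transmission.
    J = np.clip((I - A) / np.maximum(t, t_min)[..., None] + A, 0.0, 1.0)
    return J, t
```

Passing a different callable as `denoise` is the point of the design: any denoiser with the same array-in, array-out interface can replace the mean filter without changing the loop.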

Experimental Datasets and Settings
Training Data.
As mentioned above, we need to train two CBDNet denoising networks (i.e., $F_{\Theta_1}$ and $F_{\Theta_2}$) to remove the unwanted noise in the original image and the transmission map, respectively.
Because the denoising network $F_{\Theta_1}$ mainly focuses on denoising color images, we choose the SeaShips dataset to build its training set. We randomly select 500 clear, high-quality, noise-free images from the SeaShips dataset and synthesize noisy images with different noise levels according to equation (17). $F_{\Theta_2}$ aims to optimize the transmission map, so the depth maps in the NYU Depth dataset are used to build the corresponding training set. Similarly, we first select 500 depth maps from the NYU Depth dataset, use equation (17) to transform them into the corresponding transmission maps, and then synthesize transmission maps with different noise levels according to equation (17). Then, the synthesized images and their source images are cropped into patches of size 128 × 128.

Experimental Settings.
We use the Adam optimizer to optimize the network weights in the training stage. The batch size is set to 64, and the number of epochs is set to 40. For the first 20 epochs, the learning rate is set to $10^{-3}$; for the last 20 epochs, it is set to $10^{-4}$. The network weights are all initialized from a Gaussian distribution with a mean of 0 and a variance of 0.01. The loss functions of the two denoising networks are shown in equations (20) and (21). The parameter settings in the algorithm are as follows: the maximum number of iterations of the joint optimization model is maxiter = 6, and the adjustment parameter is $\alpha = 0.5$. The experiments are conducted in a Python 3.7 environment with the PyTorch package on an Ubuntu 18.04 system with an Intel(R) Core(TM) i9-9900X processor and an NVIDIA GeForce RTX 2080Ti GPU. Training a single model takes about one day.
The loss function of the denoising network $F_{\Theta_1}$ is defined over the following quantities: $F_{\Theta_1}(J_i)$ is the predicted noise-free clear image, $J_i$ is the input noisy image, $J_i^{*}$ is the corresponding original noise-free image, and $q$ is the number of training images. The loss function of the denoising network $F_{\Theta_2}$ is defined analogously: $F_{\Theta_2}(t_i)$ is the output transmission map of $F_{\Theta_2}$, $t_i$ is the input transmission map, and $t_i^{*}$ is the noise-free transmission map of $t_i$.
Algorithm 1. Input: hazy image $I$, atmospheric light $A$, maximum number of iterations maxiter, adjustment parameter $\alpha$.
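The training schedule described above can be captured in a small helper, shown here together with a generic squared-error loss. The exact form of the paper's loss is not reproduced in this text, so `mse_loss` (half the squared L2 error averaged over the q training images) is an assumption, and both function names are hypothetical.

```python
import numpy as np

def learning_rate(epoch, total_epochs=40):
    """1e-3 for the first half of training and 1e-4 for the second half (0-based epoch)."""
    return 1e-3 if epoch < total_epochs // 2 else 1e-4

def mse_loss(outputs, targets):
    """ASSUMED loss: average over q images of half the squared L2 error."""
    q = len(outputs)
    return sum(0.5 * np.sum((o - g) ** 2) for o, g in zip(outputs, targets)) / q
```

In a PyTorch training loop the same schedule could be applied by setting the optimizer's learning rate from `learning_rate(epoch)` at the start of every epoch.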

Experiments on Synthetic Datasets.
In this section, the proposed method is compared with three classic image dehazing methods on synthetic images. To verify the effect of the proposed algorithm on different haze concentrations, we select three transmission values (i.e., $t = 0.1, 0.3, 0.5$) and three atmospheric light values (i.e., $A = 0.7, 0.8, 0.9$). A total of 9 different degrees of haze are synthesized on the twelve test images shown in Figure 5 to validate the performance of the proposed method. As shown in Tables 1 and 2, we calculate the PSNR and SSIM objective evaluation indicators for our proposed method and the other three methods. Our method achieves the best performance in most cases, and the heavier the haze, the more obvious the difference in the indicators.
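The synthetic test protocol can be sketched as follows: a clear image is degraded with a constant transmission and atmospheric light, and the restored result is scored against the clear image with PSNR. The function names are hypothetical, images are assumed to lie in [0, 1], and SSIM is omitted because it would require an additional library.

```python
import numpy as np

def add_haze(clear, t, A):
    """Synthesize a hazy image with the atmospheric scattering model."""
    return clear * t + A * (1.0 - t)

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, peak]."""
    mse = np.mean((reference - estimate) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# The nine haze settings used in the experiments: three t values times three A values.
settings = [(t, A) for t in (0.1, 0.3, 0.5) for A in (0.7, 0.8, 0.9)]
```

Looping over `settings` and calling `add_haze` on every test image reproduces the 9 haze levels per image described above.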
As shown in Figures 6-8, we also conduct subjective visual comparisons. The sky area in the results of DCP [23] clearly shows different degrees of color distortion and artificial vignetting. The results of RIVD [30] and MSCNN [12] still contain a certain degree of haze, and some color distortion appears in the sky and water. In contrast, the results of the proposed algorithm have the best overall quality: the haze is virtually eliminated while the best visual effect is maintained.

Experiments on Realistic Datasets.
In this section, we choose some real hazy images to verify the superiority of our proposed method. Figure 9 shows the dehazing results of different methods on three maritime video surveillance images. It can be seen from the comparative results in Figure 9 that the DCP dehazing algorithm can effectively remove haze. However, it also causes artifacts, blocking effects, and color distortion in the restoration results. The RIVD and MSCNN dehazing algorithms have some dehazing capability and avoid artifacts, blocking effects, and so on. However, haze still remains in their restoration results. In contrast, the proposed algorithm effectively eliminates the haze, and its restoration results have richer colors, details, and other information.
The comparison results on other hazy on-water scenes in Figure 10 further demonstrate the effectiveness of the proposed method. It can be seen from Figure 10 that the restoration results of DCP have noticeable artifacts, blocking effects, and color distortion. Moreover, the sky area is poorly restored. There is still apparent haze in the RIVD and MSCNN restoration results, which leaves the details of some objects unclear. The proposed method has an excellent dehazing effect on both sky and nonsky areas. The details of its restoration results are richer, and the colors are more natural.
It can be seen from the above visual experiments that the proposed method can better recover latent clear images from hazy images in different water scenes. The restored images have richer detailed information, which shows the effectiveness and stability of the proposed method.
[Figure caption fragments: Figure 5 (test images); comparison panels for DCP [23], (c) RIVD [30], (d) MSCNN [12], and (e) ours, respectively.]

Conclusion
Image dehazing is an important preprocessing problem with great practical value in various maritime ITS applications. In this work, a deep learning-enabled variational optimization method is proposed to reconstruct the latent haze-free image from the observed hazy version. Compared to several competing dehazing techniques, the proposed method generates superior image restoration results in terms of both visual image quality and quantitative evaluation. The main benefit of our method is that it takes full advantage of the unified dehazing framework and the strong representation ability of deep learning. In practical applications, the enhanced image quality could significantly improve the effectiveness and robustness of vessel detection, recognition, and tracking.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.