SPIDE-Net: Spectral Prior-Based Image Dehazing and Enhancement Network

Under hazy or foggy conditions, acquired images are degraded, resulting in reduced visibility, contrast and color fidelity. This degradation occurs because atmospheric particles attenuate and scatter the source radiation, and its intensity depends on diverse scenarios with variable densities of atmospheric particles, the radiation wavelength and the distance from the acquisition device. Existing dehazing methods for visible-band images either rely on prior assumptions to reconstruct the transmission map or use a learning mechanism to directly estimate the dehazed image. Recently, a performance comparison of popular image dehazing methods on spectral hazy images was performed, in which selected wavelength bands at different fog density levels were used. The results showed that the performance of existing methods degrades with the choice of wavelength band and fog density level. In this study, we design an effective spectral and prior-based image dehazing and enhancement network (SPIDE-Net) that outperforms existing methods on spectral hazy images across variable wavelength bands and fog density levels. SPIDE-Net consists of two networks: 1) the Spectral Image Dehazing Network (SID-Net), which is trained on multi-spectral hazy images between 450 nm and 720 nm and takes advantage of the varying attenuation in different wavelength bands; 2) the Multi-scale Prior based image Dehazing Network (MPD-Net), which uses multi-scale dark-channel and color attenuation priors on image triplets selected from a multi-spectral hazy image database. The proposed method is an encoder-decoder style CNN that combines information from both SID-Net and MPD-Net by sharing a common decoder stage. The network was trained on the SHIA dataset and evaluated at different fog density levels.
Compared with popular prior- and learning-based methods evaluated on the SHIA dataset, the proposed method achieves superior performance both qualitatively and quantitatively.


I. INTRODUCTION
Images taken in the atmosphere are often subject to visible quality degradation such as lower contrast, color distortion and blurred edges due to water particles floating in the air. The floating particles can be considered as haze or fog depending on the size of the water droplets and their concentration, but both degrade the image in a similar manner [1], [2], [3]. The amount of degradation depends on the type and concentration of the floating particles, the radiation wavelength and the distance between the acquisition device and the object. As a result, the main features of an object are difficult to identify, especially for far objects in the presence of dense fog. Therefore, higher-level image-processing tasks, whether performed manually by a human operator or automatically by computer vision programs such as object detection, classification and tracking, are affected in the presence of haze or fog. To reduce these degradation effects, image dehazing has become a crucial problem to resolve; the role of an image dehazing algorithm is to restore a clear image from the original degraded input. Image dehazing methods can be classified in different ways [4], [5]. First, they can be divided into methods based on a physical model [1] using priors [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26] and methods using scene statistics without a physical model [6], [7]. A second approach is to classify the algorithms into image enhancement [8], [9], image fusion [10], [11] and deep-learning based methods [27], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38]. Moreover, dehazing algorithms can also be classified by the number of input images as single- or multiple-image methods [12]; the single-image strategy has been the most widely used in recent decades. (The associate editor coordinating the review of this manuscript and approving it for publication was Felix Albu.)
Single image dehazing methods provide good quality restoration when the haze follows the assumed physical model, but they fail to produce accurate results when it does not.
Earlier dehazing methods are mostly based on the atmospheric scattering model, and a series of works has been built on it. Many prior-based methods have been developed, such as the dark channel [13], maximum contrast [14], color attenuation [15], haze-line [16] and bi-channel [17] priors. The dark channel prior is one of the outstanding prior-based methods; it assumes that patches of outdoor haze-free images often have low-intensity values in at least one color channel, but it fails in scenes containing bright objects. The maximum contrast prior assumes that haze significantly reduces image contrast: the higher the haze concentration, the lower the image contrast. This prior, however, may cause color fidelity issues. The color attenuation prior assumes that hazy regions in an image are characterized by higher brightness and lower saturation, with haze concentration positively correlated with the difference between brightness and saturation. In general, prior-based methods may not work well in real cases, as their assumptions can easily be violated in practice and may lead to inaccurate restoration of the dehazed image.
Dehazing methods, both learning-based and model-based, are generally constrained to the atmospheric scattering model, as they restore haze-free images based on a simplified model. These approaches do not take full advantage of the ability of deep neural networks to learn complex models and fit advanced complex functions. Moreover, they may produce poor dehazed images due to inaccuracies in the estimation of the atmospheric scattering parameters.
With advancements in image sensors and spectral-band filters, interest in multispectral and hyperspectral imaging systems has grown in recent years, and the range of application fields has widened considerably. Despite this, hyperspectral imaging databases for image dehazing are still lacking, especially under controlled conditions. Recently, a hyperspectral image database for dehazing called SHIA (Spectral Hazy Image database for Assessment) [18] has become available, containing indoor images captured under controlled haze or fog conditions. Using the SHIA database, existing popular image dehazing methods have been compared using images from different wavelength bands and fog density levels [18], [19], [20]. Each compared method performed better at some specific wavelength bands and showed compromised results at others. Similarly, at fixed wavelength bands, the dehazing methods showed compromised performance across different fog density levels. The existing image dehazing methods are therefore sensitive to wavelength band selection and fog density level.
In this study, an effective spectral and prior-based image dehazing and enhancement network (SPIDE-Net) is proposed that shows better performance than existing methods on spectral hazy input images from variable wavelength bands and fog density levels. Using hazy and haze-free image pairs from the spectral hazy database, we train an end-to-end deep convolutional neural network. SPIDE-Net utilizes spectral images of different wavelength bands and fog density levels and produces a dehazed and enhanced image. The proposed method follows an image-to-image translation approach and does not rely on the conventional physical scattering model. Our main contributions can be summarized as follows:
• An end-to-end image dehazing and enhancement network for visible-band spectral images is proposed. This network directly restores the haze-free image without requiring the atmospheric scattering model or the estimation of atmospheric light and scene transmission map. The proposed network consists of two sub-networks: a spectral image dehazing network and a multi-scale prior based dehazing network.
• For spectral image dehazing, we utilize two effective color-related haze priors that are suitable for haze estimation and removal in different spectral bands. The multi-scale prior based dehazing network (MPD-Net) uses these priors as per-pixel guidance about hazy regions and haze density.
• The spectral image dehazing network (SID-Net) employs multiple visible-band images to learn spectral attenuation features from hazy images.
• To ensemble both networks, a common decoder stage is proposed that concatenates the outputs of the two encoder networks.
• For network training, we use a loss function that preserves contrast, luminance and high-frequency information. It comprises pixel-wise and perceptual loss terms, with different weights applied to the per-pixel and feature-based components during training.
• For faster convergence, we connect the hidden layers of the decoder to extra convolution and pooling layers to produce and match a coarse output.
• Through extensive experimentation on the spectral database and comparison with existing well-known dehazing methods, our proposed method gives favorable results.
The rest of this paper is organized as follows: Section II presents a brief overview of the related work. The proposed method and its implementation details are given in Section III. Experimental results and comparisons are presented in Section IV. Section V discusses the limitations of the proposed method and describes directions for future research. Finally, the conclusion is provided in Section VI.

II. RELATED WORK
In this section, we give an overview of existing image dehazing methods, classified here into single image dehazing and spectral image dehazing methods. For single image dehazing, the atmospheric scattering model is discussed first, followed by some state-of-the-art haze-relevant prior based methods that use it and an overview of well-known deep learning based methods; the advantages and disadvantages of these methods are also mentioned. As some deep learning, prior and image enhancement methods have also been applied to spectral images, they are discussed here under spectral image dehazing. Our proposed network uses both haze-relevant priors and a deep learning network for spectral image dehazing and is discussed in Section III.

A. SINGLE IMAGE DEHAZING
Most existing image dehazing methods are based on Koschmieder's law [1], [2], [3], a per-pixel dichromatic model that relates the spectral radiance of the image to direct transmission and airlight terms. The haze effect is simply approximated by the atmospheric scattering model [1], [2], [3], formulated as

I(x, y, c) = J(x, y, c) t(x, y) + A (1 − t(x, y))    (1)

where J(x, y, c) is the haze-free color image, I(x, y, c) is the observed hazy color image, x and y are the horizontal and vertical locations in the image, c is the color channel, A is the global atmospheric light and t(x, y) is the medium transmission map, defined as

t(x, y) = e^(−β d(x, y))    (2)

where β is the atmospheric scattering parameter and d(x, y) is the scene depth. Without knowing A and t(x, y), image dehazing is an underdetermined problem. Equation (1) can also be formulated as

J(x, y, c) = (I(x, y, c) − A) / t(x, y) + A    (3)

which states that we can restore a clear haze-free image from a captured hazy image if we can properly estimate the global atmospheric light and the transmission map. The main drawback of this law is that the number of unknowns, i.e. A and t(x, y), exceeds the number of equations, making the problem ill-posed. Another factor is the wavelength-dependent attenuation coefficient β, which affects the color channels. Most dehazing methods neglect the relationship between wavelength and β when estimating the transmission map and airlight from a single hazy image.
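As a concrete illustration, the atmospheric scattering model and its inversion can be sketched in a few lines of NumPy. The default values of A and β below are arbitrary choices for the example, not values from the paper.

```python
import numpy as np

def apply_haze(J, d, A=0.8, beta=1.2):
    """Synthesize a hazy image from a clear image J (H x W x C, values in
    [0, 1]) and a per-pixel depth map d (H x W), following Eqs. (1)-(2)."""
    t = np.exp(-beta * d)                               # transmission map, Eq. (2)
    return J * t[..., None] + A * (1.0 - t[..., None])  # scattering model, Eq. (1)

def dehaze(I, t, A=0.8, t_min=0.1):
    """Invert the model (Eq. (3)) given estimates of t and A; t is clipped
    from below to avoid amplifying noise where transmission is tiny."""
    t = np.clip(t, t_min, 1.0)
    return (I - A) / t[..., None] + A
```

With the true t and A, `dehaze` recovers the clear image exactly, which is why the whole difficulty of single image dehazing lies in estimating those two quantities.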
Single image dehazing is an ill-posed problem for which different model- and learning-based methods have been proposed. Model-based methods involve estimating the transmission map, atmospheric light and haze-free image, whereas statistical prior-based methods generally make assumptions about these estimates to render the dehazing problem well-posed.

1) PRIOR BASED METHODS
He et al. [13] introduce the dark channel prior to estimate the transmission map by assuming that outdoor haze-free image patches have low-intensity values in at least one color channel. However, for natural scenes where the scene or object color is similar to the atmospheric light, or for images containing sky regions, this prior is less effective; moreover, it requires an additional filtering method to address blocking effects and refine the transmission map. Tan [14] proposes a maximum contrast prior based on the observation that the contrast of a haze-free image is higher than that of a hazy one, and restores the image by maximizing local contrast; however, for densely hazed images the colors can become over-saturated. Zhu et al.'s [15] dehazing method is based on the color attenuation prior, with haze concentration positively correlated with the difference between brightness and saturation; it characterizes hazy regions in the image by high brightness and low saturation values. However, in real-world circumstances where this prior does not hold, it may result in color distortion and more background noise. Berman et al. [16] introduce a haze-line prior based on the observation that the colors of a haze-free image form tight clusters in RGB space that become lines in the presence of haze. Jiang et al. [17] propose an adaptive bi-channel prior combining the dark and bright channel priors that characterizes white and black pixels in hazy images in HSV space. Fattal [21] observes that scene transmission and surface shading are locally uncorrelated and develops a refined image formation model. In [22], a color-line prior is proposed, based on the 1D distribution of pixels within small image patches in RGB color space, and is used to recover the scene transmission. Ju et al.
[23] propose an improved atmospheric scattering model that constructs a linear relation between the transmission map and haze-aware density features and estimates the transmission map by a linear operation on luminance, saturation and gradient; however, this method causes color fading and over-bright image details [24]. Meng et al. [25] propose a regularization method exploring a boundary constraint on the scene transmission combined with contextual regularization. Ancuti et al. [26] propose a fast semi-inverse method using the hue disparity between the original image and its semi-inverse to detect and remove haze from a single image. Priors and assumptions are very effective and have shown great success in haze removal, but they are designed by observing specific image properties that may not reflect the true inherent properties of natural images; e.g., DCP [13] is less effective for natural scenes where the scene color is similar to the atmospheric light, and with maximum contrast [14] colors can easily become over-saturated in densely hazed images. Therefore, prior-based methods may not perform well in certain real cases, as the priors can easily be violated in practice, leading to inaccurate estimation of the transmission map. With the availability of large real-world outdoor and indoor image datasets covering dense, homogeneous and non-homogeneous fog conditions, prior-based methods have become less effective than deep learning based methods [27], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38] in both qualitative and quantitative comparisons.

2) DEEP LEARNING BASED METHODS
Keeping in view the apparent success of deep learning in image processing applications and the availability of large image datasets, a large number of deep learning based image dehazing methods have been proposed in recent years, achieving significant improvements. Cai et al. [27] propose an end-to-end CNN called DehazeNet that incorporates hand-crafted haze-relevant features into each layer of the network. DehazeNet estimates the transmission map using a network trained on paired synthetic hazy and haze-free images; however, the transmission map learned from synthetic data differs from that of natural hazy images, and because DehazeNet treats the atmospheric light as a global constant, it can be unsuitable for some multi-source images, resulting in color and information loss. Ren et al. [28] propose a multi-scale deep neural network (MSCNN) that integrates coarse-scale and fine-scale networks to map hazy images to their corresponding transmission maps. This method performs well at learning the transmission map but does not consider estimating the atmospheric light or directly learning the haze-free image. Li et al. [29] propose an all-in-one dehazing network that estimates an intermediate variable integrating both the transmission map and the atmospheric light. Ren et al. [30] propose an end-to-end trainable network that applies pre-processing to generate multiple inputs, but this approach may introduce color distortions into the final dehazed image. Pan et al. [31] introduce dual CNNs, consisting of a structure branch and a detail branch, to directly predict the transmission map and atmospheric light and compute the haze-free image by inverting equation (3). Zhang et al. [32] propose an end-to-end network that jointly learns the transmission map, atmospheric light and dehazing by directly embedding the atmospheric scattering model into the network. Chen et al.
[33] propose a haze removal network based on radial basis functions that restores the hazy image while retaining visible edges and brightness. This network contains a flexible number of hidden neurons depending on scene complexity, i.e., it uses more neurons for textured surfaces and fewer for plain ones, but determining the network configuration for individual image patches requires prolonged processing time [34]. Qu et al. [35] propose a dehazing and enhancement network with two building blocks, a generative adversarial network followed by an enhancer of two enhancing blocks, with the goal of reinforcing the dehazing effect in both color and detail. Chen et al. [36] propose an end-to-end gated context aggregation network that directly restores the haze-free image and adopts a smoothed dilation technique. Qin et al. [37] propose a feature attention based network, considering that different channel-wise features carry different weighted information, and combine channel attention with a pixel attention mechanism; for non-homogeneous hazy scenes, however, this method lacks visually pleasing results. Recently, Jiang et al. [38] use haze-relevant features and derive attention maps from several hand-designed priors, such as dark channel, color attenuation and maximum contrast, that serve as guidance for the deep network via per-pixel information about haze density and hazy regions.
Compared with traditional and prior based methods, deep learning methods try to directly estimate the transmission map or directly restore the final dehazed image. They achieve superior performance and robustness because they apply large datasets during the learning stage, although they require higher processing and computational power than traditional methods. Moreover, the requirement for real-world hazy images with ground truth references makes deep learning methods infeasible where large datasets are unavailable, as with infrared and hyperspectral images: due to the lack of large-scale spectral hazy image datasets with ground truth references, fully end-to-end dehazing networks have not been designed for them. Existing learning and prior based methods have, however, been used for spectral image dehazing; recently, some popular dehazing methods have been applied to spectral images and evaluated on spectral hazy image datasets. An overview of image dehazing methods used for spectral image dehazing is given below.

B. SPECTRAL IMAGE DEHAZING
Existing spectral image dehazing algorithms can be divided into three categories: image enhancement, physical model and learning based methods. Image enhancement based methods include histogram processing and multi-scale retinex. Contrast Limited Adaptive Histogram Equalization (CLAHE) is an image enhancement method that has also been used for image dehazing [39]. The CLAHE algorithm performs local contrast enhancement [40] using the hyper-parameters of clip limit and number of tiles to limit the maximum localized contrast. Recently, a machine learning approach has been used to select CLAHE hyper-parameters automatically instead of manually [41]. For spectral image dehazing, CLAHE is useful as it enhances local contrast while keeping the maximum contrast at an optimum level; however, the choice of clip limit and number of tiles greatly affects the output image quality. Recently, Huang et al. [42] proposed a dehazing algorithm combining histogram and phase consistency features with multi-scale retinex to restore haze-free images. This algorithm is effective for outdoor urban remote sensing images and has the advantage of low complexity for real-time use, but by ignoring imaging theory it cannot guarantee the same results in most situations, i.e., an algorithm suited to urban remote sensing will be less effective in other scenarios because it ignores atmospheric models.
Prior based methods using hand-crafted prior knowledge are also used for spectral image dehazing with the physical atmospheric model. The dark channel prior [13] is one of the outstanding prior-based dehazing methods but proves less effective in scenes with bright objects or sky regions; moreover, it requires a filtering method to address blocking effects and refine the transmission map. Spectral images generally do not contain sky regions, as they are either captured at high altitude to view ground objects or at ground level to inspect and analyze specific materials; therefore, DCP based methods can be used for spectral image dehazing. The dark channel yields low values over most areas, which helps reduce blocking effects and simplifies the refinement of the transmission map [43]. Recently, Zheng et al. [44] proposed using a failure-point threshold with the DCP algorithm; this threshold reduces the influence of bright objects in the image and avoids the inherent limitation of the dark channel prior. Moreover, for dehazing of spectral aerial images, median filtering has been used to refine the transmission map [45]. The DCP method can be effective for spectral image dehazing, but it may not satisfy all real-world scenarios, which limits its practicality and applications.
For learning based spectral image dehazing, the limited availability and high cost of acquisition devices restrict research to certain test scenarios. Due to the unavailability of large-scale datasets, spectral guided approaches have not been used by researchers for image dehazing; most approaches instead reconstruct hyperspectral images from visible-band color images. Sparse coding, manifolds and dictionaries [46], [47] have been used for this reconstruction; generative adversarial networks have been used to estimate hyperspectral images from color images [48] and also to dehaze HSI images [49]. Recently, an unsupervised domain adaptation approach has been used for reconstruction of hyperspectral images from color images [50]. However, the unavailability of real-world hyperspectral image datasets with ground truth references limits the learning of spectral responses, as hyperspectral images are mostly of low resolution. DehazeNet [27] is a popular CNN that produces a transmission map and uses the atmospheric scattering model for dehazing; it is very effective for visible-band hazy images. For spectral images, an end-to-end dehazing network with both local and global residual learning strategies was recently proposed [51]; it has the advantage of faster convergence, but the network used modeled haze to generate the spectral training images. Although most such networks can restore high-quality images, their performance depends mostly on the training dataset, i.e., modeled synthetic haze will not produce results as good as actual real-world hazy images with ground truth references.
With advancements in spectral-band filters and imaging sensors, spectral hazy image databases are now available. Real-world hazy spectral image datasets with ground truth references are more effective than modeled haze or spectral images generated from visible datasets. Recently, the Spectral Hazy Image database for Assessment (SHIA) [18] was proposed, containing indoor images captured under controlled haze or fog conditions. This dataset has been used for performance analysis and comparison of well-known image dehazing methods. In one analysis, input images at fixed wavelengths (550, 650, 750 and 850 nm) were selected to evaluate popular image dehazing methods [18]: algorithms such as DCP [13], MSCNN [28] and CLAHE [40] were compared, and all suffered from noise and structural artifacts. Moreover, average responses of metrics such as PSNR were measured over ten fog density levels. For images at two wavelengths (550 nm, 750 nm), CLAHE gave satisfactory results, whereas at 650 nm and 850 nm, DCP and MSCNN performed well, respectively. Another analysis on this dataset showed DehazeNet [27] to be effective at lower wavelengths, whereas Berman et al.'s algorithm [16] performed better at higher wavelengths [19]. Recently, brute-force optimization was used to find the best wavelength band triplet as input to different image dehazing methods [20]. Additional sRGB images rendered from the SHIA dataset using the full visible wavelength range [52] have been utilized for comparison of different methods. All these analyses proved that existing image dehazing methods are sensitive to spectral band selection. Moreover, these performance analyses [18], [19], [20] motivate further work on hazy spectral images, i.e., additional test cases are required to analyze how image dehazing methods vary with spectral band selection.
Moreover, an improved dehazing method is required that shows consistent results under spectral band variations.
In this study, we first propose a spectral prior based image dehazing network trained on the SHIA dataset with ground truth references. Next, we define three test cases (longer, shorter and best spectral wavelength bands) for performance evaluation and comparison of the proposed method with well-known image dehazing methods. The best spectral wavelength band is computed for haze density level ''7'', for consistency with previously performed experiments [20]. The proposed method gives favorable and consistent results after evaluation against existing dehazing methods, i.e., a PSNR greater than 29 dB for each of the longer, shorter and best wavelength spectral bands, where CLAHE, DehazeNet and DCP give best PSNR values of 27.515 dB, 26.017 dB and 14.901 dB, respectively. To evaluate the naturalness of the output image, the reference-less quality metric NIQE is used: the proposed method gives NIQE below 8.0 for each spectral band, whereas the compared methods give NIQE above 11.0. The proposed method thus proves superior to the other methods in both qualitative and quantitative analysis. The architecture and details of the proposed method are described in Section III, and the experimental results are shown in Section IV.
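The PSNR figures quoted above follow the standard definition; this minimal NumPy sketch assumes images normalized to [0, 1] (the peak value is a parameter, so 8-bit images can pass peak=255).

```python
import numpy as np

def psnr(ref, test, peak=1.0):
    """Peak signal-to-noise ratio in dB between a reference and a test image."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```

For example, a uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01 and hence a PSNR of 20 dB, well below the 29 dB level reported for the proposed method.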

III. PROPOSED METHOD
In this section, we introduce the network architecture design and basic functional modules of our proposed Spectral Prior-based Image Dehazing and Enhancement Network (SPIDE-Net). First, we present an overview of the functional modules of SPIDE-Net and describe the salient features of each. We then discuss the ensemble scheme that combines the outputs of the MPD-Net and SID-Net networks, followed by the loss functions utilized to train the networks. Finally, the implementation details are provided.

A. SPIDE-NET ARCHITECTURE
SPIDE-Net is an encoder-decoder style network with two encoder networks and one decoder network, as shown in the functional block diagram of Fig. 1. The two encoders take different types and numbers of input images and therefore learn different coarse- and fine-scale features, which are combined in the common decoder network. The inputs to the first encoder network, the Spectral Image Dehazing Network (SID-Net), are multiple spectral hazy images with roughly 40 nm wavelength difference between consecutive images. The second encoder network, the Multi-scale Prior based dehazing network (MPD-Net), utilizes two existing image priors related to changes in the color details of the hazy image; its inputs are multi-scale prior-based images and hazy spectral images. The detailed network architecture, consisting of SID-Net, MPD-Net and the multi-scale prior based attention module, is explained in the following sections.

1) SID-NET: SPECTRAL IMAGE DEHAZING NETWORK
The first encoder network is the Spectral Image Dehazing Network, which takes multiple spectral hazy images from visible spectral bands between 450 nm and 720 nm. The network exploits the variable attenuation in different spectral band images to learn different haze-related features. The detailed network architecture is shown in Fig. 2. For the first input layer, the images are initially re-scaled to 256×256. After every convolutional layer, the feature maps are down-scaled by half while the layer depth is doubled. Each convolutional layer is followed by batch normalization and a ReLU non-linear activation function. The selection and number of spectral images are discussed in the implementation details.
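The halve-spatial/double-depth progression described above can be tabulated with a small helper; the starting channel depth and number of stages below are illustrative assumptions, as the text does not specify them at this point.

```python
def encoder_shapes(in_size=256, first_depth=64, n_stages=4):
    """Per-stage (spatial_size, channels) for an encoder in which every
    convolutional stage halves the spatial size and doubles the depth.
    first_depth and n_stages are illustrative assumptions, not values
    taken from the paper."""
    shapes = []
    size, depth = in_size, first_depth
    for _ in range(n_stages):
        size //= 2          # spatial resolution halves at each stage
        shapes.append((size, depth))
        depth *= 2          # channel depth doubles for the next stage
    return shapes
```

With the defaults, `encoder_shapes()` yields [(128, 64), (64, 128), (32, 256), (16, 512)], i.e. a 256×256 input reaches a 16×16 bottleneck after four stages.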

2) MPD-NET: MULTI-SCALE PRIOR BASED DEHAZING NETWORK
Inspired by recent work using multiple image priors as guidance for neural networks, our multi-scale prior based dehazing network uses multiple haze-relevant priors as additional inputs for learning and finally predicting the dehazed output image. This encoder network, the Multi-scale Prior based image Dehazing Network (MPD-Net), uses two haze-related attenuation priors: the color attenuation and dark channel priors, which are based on the color distribution in images. These priors serve as per-pixel guidance about hazy regions and haze density. As in SID-Net, the input images are initially re-scaled to 256 × 256 and then further reduced by half, with the layer depth doubled, after every convolutional stage; every convolutional layer is followed by batch normalization and a ReLU non-linear activation function. The selection of multi-scale prior-based images and other spectral images from the datasets is discussed in Section IV under experimental results. The SPIDE-Net decoder stage combines the information from both encoder networks and enables their joint learning; further details about the ensemble decoder are given in the following section.

B. ENSEMBLE DECODER
As in encoder-decoder style networks [53], [54], [55], [56], the purpose of the encoder is to reduce the spatial dimensions and increase the number of channels in every layer, whereas the decoder reduces the number of channels and increases the spatial dimensions. The SPIDE-Net decoder is an ensemble decoder that combines the information from both encoder networks and enables joint learning of both networks. Moreover, additional convolutional layers are attached to the hidden layers of the decoder so that low-resolution ground truth images can be used as references, producing coarse-level outputs to match, as shown in Fig.2. The benefits of this approach are faster convergence and network stabilization during training.
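The decoder's mirrored progression, with an auxiliary coarse output per hidden stage for matching against down-scaled ground truth, can be sketched as below (the bottleneck shape and stage count are assumed for illustration):

```python
def decoder_stage_shapes(h=16, w=16, depth=512, n_stages=4):
    """Ensemble-decoder shape sketch: at every stage the spatial
    dimensions double and the depth halves, and each hidden stage
    also emits a coarse 3-channel output that is supervised with a
    down-scaled ground truth image. Starting shape is an assumed
    bottleneck, not specified in the paper."""
    main, coarse = [], []
    for _ in range(n_stages):
        h, w, depth = h * 2, w * 2, depth // 2
        main.append((h, w, depth))
        coarse.append((h, w, 3))  # coarse RGB output at this scale
    return main, coarse
```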

C. MULTI-SCALE PRIOR-BASED ATTENTION MODULE
The multi-scale prior based attention module uses two well-known image priors for dehazing: the dark channel and color attenuation priors, both of which are sensitive to changes in the color details of a hazy image. The use of each image prior in SPIDE-Net is described in the following sections.

1) MULTI-SCALE DARK CHANNEL FEATURE
Based on the observation that most local patches in haze-free images contain some pixels having low intensity in at least one color channel, the dark channel prior is expressed as

J_dark(x) = min_{y ∈ Ω(x)} ( min_{c ∈ {r,g,b}} J_c(y) )

where Ω(x) is a local patch centered at location x of the input image J and c indexes the RGB color channels. The selection of the patch size has an enormous effect on the dark channel features: a large patch size can cause over-dehazing and halo artifacts, whereas a very small patch weakens the dark channel assumption. We employ multi-scale dark channel features to avoid over-dehazing while still maintaining a meaningful dark channel. Fig.3 depicts the multi-scale dark channel feature of an input hazy image. For our multi-scale prior attention module, only dark channels with patch sizes of 3 × 3 and 11 × 11 are used.
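A minimal pure-Python sketch of this multi-scale dark channel computation (nested-list images are used purely for illustration; a real implementation would operate on arrays):

```python
def dark_channel(img, patch):
    """Dark channel of an H x W x 3 image (nested lists):
    J_dark(x) = min over patch Omega(x) of min over RGB channels."""
    H, W = len(img), len(img[0])
    r = patch // 2
    # per-pixel minimum over the three color channels
    cmin = [[min(img[i][j]) for j in range(W)] for i in range(H)]
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            # clip the patch window at the image borders
            out[i][j] = min(cmin[a][b]
                            for a in range(max(0, i - r), min(H, i + r + 1))
                            for b in range(max(0, j - r), min(W, j + r + 1)))
    return out

def multi_scale_dark_channel(img, scales=(3, 11)):
    """Dark channels at the two patch sizes used by the attention module."""
    return [dark_channel(img, s) for s in scales]
```

A larger patch lets a single dark pixel darken a wider neighbourhood, which is exactly the over-dehazing tendency the multi-scale combination counteracts.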

2) MULTI-SCALE COLOR ATTENUATION FEATURE
The color attenuation prior correlates haze concentration with the difference between brightness and saturation: hazy regions in an image are characterized by high brightness and low saturation. This simple yet powerful prior can be expressed as [22]

h(x) = θ_0 + θ_1 v(x) + θ_2 s(x)

where x is a position within the input hazy image, h denotes the haze concentration, v is the brightness component and s is the saturation component. The linear coefficients yielding the best results are θ_0 = 0.121779, θ_1 = 0.959710 and θ_2 = −0.780245, obtained after training on 120 million scene points [15]. Fig.3 shows multi-scale features of the color attenuation prior at the 3×3, 7×7 and 11×11 scales. As with the dark channel prior, we use the 3 × 3 and 11 × 11 scales for the color attenuation prior. Fig.4 shows the multi-scale prior-based haze attention module including both the color attenuation and dark channel priors. The multi-scale outputs of both priors are combined with the input hazy images as input for MPD-Net.
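The per-pixel prior can be sketched directly from the linear model above (HSV brightness and saturation are derived from RGB in the usual way; nested-list images are used for illustration):

```python
def color_attenuation(img, t0=0.121779, t1=0.959710, t2=-0.780245):
    """Per-pixel haze concentration h(x) = t0 + t1*v(x) + t2*s(x)
    for an H x W x 3 image with values in [0, 1], using the trained
    coefficients from the color attenuation prior [15]."""
    H, W = len(img), len(img[0])
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            mx, mn = max(img[i][j]), min(img[i][j])
            v = mx                                    # HSV value (brightness)
            s = 0.0 if mx == 0 else (mx - mn) / mx    # HSV saturation
            out[i][j] = t0 + t1 * v + t2 * s
    return out
```

Bright, unsaturated pixels (high v, low s) get a high h, matching the intuition that such regions are the haziest.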

D. PIXEL AND PERCEPTUAL LOSS FUNCTION
Our loss function consists of two terms. The first is the L1 distance function, and the second is a perceptual loss function, namely MS-SSIM. Both contribute to the total loss of the entire network. The details of the total loss function, including the pixel-wise and perceptual losses, are given below:

1) PIXEL-WISE LOSS
The L1 loss calculates the per-pixel error between the reference and output images as the average Manhattan distance. The L1 loss is commonly used in training tasks as it benefits overall network convergence and constrains image integrity in a non-local manner. It can be represented as

l_L1 = (1 / (H·W·C)) Σ_{x,y,c} | F̂(x, y, c) − F(x, y, c) |

where H, W and C represent the height, width and number of channels of the image, x, y and c index the pixels, and F̂(x, y, c) and F(x, y, c) represent the corresponding pixel values in the output and reference images, respectively.
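This average Manhattan distance is straightforward to sketch:

```python
def l1_loss(pred, ref):
    """Mean absolute (Manhattan) distance between an output image
    pred and a reference image ref, both H x W x C nested lists,
    averaged over all H*W*C values."""
    H, W, C = len(pred), len(pred[0]), len(pred[0][0])
    total = sum(abs(pred[i][j][c] - ref[i][j][c])
                for i in range(H) for j in range(W) for c in range(C))
    return total / (H * W * C)
```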

2) PERCEPTUAL LOSS
The SSIM preserves luminance, contrast and structural similarity information during network training and produces visually pleasing images. However, the selection of its scale parameter σ greatly affects performance. For small values of σ, local structures are not preserved and artifacts can appear in flat regions, whereas for large values of σ, the network preserves noise in the proximity of edges, degrading the quality of the processed images. Let x and y be two image patches extracted from the output and reference images at the same spatial location. The SSIM [57] is defined as

SSIM(x, y) = ((2 μ_x μ_y + C_1)(2 σ_xy + C_2)) / ((μ_x^2 + μ_y^2 + C_1)(σ_x^2 + σ_y^2 + C_2))

where μ_x is the average of x, μ_y is the average of y, σ_x^2 is the variance of x, σ_y^2 is the variance of y, σ_xy is the covariance of x and y, and C_1 and C_2 stabilize the division when the denominator is weak. The SSIM can also be represented as the product of a luminance term l and a contrast-structure similarity term ''cs'':

SSIM(x, y) = l^α(x, y) · cs^β(x, y)

The MS-SSIM evaluation is obtained by combining SSIM measurements at different scales j = 1, 2, ..., M:

MS(x, y) = l_M^α(x, y) · Π_{j=1..M} cs_j^{β_j}(x, y)

where MS(x, y) is the MS-SSIM at location (x, y), the luminance comparison l_M^α(x, y) is computed only at scale M with parameter α, and the contrast-structure comparison cs_j(x, y) is computed at each scale j with parameter β_j. We assign equal weight to the luminance, contrast and structure similarity terms by setting α = 1 and β_j = 1 for all scales j = 1, 2, ..., M. The perceptual loss for a selected patch pair (X, Y) is computed from the MS-SSIM at the center pixel (x, y) during network training, to encourage similar activations in the reference and output images:

l_MS(X, Y) = 1 − MS(x, y)
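A single-scale SSIM between two flat patches can be sketched as below; the stabilizing constants C_1 = 0.01^2 and C_2 = 0.03^2 are the standard defaults for a unit dynamic range (an assumption, since the paper does not state them). MS-SSIM then multiplies the cs terms of this measure across M scales:

```python
def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-scale SSIM between two equal-length flat patches x, y
    with values in [0, 1]. c1, c2 are the standard stabilizers for a
    unit dynamic range (assumed defaults)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n          # variance of x
    vy = sum((b - my) ** 2 for b in y) / n          # variance of y
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return (((2 * mx * my + c1) * (2 * cov + c2)) /
            ((mx * mx + my * my + c1) * (vx + vy + c2)))
```

Identical patches score 1.0; structurally dissimilar patches score lower.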

3) TOTAL LOSS FUNCTION
The MS-SSIM can cause color shifting or brightness changes and can make the image dimmer, but it preserves contrast in high-frequency regions, whereas the L1 loss retains color and luminance information. To acquire the best results from both, we define the total loss l_total as a linear combination of the l_L1 and l_MS loss functions:

l_total = λ_1 · (G_σ^M ⊙ l_L1) + λ_2 · l_MS

where λ_1 and λ_2 are the weight parameters for each loss. A point-wise multiplication between the Gaussian kernel G_σ^M and l_L1 is used, since MS-SSIM computes the pixel-level error at the central pixel using contributions of neighbouring pixels weighted by Gaussian weights [58]. At the start of network training, λ_1 is set to a low value so that global details are given more weight. As training progresses, we found it effective to increase λ_1 to improve local details. Both weight parameters are updated gradually, i.e., after several training cycles. The λ_1 weight parameter is updated as

λ_1 ← λ_1 · (1 + P_d)

where P_d is the increment percentage during training. We empirically initialize λ_1 to 0.16 and set P_d to 1e-2. The λ_2 weight is decreased gradually using parameter λ_1 as

λ_2 = 1 − λ_1

The updating of the weight parameters during network training is discussed further in the implementation details section.
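One scheduled update of the two loss weights can be sketched as follows (the coupling λ_2 = 1 − λ_1 is an assumption consistent with the stated initialization λ_1 = 0.16):

```python
def update_loss_weights(lam1, p_d=1e-2):
    """One scheduled update: lambda_1 grows by the increment
    percentage P_d, and lambda_2 is recomputed from lambda_1
    (the coupling lambda_2 = 1 - lambda_1 is an assumption)."""
    lam1 = lam1 * (1.0 + p_d)
    return lam1, 1.0 - lam1
```

Starting from λ_1 = 0.16, one update yields λ_1 ≈ 0.1616 and λ_2 ≈ 0.8384, gradually shifting weight from the perceptual term to the pixel-wise term.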

E. IMPLEMENTATION DETAILS
In the proposed SPIDE-Net, the size of the input layers is 256 × 256; therefore, all spectral images are first resized to 256 × 256. For training of the proposed SPIDE-Net, multiple wavelength band images from the SHIA visible dataset are used as input, with the corresponding sRGB rendered images [52] as output. For faster network convergence and improved results, we use scaled-down versions of the sRGB rendered images as low-resolution references for the network. The inputs for SID-Net are visible band spectral images, whereas MPD-Net requires both prior-based and visible band spectral images. First, for SID-Net we select six images from the SHIA dataset between the 450 nm and 720 nm wavelengths, with the constraint of a 40 nm wavelength difference between consecutive spectral images. This network requires six spectral images at different wavelengths for the training stage only; after training, image triplets are used for performance evaluation, and the remaining three images required by SID-Net are obtained through an interpolation scheme. Next, for MPD-Net we use a total of seven images, where three are selected from the input images of SID-Net and the additional four are generated by the multi-scale prior-based attention module. Using the constraint of a 40 nm wavelength difference between consecutive spectral images, we generated a total of 1200 pairs of spectral and sRGB rendered frames for network training by applying data augmentation techniques: scaling down to 256 × 256 followed by rotation by 90°, 180° and 270°, and horizontal and vertical flipping. We also scaled the input images down to 512 × 512 and randomly cropped 256 × 256 images from them. We utilized the Keras framework for network training and used online Google Colaboratory resources with nearly 12 GB RAM and 8 GB GPU memory allocated. During training, a fixed batch size of 4 images is used with the ADAM optimizer.
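The geometric augmentations named above can be sketched on a nested-list image (for illustration; in practice these would be array operations applied identically to input and reference frames):

```python
def augment(img):
    """Rotations by 90/180/270 degrees plus horizontal and vertical
    flips of a 2-D nested-list image, yielding five augmented
    variants per frame (plus the original, six in total)."""
    def rot90(m):  # one clockwise quarter turn
        return [list(row) for row in zip(*m[::-1])]
    r90 = rot90(img)
    r180 = rot90(r90)
    r270 = rot90(r180)
    hflip = [row[::-1] for row in img]  # mirror left-right
    vflip = img[::-1]                   # mirror top-bottom
    return [r90, r180, r270, hflip, vflip]
```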
Training is run for 300 epochs to ensure convergence, and after every twenty epochs the weight parameters for L1 and MS-SSIM, i.e., λ_1 and λ_2, are updated. The increment and decrement percentages of λ_1 and λ_2 are 1% only. The learning rate is initialized to 1e-4 at the start of training. During training, the learning rate toggles between 1e-4 and 1e-6 using cosine annealing, where one complete learning rate cycle takes 1600 steps. For integrated dehazing and enhancement, an additional 100 epochs of training are performed with contrast-enhanced sRGB rendered ground-truth images as references. The experimental results are discussed in Section IV.
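A sketch of such a schedule, sweeping from 1e-4 down to 1e-6 over one 1600-step cosine cycle and restarting (the exact shape of the authors' schedule is an assumption):

```python
import math

def cosine_annealed_lr(step, lr_max=1e-4, lr_min=1e-6, cycle_steps=1600):
    """Cosine-annealed learning rate: decays from lr_max to lr_min
    over cycle_steps optimizer steps, then restarts at lr_max."""
    t = (step % cycle_steps) / cycle_steps      # position in [0, 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))
```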

IV. EXPERIMENTAL RESULTS AND DISCUSSION
In this section, we present the evaluation scheme of the proposed SPIDE-Net on the spectral hazy image dataset (SHIA) [18]. This dataset has recently been used for evaluation and performance analysis of different dehazing algorithms [19], [20]. However, the existing evaluation schemes either use fixed wavelength bands with variable fog density levels or find the best wavelength bands at a fixed fog density level. To compare the proposed SPIDE-Net with existing popular methods, we extend the comparison approach by varying both the wavelength bands and the fog density levels.
Our comparison approach is based on using shorter and longer wavelength span image triplets from the SHIA dataset. We also utilize the existing best-wavelength-band approach at a fixed fog density level to compare the results with previous experiments. The longer wavelength span images from the SHIA dataset contain more useful information than the shorter wavelength span images; the shorter wavelength span images are therefore more challenging, especially for dense haze. We compare our proposed method with existing well-known methods both qualitatively and quantitatively. For comparison, the dehazing methods are selected to cover the different classes of image dehazing: two popular prior-based methods, DCP [13] and CAP [15]; a learning-based method, DehazeNet [27]; an image-enhancement-based method, CLAHE [40]; and an image-fusion-based method, Pyramid Fusion [59]. Furthermore, additional studies are conducted to demonstrate the effectiveness of our method for contrast enhancement of spectral images, where the existing methods result in low-contrast images.

A. DATASET
Image databases covering different scene contents and capture conditions are essential for evaluating dehazing algorithms. Thanks to technological advancements in imaging sensors and spectral filtering, multi-spectral and hyper-spectral image acquisition is now available for a variety of applications. Despite this potential, spectral imaging has rarely been used in dehazing methods. Presently, a dehazing database called SHIA (Spectral Hazy Image database for Assessment) [18] is available, containing spectral hazy and haze-free image pairs captured under controlled conditions. This dataset is inspired by a previous color image database (CHIC) [60]. The SHIA dataset consists of two real indoor scenes, M1 and M2, each captured with corresponding fog-free (ground truth) image pairs. The spectral images are captured at ten different fog levels generated by a fog machine. Since information about particle size and concentration is not provided with the dataset, we use the terms haze and fog interchangeably. Fog level 1 corresponds to the maximum fog level, whereas level 10 corresponds to the minimum. Images are captured in both visible (450 nm to 720 nm) and near-infrared (720 nm to 1000 nm) wavelengths, each with a spectral resolution of 10 nm. The integration time for visible and near-infrared images is 530 ms and the image size is 1312 × 1082 pixels. The performance evaluation scheme for the existing methods and the proposed SPIDE-Net using the SHIA dataset is explained in the next section.

B. PERFORMANCE EVALUATION SCHEMES
In this study, we limit our work to visible band hazy images and leave near-infrared hazy images for future work. First, we assign labels to the spectral hazy images based on fog level. As there is almost no usable information in the spectral bands of fog levels 1 to 5, these levels are treated as very dense fog and are not used in our training and evaluation. We select five fog levels (levels 6 to 10) and label them ''low'' for level 10, ''medium'' for level 9, ''high'' for level 8, ''very high'' for level 7 and ''dense'' for level 6. The labeled haze levels, along with the ground truth image, are shown in Fig.5. After labeling the spectral hazy images by haze intensity level, we select two spectral image triplets from each haze level for comparing the proposed method with existing well-known methods. Recent analyses of dehazing algorithms with the best image triplet selected through brute-force optimization [19], [20] have made it clear that existing dehazing methods perform well only at specific spectral wavelengths, i.e., some algorithms perform well at shorter wavelength ranges and others at longer wavelength ranges of the visible band. Therefore, to evaluate the performance variation of dehazing methods across spectral wavelength bands, we define three test cases based on wavelength ranges, each consisting of image triplets from every labeled haze level. The image triplets are used as inputs for our proposed method and the other well-known dehazing methods selected for comparison. The wavelength ranges are termed the shorter wavelength span, i.e., 530 nm, 490 nm and 450 nm (where the difference between selected bands is only 40 nm); the longer wavelength span, i.e., 710 nm, 530 nm and 450 nm (where the differences between consecutive bands are 180 nm and 80 nm respectively); and the full wavelength span for pseudo-color sRGB (using all spectral wavelength images between 450 nm and 720 nm).
The shorter and longer wavelength spans are selected for performance evaluation because the spectral wavelength has a great impact on scene information, i.e., higher wavelengths carry more information than lower wavelengths, as shown in Fig.6.

C. IMPLEMENTATION RESULTS
The proposed SPIDE-Net consists of two encoder networks, i.e., SID-Net using spectral hazy images and MPD-Net using spectral images together with multi-scale haze-relevant priors. Both encoder networks share a common decoder stage; therefore, we present the final experimental results of SPIDE-Net, which include the contributions of both the SID-Net and MPD-Net networks. After experiments on the SHIA dataset, we compare the performance of our proposed method with other well-known methods. For analysis on this dataset, we use the three test cases defined in the performance evaluation schemes; the qualitative results for the longer (case-I) and shorter (case-II) wavelength spans are shown in Fig.7 and Fig.8 respectively. The top row for each haze level from ''low'' to ''dense'' shows the actual dehazed image, and the bottom row shows contrast-stretched images for better visualization and comparison, since the inherent contrast of the image triplets is low. It is evident from Fig.7 that the proposed SPIDE-Net provides better image quality at all haze levels, whereas the other methods deteriorate at denser haze levels. Some methods show a vignetting effect at the corners, especially Pyramid Fusion [59], whereas the proposed SPIDE-Net does not exhibit this behavior. CLAHE [40], being a localized contrast enhancement method, performs better at lower haze density levels, but its performance drops as the haze density increases, especially for case-II. Similarly, Fig.8 shows the qualitative performance of SPIDE-Net on gray-scale images for the shorter wavelength span (case-II) at different haze density levels. The same trend can be observed here, where the input images are more challenging because of less scene information and stronger attenuation. The performance of CLAHE [40] degrades at higher haze levels, as shown in Fig.8.
The next approach uses color hazy and ground truth images generated through sRGB rendering with all available visible band wavelengths from 450 nm to 720 nm, labelled case-III. The proposed SPIDE-Net demonstrates good results compared with existing well-known methods, as shown in Fig.9, where the top row for each haze level from ''low'' to ''dense'' shows the actual dehazed image and the bottom row shows contrast-stretched images for better visualization.
The availability of spectral fog-free reference (ground truth) images in the SHIA dataset [18] allows it to be used for performance evaluation with both full-reference and reference-less image quality metrics. We use both kinds in our quantitative analysis, selecting PSNR and NIQE to compare our algorithm with existing well-known methods. PSNR is a full-reference image quality assessment metric that requires reference image triplets from the SHIA dataset for comparison and analysis; higher values indicate better quality. NIQE is a blind, reference-less Natural Image Quality Evaluator metric for which lower values indicate better visual perception. Table 1 summarizes the quantitative results of our proposed method and the existing well-known methods. For the best spectral triplet, the experimental setup of [20] is used to compare the results. The PSNR and NIQE computations use input images from fog density level 7, i.e., ''very high'' haze in our setup. Our SPIDE-Net architecture demonstrates the best performance, with a consistent PSNR above 29 dB for the longer, shorter and best wavelength span images. The CLAHE algorithm shows good results with 26.1106 dB and 27.515 dB for the longer and best wavelength spans, whereas DehazeNet shows good results with 26.0170 dB for the shorter wavelength span. The NIQE metric for the proposed SPIDE-Net is below 8.0, representing better perceptual image quality compared to NIQE values above 11 for the remaining methods. DehazeNet shows the next-best result after SPIDE-Net with 11.5346, while the CLAHE result degrades to 16.6446 due to over-stretching and increased noise. From these results and the visual comparison, it is evident that the proposed method outperforms the reference well-known image dehazing methods.
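The full-reference metric used above can be sketched as follows (flat pixel lists are used for illustration; NIQE, being a learned natural-scene-statistics model, is not reproduced here):

```python
import math

def psnr(pred, ref, peak=255.0):
    """Full-reference PSNR in dB between two flat pixel lists;
    higher is better. peak is the maximum possible pixel value."""
    mse = sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak * peak / mse)
```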

V. DISCUSSION AND FUTURE WORK
Spectral images contain more information than RGB images due to the availability of multiple wavelength band images. For image dehazing, however, the large volume of spectral hazy images makes it impractical to use all visible wavelength band images, so a selection criterion is required to pick the specific images to be used. One option is to use a fixed wavelength band triplet as input for popular image dehazing methods and utilize the best method for dehazing [19]; a second approach is to find the best wavelength band triplet from the full spectral band and use it for dehazing [20]. Another approach is to apply sRGB rendering [20] with full visible band wavelength images and use the result for dehazing. In practice, however, it is difficult to rely on a fixed image triplet or to find the best one for existing methods, since in most real-world scenarios the atmospheric attenuation as a function of wavelength is not constant: the best wavelength image triplet for one fog density level will not be effective for another. The proposed SPIDE-Net reduces these wavelength dependencies, as it has been trained on wavelength band images with variable fog density levels. Moreover, it uses sRGB rendered reference images spanning the full visible band for supervised learning of the network. Qualitative comparisons have been made with popular existing methods for fixed image triplets (both longer and shorter spans) and sRGB rendered images for the full wavelength span, and quantitative comparisons have been made using the longer, shorter and best wavelength triplets. The proposed method shows good performance in all comparisons and is more robust to wavelength band variations, whereas existing methods are sensitive to the selected bands. For very dense fog, the hazy images in the SHIA database are extremely challenging, containing no substantial information.
For these images, neither SPIDE-Net nor the other methods give satisfactory results. As future research, we plan to include near-infrared (NIR) spectral images from the SHIA dataset in network training, since NIR images are less affected by very dense haze due to their longer wavelengths. This option will also extend the wavelength combinations available for best-band selection. We further suggest acquiring color RGB images as ground truth references from the same experimental setup in which the hyperspectral hazy and haze-free images are acquired. Real RGB reference images would be more suitable for training than sRGB rendered images, and would also be more helpful for algorithm comparison and performance evaluation tasks.

VI. CONCLUSION
In this paper, we have developed an effective Spectral Prior-Based Image Dehazing and Enhancement Network (SPIDE-Net) that can restore and enhance visible band spectral hazy images. The proposed network uses multiple spectral hazy images having different attenuations in different wavelength bands, depending on the size of the atmospheric particles. SPIDE-Net also utilizes image priors, namely the dark channel and color attenuation priors, as additional information and guidance to achieve better results. We use a common decoder for the two encoder networks to achieve ensembling and joint learning. The proposed SPIDE-Net is trained on hazy spectral images from the SHIA dataset and evaluated on different low and dense haze density levels. When compared with popular image dehazing methods, the proposed method achieves superior performance both qualitatively and quantitatively. As future work, SPIDE-Net can be extended to utilize both visible band and near-infrared wavelength images for haze removal in existing hazy image datasets.
GULISTAN RAJA (Senior Member, IEEE) received the B.Sc. degree in electrical engineering from the University of Engineering and Technology, Taxila (UET Taxila), in 1996, and the master's degree in information systems engineering from Osaka University, Osaka, Japan, in March 2002. He worked as a Research Associate at the Department of Electrical Engineering, UET Taxila, from June 1997 to March 1999. In April 1999, upon availing scholarship by the Government of Japan, he completed the master's degree. During that, he also worked with Synthesis Corporation, Osaka. He is currently working as a Professor of electrical engineering at the UET Taxila. He has authored/coauthored more than 100 research publications in international journals, and refereed conferences. His research interests include signal and image processing, and VLSI architecture of image processing systems. He has been a member of technical program committee in many IEEE international conferences and a reviewer in international journals.