NUAM-Net: A Novel Underwater Image Enhancement Attention Mechanism Network

Abstract: Vision-based underwater exploration is crucial for marine research. However, the degradation of underwater images due to light attenuation and scattering poses a significant challenge. It results in poor visual quality and impedes the development of vision-based underwater exploration systems. Recent popular learning-based Underwater Image Enhancement (UIE) methods address this challenge by training enhancement networks with annotated image pairs, where the label image is manually selected from the reference images of existing UIE methods, since the groundtruth of underwater images does not exist. Nevertheless, these methods encounter uncertainty issues stemming from ambiguous multiple-candidate references. Moreover, they often suffer from local perception and color perception limitations, which hinder the effective mitigation of wide-range underwater degradation. This paper proposes NUAM-Net (Novel Underwater Image Enhancement Attention Mechanism Network) to address these limitations. NUAM-Net leverages a probabilistic training framework, measuring enhancement uncertainty to learn the UIE mapping from a set of ambiguous reference images. By extracting features from both the RGB and LAB color spaces, our method fully exploits the fine-grained color degradation clues of underwater images. Additionally, we enhance underwater feature extraction with a novel Adaptive Underwater Image Enhancement Module (AUEM) that combines local and long-range receptive fields. Experimental results on the well-known UIEBD benchmark demonstrate that our method significantly outperforms popular UIE methods in terms of PSNR while maintaining a favorable Mean Opinion Score. The ablation study also validates the effectiveness of our proposed method.


Introduction
Underwater visual quality degrades due to wavelength-dependent light scattering and absorption under the water, resulting in low-visibility, low-contrast, and color-cast issues in underwater images [1]. This limits the accuracy of vision-based underwater systems and tasks, e.g., underwater tracking [2,3], robot navigation [4,5], and ecological monitoring [6,7]. Researching advanced Underwater Image Enhancement (UIE) techniques [8][9][10][11], which improve the visual quality of degraded underwater images and benefit vision-based underwater systems, is of great significance for the development of marine engineering.
Recently, deep learning-based image enhancement methods have made significant advancements by training models with well-collected image pairs to learn the mapping from low-quality images to reference images. However, it is impractical for underwater image enhancement tasks to obtain groundtruth clear images, because the underwater imaging process is irreversible and involves highly complex degradation. To address this challenge, popular methods [10,12,13,14] generate reference images that approximate groundtruth images to train UIE models. For instance, Ref. [10] utilizes 12 state-of-the-art UIE algorithms to generate a set of enhanced images and manually selects the best image as the reference image. Leveraging these pairs of underwater images and their well-enhanced reference images, deep learning-based UIE approaches have achieved impressive performance in improving the visual quality of underwater images [11]. Nevertheless, the reference image cannot perfectly approximate the groundtruth and is susceptible to various influences, including subjective human preferences during the selection process and variations in algorithm parameters. These factors lead to insufficient UIE learning because of the uncertainty of the ambiguous label, i.e., multiple potential solutions exist for the same degraded underwater image. As shown in Figure 1, using a single reference image as the label to train a UIE model is sometimes insufficient: the true clear image is unavailable, and multiple candidate references make the choice of the best one ambiguous. To address the uncertainty issue, we follow PUIE-Net [8] and treat it as a probabilistic sampling approximation problem. Let x and y denote the degraded underwater image observation and the clear enhanced image, respectively. Let z represent the uncertainty arising from different people choosing reference images generated by different algorithms as training labels for deep learning networks. UIE then aims to model the clear image distribution from x with uncertain reference z, i.e., p(y | z, x). For a given x, we can assume that z follows a distribution p(z | x) because the uncertainty of z is generated by the process of x. Once the sampling size S is large enough, z approximately follows a normal distribution and the UIE model can be approximated by sampling z, as in [8]. Motivated by this theoretical method, and unlike most existing methods, PUIE-Net proposed a probabilistic training framework that randomly samples one of the multiple candidate references instead of the selected "best" reference for training the UIE model, avoiding the uncertainty issue.
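The approximation cited from [8] is not reproduced in this text. A plausible form, assuming the standard Monte Carlo marginalization over z (the sample index s and the samples z_s are notation introduced here, not taken from the source), is:

```latex
p(y \mid x) \;=\; \int p(y \mid z, x)\, p(z \mid x)\, dz
\;\approx\; \frac{1}{S} \sum_{s=1}^{S} p(y \mid z_s, x),
\qquad z_s \sim p(z \mid x)
```

Under this reading, a larger S gives a closer approximation of the marginal distribution of the enhanced image.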
Although PUIE-Net achieves encouraging enhancement results, it does not perform well in challenging scenarios. We argue that there are two perception limitations to PUIE-Net: (1) local perception limitation: PUIE-Net adopts the U-shaped SE-ResNet50 architecture with a limited local receptive field as the feature extractor, which makes it hard to model the long-range dependencies and global perception needed to deal with large-scale underwater degradation; (2) color perception limitation: most existing methods, as well as PUIE-Net, exclusively extract features from the RGB color space, which is not always enough to capture fine-grained underwater color degradation clues.
In this paper, we propose a novel NUAM-Net to address these limitations. Firstly, we propose a novel Adaptive Underwater Image Enhancement Module (AUEM) that leverages three parallel mechanisms, namely Large-Kernel Attention (LKA), Simple Gate (SG), and Channel Attention (CA), to model long-range spatial and channel interaction with both local and long-range receptive fields, avoiding the local perception limitation. Secondly, we enrich the color perception by extracting features from both the RGB space and the LAB space, which offers a wider and more accurate color representation, to highlight fine-grained underwater color degradation clues and address the color perception limitation. Built on the probabilistic training framework, our NUAM-Net achieves significant PSNR improvements on the popular UIEBD benchmark compared to state-of-the-art UIE methods.
In summary, our contributions are as follows:
− Based on a probabilistic training framework, we propose a novel NUAM-Net that extracts features from both RGB and LAB color spaces and that models long-range spatial-channel interaction with both local and long-range receptive fields, avoiding the uncertainty issue in UIE learning as well as the local and color perception limitations introduced by PUIE-Net;
− We conduct comprehensive experiments on the well-known UIEBD benchmark, and the highly competitive PSNR and SSIM results against state-of-the-art UIE methods demonstrate the effectiveness of our method. The ablation study also illustrates the gains of the proposed components.

Related Work
In this section, we briefly introduce the previous works regarding model-free UIE methods, prior-based UIE methods, learning-based methods, and the attention mechanism.

Model-Free UIE Methods
Model-free techniques typically refine underwater images by directly adjusting pixel luminance without relying on specific physical models, for example using Contrast-Limited Adaptive Histogram Equalization (CLAHE) [15], White Balancing (WB) [16], and Retinex [17]. Ref. [18] introduced a fusion-based UIE method where the inputs and weights are determined solely from the degraded images. Ref. [19] improved upon this method by incorporating white balancing techniques and an innovative multiscale fusion strategy to achieve better enhancement results. Fu and colleagues [20] proposed a Retinex-based UIE method designed for enhancing individual underwater images. Gao and associates [21] developed an underwater image enhancement technique inspired by the functionality of fish retinas, aiming to address issues such as color bias, unevenness, and content blur in images. While these model-free techniques are efficient and straightforward to implement, their disregard for the complex mechanisms of underwater imaging can sometimes lead to unstable outcomes and fail to achieve the desired image enhancement effects.

Prior-Based UIE Methods
Prior-based methods focus on estimating the parameters of underwater imaging models through prior hypotheses, and then use these physical models to enhance the quality of underwater images. Chiang and colleagues [22] proposed a method that utilizes dehazing technology to enhance underwater images. Galdran et al. [23] adapted the Dark Channel Prior (DCP) [24] by using information from the red channel to infer the depth map of underwater images. Li et al. [25] introduced a dehazing method tailored to the characteristics of underwater environments and proposed a contrast enhancement technique based on the principle of minimum information loss and prior knowledge of the histogram distribution. Berman et al. [26] considered the spectral profiles of different water types and additionally estimated two global parameters: the attenuation ratios between the blue-red and blue-green channels. Akkaynak et al. [27] developed the Seathru method, which is based on an improved physical imaging model and uses RGBD images as input to estimate scattering from the darkest pixel and its known depth map, and then estimates the attenuation coefficient of varying illumination across the scene. While these methods are effective in specific contexts, they may not be sufficiently robust in more complex scenarios, because parameterized physical models struggle to capture the complexity and diversity of underwater environments.

Deep Learning Methods
Learning-based UIE methods stand out from model-free and prior-based approaches by leveraging the powerful, data-driven feature extraction capabilities and nonlinear mapping functions of deep neural networks to enhance underwater images. Li et al. [28] were the first to use generative adversarial networks in an unsupervised manner to create synthetic underwater images, which were then employed to train an enhancement network. Li et al. [29] proposed a method that requires only weak supervision, reducing the need for paired data. Guo et al. [30] employed a multi-scale dense generative adversarial network for underwater image enhancement. Li et al. [31] developed a lightweight UIE model that incorporates underwater scene priors. Li et al. [10] curated a comprehensive real-world UIE dataset, UIEB, with reference images manually selected from several existing UIE methods, and proposed a gated fusion network for image enhancement based on this dataset. Jamadandi et al. [32] suggested enhancing underwater images using networks combined with wavelet transform corrections. Addressing the diverse degradation characteristics of underwater images, Uplavikar et al. [33] trained a deep neural network to extract domain-invariant features from given images, with the domain defined by the Jerlov water type. Li et al. [11] introduced Ucolor, a UIE network based on medium transmission-guided multi-color space embedding. Kar et al. [34] proposed a zero-shot restoration method for underwater and dehazed images, leveraging theoretically derived degradation properties. However, many current learning-based methods [35][36][37][38][39][40][41][42] rely on end-to-end training with annotated image pairs, leading to uncertainties due to the ambiguity of multiple potential reference images. To address this issue, PUIE-Net [8] approached the uncertainty problem as a probabilistic sampling approximation and introduced a probabilistic training framework for UIE. Our research builds upon this probabilistic training framework and introduces NUAM-Net, which models local and long-range dependencies with enhanced color perception to capture detailed underwater color degradation cues, overcoming the limitations of local perception and color understanding in PUIE-Net.

Attention Mechanism
In deep learning, attention mechanisms have become a key technique [43,44] and are acclaimed for their ability to enhance a model's focus on key elements within input data [45][46][47]. Channel Attention (CA) examines the dynamics of cross-channel feature activation [48], highlighting the relational importance of different features, while spatial attention evaluates the significance of the spatial layout of information [49], optimizing the model's perceptual field. Large-Kernel Attention (LKA) [50], by combining Depthwise Convolution, Depthwise Dilated Convolution, and Pointwise Convolution, effectively captures long-distance relationships within features, improving adaptability. In this paper, we propose a novel adaptive underwater enhancement module that takes advantage of the local and long-range receptive fields of CA and LKA to model long-range spatial and channel interaction; it also leverages an extra Simple Gate (SG) to fully explore the complementary information between the CA and LKA. This module shows significant gains in our ablation study.

Method
In this section, we elaborate our method.

Probabilistic Training Framework and Multi-Label Training
To avoid the uncertainty issue, we adopt the probabilistic training framework [8] to perform a multi-label training strategy for UIE learning. In multi-label training, the dataset we use contains four different labels, as shown in Figure 1. During the training phase, each time an image is input, we randomly select one of the four labels as the input label for training. In the selection rule, l is the label that serves as input during the network training process, and label_i is the one of the four labels that we randomly select, where i = 0 denotes the label in the UIEB dataset, i = 1 the label obtained through contrast adjustment, i = 2 the label obtained through saturation adjustment, and i = 3 the label obtained through gamma correction.
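The random label selection described above can be sketched as follows. This is a minimal illustration, assuming the four candidates are drawn with equal probability (the text only says "randomly"); the names `LABEL_NAMES`, `sample_label`, and the file names are hypothetical.

```python
import random

# Index convention from the text: 0 = UIEB reference, 1 = contrast-adjusted,
# 2 = saturation-adjusted, 3 = gamma-corrected.
LABEL_NAMES = ("uieb_reference", "contrast", "saturation", "gamma")


def sample_label(candidates, rng=random):
    """Return (index, label) for one candidate reference chosen at random.

    Assumption: the choice is uniform over the four candidates.
    """
    if len(candidates) != len(LABEL_NAMES):
        raise ValueError("expected one candidate per label type")
    i = rng.randrange(len(candidates))
    return i, candidates[i]


# Hypothetical candidate references for one degraded image.
candidates = ["ref.png", "contrast.png", "saturation.png", "gamma.png"]
rng = random.Random(0)  # seeded so the sketch is reproducible
draws = [sample_label(candidates, rng)[0] for _ in range(1000)]
counts = [draws.count(k) for k in range(len(LABEL_NAMES))]
```

Over many training iterations, each degraded image is therefore supervised by all four references rather than by a single selected one.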

Network Architecture
Figure 2 illustrates the architecture of NUAM-Net. The network consists of two branches, each including a feature extractor based on U-Net. Specifically, the upper branch aims to extract segmentation features from a single original underwater image, while the lower branch aims to construct UIE segmentation features using the input underwater image and its multiple labels. In the upper branch, we concatenate the original image and its conversion to the LAB color space along the channel dimension as input. In NUAM-Net, to strengthen the capacity and extraction capability of the feature extractor, we replace the convolutional extractor with SE-ResNet50 (as shown in Table 1). Because certain prior knowledge is lacking during the feature extraction process, we introduce the LAB color space representation of images to integrate prior information about the image. The LAB color space separates the color information and brightness information of the image more cleanly, which is beneficial for the reconstruction of underwater images.
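The RGB and LAB concatenation at the input can be illustrated per pixel. The sketch below assumes sRGB input in [0, 1], a D65 white point, and the standard CIE L*a*b* formulas; the source does not state which conversion or scaling it uses, and the function names are hypothetical.

```python
def _srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _lab_f(t):
    """Piecewise cube-root function used by the CIE L*a*b* definition."""
    delta = 6.0 / 29.0
    if t > delta ** 3:
        return t ** (1.0 / 3.0)
    return t / (3.0 * delta ** 2) + 4.0 / 29.0


def rgb_to_lab(r, g, b):
    """Convert one sRGB pixel (values in [0, 1]) to CIE L*a*b* (D65)."""
    r, g, b = (_srgb_to_linear(v) for v in (r, g, b))
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    fx, fy, fz = _lab_f(x / 0.95047), _lab_f(y / 1.0), _lab_f(z / 1.08883)
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)


def rgb_lab_pixel(r, g, b):
    """Concatenate RGB and LAB along the channel axis: 3 + 3 = 6 channels."""
    return (r, g, b) + rgb_to_lab(r, g, b)


white = rgb_lab_pixel(1.0, 1.0, 1.0)
red = rgb_lab_pixel(1.0, 0.0, 0.0)
```

Note that L, a, and b live on different numeric ranges than RGB, so a real pipeline would likely rescale them before concatenation; that step is omitted here.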
The extracted feature is $f = F_{extractor}(input \oplus input_{LAB})$, where $f$ is the feature extracted by the feature extractor, $F_{extractor}$ is the operation in the feature extractor, $input$ and $input_{LAB}$ are the RGB image and the LAB image, and $\oplus$ is a concatenation performed along the channel dimension of multiple features, merging them into a single feature. The core part of this network lies in the feature enhancement transfer module after feature extraction. To obtain stronger features, we designed a probability enhancement module called AUEM, which takes the features extracted by the feature extractor as input. The output features are the result of concatenating the enhanced features with the original features along the channel dimension.
In this step, $f_{in}$ is the input feature, $f_{enhance}$ is the feature enhanced by the AUEM module, $F_{AUEM}$ is the enhancement operation, ReLU is the activation function, and Conv is the convolution operation. Next, we need to construct image enhancement style features based on the features extracted from a large sample. In an image, the style can be described by the mean and variance of the extracted features across each channel. This is mainly because they reflect the statistical characteristics of the color distribution and brightness distribution of the image, which, to a large extent, determine the appearance and feel of the image, such as whether it is bright, colorful, or high in contrast. These statistical features provide important clues for image processing and analysis. During the training phase, we calculate the mean and variance for the target image and the original image across each channel dimension.
Here, $\sigma_l(c)$ and $\mu_l(c)$ are the variance and mean of the label features across each channel, $\sigma_{in}(c)$ and $\mu_{in}(c)$ are the variance and mean of the image to be processed across each channel, $H$ is the height of the image, $W$ is the width (in pixels), $C$ denotes the number of channels, and $f_l$ and $f_{in}$ are the feature vectors after extraction and enhancement, respectively. After obtaining these mean and variance features, we randomly sample from them to construct the normal distribution functions for these means and variances. We perform the random sampling operation through convolutions, and then construct the functions based on the sampling results.
Here, $V_l$ and $M_l$ represent the normal distribution functions for the variance and mean of the label features, respectively, and $V_{in}$ and $M_{in}$ represent the normal distribution functions for the variance and mean of the input features, respectively. The distributions of the mean and variance of the target image are used as the style parameters for image style transfer in the PAdaIN module (detailed in Section 3.3), and we characterize this term in the loss function by using the KL divergence to describe the difference between the two distributions (elaborated in Section 3.5 in the description of the KL divergence in the loss function). The purpose is to complete the transformation of the image style during the feature extraction process.
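The per-channel style statistics can be sketched as below. This is an illustration only: it assumes the population variance (division by H·W) and represents a feature map as nested Python lists; the convolution-based sampling step described in the text is not reproduced, and `channel_stats` is a hypothetical name.

```python
def channel_stats(feature):
    """Compute (mean, variance) for every channel of a feature map.

    feature: list of C channels, each a list of H rows holding W floats.
    Assumption: variance is the population variance over all H*W positions.
    """
    stats = []
    for channel in feature:
        values = [v for row in channel for v in row]
        n = len(values)
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        stats.append((mean, var))
    return stats


# A toy feature map with C = 2, H = 2, W = 2.
feature = [
    [[1.0, 2.0], [3.0, 4.0]],  # channel 0
    [[0.0, 0.0], [0.0, 8.0]],  # channel 1
]
stats = channel_stats(feature)
```

Each channel contributes one mean and one variance, so a feature map with C channels yields 2C style statistics.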

PAdaIN
In this paper, we treat the underwater enhancement problem as a domain style-transfer problem, and we therefore adopt Adaptive Instance Normalization with posterior distribution (PAdaIN) [8]: $\mathrm{PAdaIN}(x) = a\,\frac{x - \mu(x)}{\sigma(x)} + b$. Here, $x$ represents the features of the content image, and $\mu$ and $\sigma$ denote the mean and standard deviation operations, respectively. $b$ and $a$ are two random samples drawn from the posterior distributions of the mean and standard deviation, respectively. Specifically, the posterior distribution can be learned through CVAEs [51]. A conditional variational autoencoder (CVAE), which combines raw data and their corresponding categories as inputs to the encoder, can be used to generate data for specified categories.
$N_m$ and $N_s$ represent the Gaussian distributions of the mean and the standard deviation. The variables $b$ and $a$ are drawn randomly from $N_m$ and $N_s$, respectively. In the definition of $N_m$, $\mu(x)$ and $\sigma(x)$ represent the mean and standard deviation of the mean of the input image. In the definition of $N_s$, $m(x)$ and $v(x)$ represent the mean and standard deviation of the standard deviation of the input image.
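A single-channel sketch of PAdaIN-style normalization follows. It assumes the usual AdaIN form (normalize, then scale by a sampled standard deviation and shift by a sampled mean); the Gaussian parameters are passed in directly rather than predicted by a CVAE, and `padain_channel` and `eps` are hypothetical names.

```python
import math
import random


def padain_channel(values, mean_dist, std_dist, rng, eps=1e-5):
    """Normalize one channel, then re-style it with sampled statistics.

    values:    flat list of feature values for one channel
    mean_dist: (mu, sigma) of the Gaussian over the target mean (N_m)
    std_dist:  (mu, sigma) of the Gaussian over the target std (N_s)
    Returns the styled values together with the sampled scale a and shift b.
    """
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n + eps)
    b = rng.gauss(*mean_dist)  # shift: sampled from the mean distribution
    a = rng.gauss(*std_dist)   # scale: sampled from the std distribution
    return [a * (v - mu) / sigma + b for v in values], a, b


rng = random.Random(0)
styled, a, b = padain_channel([1.0, 2.0, 3.0, 4.0], (0.5, 0.1), (2.0, 0.1), rng)
```

Because a and b are re-sampled on every call, repeated forward passes on the same input produce slightly different enhancement styles, which is how the framework expresses label uncertainty.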

Adaptive Underwater Image Enhancement Module
AUEM consists of two parts; the architecture is shown in Figure 3. First, the features of the LAB color space images are concatenated with those of the original images. They then undergo a convolution to adjust the feature dimensions, followed by the AIEM (Adaptive Illumination Enhancement Module) [52], which consists of two components: Hierarchical Information Extraction (HIE) and IMAConv. The motivation for IMAConv is to integrate information from different feature spaces and channels. As shown in Figure 5, features are divided into S branches (the original feature is split into S parts along the channel dimension), each consisting of three concatenated convolutions. Conv-3 is the dynamic convolution block, $x_i$ is the divided feature, and $\bar{x}_i$ is the original feature without $x_i$. $C_n(\cdot)$ is the mapping function that combines each feature. Conv-3 employs the concept of dynamic convolution to assign weights to these three convolutional kernels. Dynamic convolution is the attention-based dynamic aggregation of multiple parallel convolution kernels. Attention dynamically adjusts the weight of each convolution kernel based on the input, resulting in an adaptive dynamic convolution. After passing through AIEM, the enhanced features are concatenated with the features that were input to the AIEM module. In the resulting output expression, input is the feature extracted by the extractor, Convolution is the convolution operation that adjusts the feature dimension (i.e., fuses the old features to obtain features of the new dimension), $F_{AIEM}$ is the AIEM module operation, and $\oplus$ is the concatenation.
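The kernel aggregation at the heart of dynamic convolution can be sketched as follows. This is a simplified illustration, not the IMAConv implementation: the attention logits are supplied by the caller (in practice they would be predicted from the input), softmax normalization is an assumption, and both function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_kernels(kernels, attention_logits):
    """Blend K same-shaped 2D kernels into one using attention weights.

    kernels:          list of K kernels, each a list of rows of floats
    attention_logits: K input-dependent scores, one per kernel
    """
    weights = softmax(attention_logits)
    rows, cols = len(kernels[0]), len(kernels[0][0])
    return [
        [sum(w * k[r][c] for w, k in zip(weights, kernels)) for c in range(cols)]
        for r in range(rows)
    ]


# Three toy 2 x 2 kernels, matching the three kernels mentioned in the text.
kernels = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
    [[3.0, 3.0], [3.0, 3.0]],
]
uniform = aggregate_kernels(kernels, [0.0, 0.0, 0.0])
peaked = aggregate_kernels(kernels, [10.0, -10.0, -10.0])
```

With equal logits the result is the plain average of the kernels; with one dominant logit the aggregated kernel approaches the corresponding single kernel, which is what makes the convolution adaptive to its input.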

Loss Function
In the supervised training stage, we utilize the Mean Squared Error (MSE) as the loss function to quantify the discrepancies between the label and the output images, defined as

$$\mathcal{L}_{MSE} = \frac{1}{HWC} \sum_{x=1}^{H} \sum_{y=1}^{W} \sum_{c=1}^{C} \left( x_{label}(x, y, c) - x_{pro}(x, y, c) \right)^{2} \qquad (18)$$

where $x_{label}$ represents the randomly selected label image and $x_{pro}$ is the network's output.
Additionally, to enhance the human perceptual quality of the processed images, we integrate a perceptual loss function, utilizing a pre-trained VGG16 network as the perceptual evaluator [23], with $N$ being the batch size and $F_{vgg16}$ being the VGG16 network equipped with pre-trained weights.
In addition to minimizing the enhancement loss, Kullback-Leibler (KL) divergences are utilized to assimilate the posterior distributions and the prior distributions.This process involves measuring the discrepancy between the posterior and prior distributions, ensuring that the enhanced image aligns well with both the desired enhancement characteristics and the prior knowledge captured by the distributions.By minimizing the KL divergences, the network learns to generate enhanced images that not only match the desired visual attributes but also adhere to the underlying statistical properties encoded in the prior distributions.
$D_{KL}(\cdot \,\|\, \cdot)$ denotes the Kullback-Leibler (KL) divergence between two probability distributions.
Finally, to align the processed images closely with their labels, we combined three parts as our model loss function.
This formulation aims not only to minimize the direct errors between images but also to improve their realism, their visual appeal to the human eye, and the distribution of the features, while preserving image detail and quality.β represents the weight; in our model, we choose β = 0.1.
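As an illustration of how the three terms might be combined, the sketch below uses the closed-form KL divergence between two univariate Gaussians and applies β to the KL terms only. That placement of β is an assumption (the combined loss formula is not reproduced in this text), the perceptual term is passed in as a precomputed number, and all function names are hypothetical.

```python
import math


def kl_gaussian(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) )."""
    return (
        math.log(sigma_p / sigma_q)
        + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
        - 0.5
    )


def mse(label, output):
    """Mean squared error between two equal-length flat lists."""
    return sum((l - o) ** 2 for l, o in zip(label, output)) / len(label)


def total_loss(mse_term, perceptual_term, kl_terms, beta=0.1):
    """Assumed combination: MSE + perceptual + beta * (sum of KL terms)."""
    return mse_term + perceptual_term + beta * sum(kl_terms)


# Toy values: one KL term for the mean distribution, one for the variance.
kl_mean = kl_gaussian(1.0, 1.0, 0.0, 1.0)
kl_var = kl_gaussian(0.0, 1.0, 0.0, 1.0)
loss = total_loss(mse([0.2, 0.8], [0.3, 0.6]), 0.3, [kl_mean, kl_var], beta=0.1)
```

The KL term is zero only when the posterior and prior Gaussians coincide, so minimizing it pulls the two distributions together while the other two terms control reconstruction quality.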

Training Configuration
We utilized UIEBD (Underwater Image Enhancement Benchmark Dataset), an extended multi-label underwater image enhancement dataset [8]. Some training images are shown in Figure 6. A challenge encountered prior to training the probability network was that existing UIE datasets typically provide a single reference map for each degraded underwater image. To facilitate the application of the probability network, existing UIE datasets were augmented by generating multiple reference images. The new dataset we adopted is based on UIEB [10], a real-world UIE dataset comprising 890 underwater images along with corresponding reference maps.
In the original UIEB (Underwater Image Enhancement Benchmark, a dataset proposed in 2020) [10], the authors employed 12 state-of-the-art enhancement algorithms to generate potential groundtruth. Volunteers were then asked to subjectively select the best image through pairwise comparisons of the original underwater image and the 12 enhanced images, with the chosen image serving as the final reference. Ambiguity was addressed in UIEBD through contrast and saturation adjustments as well as gamma correction, given that distortions in underwater images primarily manifest in aspects such as contrast, saturation, brightness, and color.
It is important to note that our aim was to generate ambiguous labels rather than to significantly alter the original labels. Contrast and saturation adjustments were performed using a simple linear transformation, y = α(x − m) + x, with x and y representing the input and output, respectively, and m denoting the mean value for each channel. α stands for the adjustment coefficient, which remains consistent for all pixels in contrast adjustment and is determined by each pixel itself in saturation adjustment.
To produce a more reliable reference image, we initially created two adjusted versions for each method (i.e., over-adjustment and under-adjustment), then selected the better one as the potential label. Consequently, we obtained four reference images (including the original label) for each original underwater image, reflecting the uncertainty inherent in the groundtruth recording process.
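The linear contrast adjustment y = α(x − m) + x can be sketched on a single channel as below. Intensities are assumed to lie in [0, 1], and the clipping step is an added assumption; the specific α values used to build the dataset are not given in the text, so the ones here are placeholders.

```python
def adjust_contrast(channel, alpha):
    """Apply y = alpha * (x - m) + x to one channel.

    channel: flat list of intensities, assumed to lie in [0, 1]
    alpha:   adjustment coefficient shared by all pixels
             (alpha > 0 stretches values away from the mean, alpha < 0 compresses)
    The result is clipped to [0, 1]; clipping is an assumption, not from the source.
    """
    m = sum(channel) / len(channel)
    return [min(1.0, max(0.0, alpha * (x - m) + x)) for x in channel]


channel = [0.2, 0.4, 0.6]
over_adjusted = adjust_contrast(channel, 0.5)    # placeholder alpha
under_adjusted = adjust_contrast(channel, -0.5)  # placeholder alpha
```

Since y − m = (1 + α)(x − m), the channel mean is preserved (before clipping) while the spread around it is scaled by 1 + α, which matches the stated goal of producing ambiguous rather than drastically different labels.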

Experiments
In this section, we conduct a comprehensive experimental evaluation of the proposed method. First, we describe the implementation details and the validation dataset. Secondly, we introduce the evaluation criteria and compare our method with eight state-of-the-art UIE methods on the UIEBD dataset in terms of both qualitative and quantitative evaluations. Finally, we evaluate the effectiveness of the key components in our proposed method through the ablation study.

Implementation Details
Our method was implemented in PyTorch and the NUAM-Net model was trained on an NVIDIA RTX 4090 GPU (Santa Clara, CA, USA) with the Adam optimizer, where the learning rate was $1 \times 10^{-4}$, the number of training epochs was 500, the batch size was 1, and the image was resized to a resolution of 256 × 256. During training, we performed random rotations and horizontal and vertical flips for data augmentation.

Datasets
We validate our method on the popular UIEBD benchmark, following a previous work [8] in utilizing the first 700 original samples for training and the remaining 190 images for testing.

Performance Criteria
To evaluate the enhancement performance of our method, we employ SSIM (Structural Similarity) [53], PSNR (Peak Signal-to-Noise Ratio) [54], and MOS (Mean Opinion Score) [55] metrics. SSIM and PSNR are full-reference metrics computed against the manually selected well-enhanced reference image (label image) in UIEBD to ensure a fair comparison with existing methods. Additionally, we conduct subjective testing to understand user preferences for the results generated by each UIE method. We use MOS to quantify subjective evaluations. We invited 10 participants (5 males and 5 females) to take part in the subjective testing. Original and enhanced underwater images were simultaneously displayed on the screen. Subjective ratings for each image were assessed on a three-level scale according to the following criteria: 3 (excellent), 2 (fair), 1 (poor). Evaluation criteria include color distortion, contrast enhancement, naturalness preservation, brightness improvement, and artifact suppression.
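For reference, PSNR and MOS can be computed as in the following sketch. It uses the standard PSNR definition with intensities assumed to lie in [0, 1] and treats MOS as the plain average of the three-level ratings; the sample ratings are made up for illustration and are not results from the paper.

```python
import math


def psnr(reference, output, max_value=1.0):
    """Peak Signal-to-Noise Ratio in dB between two flat intensity lists."""
    mse = sum((r - o) ** 2 for r, o in zip(reference, output)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mean_opinion_score(ratings):
    """Average of subjective ratings on the 3 (excellent) / 2 (fair) / 1 (poor) scale."""
    if any(r not in (1, 2, 3) for r in ratings):
        raise ValueError("ratings must be 1, 2, or 3")
    return sum(ratings) / len(ratings)


# Toy data: three pixels and ten hypothetical participant ratings.
score_db = psnr([0.0, 0.5, 1.0], [0.0, 0.5, 0.9])
mos = mean_opinion_score([3, 3, 2, 2, 3, 1, 2, 3, 3, 2])
```

Higher is better for both metrics; PSNR depends on the chosen reference image, which is why the text specifies that the manually selected label is used for comparison.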

Comparison Methods
We compared NUAM-Net with eight UIE methods, including two model-free methods (GC, Retinex), one popular prior-based method (DCP), three state-of-the-art deep learning methods (Deep-SESR, Water-Net, Ucolor), and two advanced probabilistic network-based methods (PUIE-MC, PUIE-MP). We report the results of all compared methods using the original implementations provided by their authors in the same experiments to ensure fairness of comparison.

Results
Table 2 summarizes the quantitative comparison results on the UIEBD dataset. It can be seen that our NUAM-Net achieves highly competitive performance and outperforms other methods in PSNR by a significant margin. Specifically, prior-based methods obtain relatively poor results because they heavily rely on prior knowledge-driven approximate imaging models, limiting their generalization ability to more complex scenarios. We found that the performance of deep learning methods is significantly better than that of handcrafted methods and that our NUAM-Net achieves the best results, showing the effectiveness of the proposed method. We further present qualitative comparison results in Figure 7. It can be observed that, although most methods can enhance contrast to some extent, serious visual defects still exist due to undesirable color adjustments or artifacts. For example, GC and Retinex exhibit unnatural color saturation and blurred image details. Prior-based methods can improve contrast, but color is severely degraded in these cases. Water-Net and Ucolor often produce low-quality results. Due to the enriched color perception and long-range interaction, our method performs well in all these cases and produces consistently cleaner visual quality and more natural fine textures than the state-of-the-art PUIE-Net.

Ablation
We conducted ablation experiments on our network, evaluating the performance of three variants of the proposed method: backbone, backbone+LAB, and backbone+LAB+AUEM. As shown in Table 3, enriching the color perception by extracting features from the RGB space and the wider color-represented LAB space leads to reasonable improvements in the PSNR metric. With the well-designed AUEM, which models long-range spatial and channel interactions from both local and long-range receptive fields, the backbone+LAB+AUEM variant further improves the PSNR score by 0.57. The qualitative results in Figure 8 also show the gains of our proposed method.
We compare the features extracted by our network and by the backbone network. In Figure 9, the higher the number of green flashing dots in the renderings, the higher the number of features that have been extracted; our network clearly extracts more features. We also compare two modules: (1) AUEM without LKA, SG, and CA and (2) the full AUEM. The results are shown in Table 4. Furthermore, we ran an extra experiment to check whether the structure of our network is advantageous: we replaced the AUEM module with convolution blocks of the same capacity and compared NUAM-Net with the resulting conv-block network. The results are shown in Table 5. We believe that the advantage of our network structure lies in its ability to fuse multiscale spatial and channel features, as well as in the additional color domain, which provides more information.

Discussion
Our experimental results show that the proposed network performs strongly. We attribute this primarily to the incorporation of AUEM and the physical priors embedded in LAB color space images. The ablation experiments indicate that the most influential factor is AUEM. This module significantly expands the network's receptive field and enhances channel-wise and spatial interactions, and it therefore plays a vital role in the performance achieved. This finding has implications for underwater image enhancement with current probabilistic networks and may serve as inspiration for future research in this domain.

Conclusions
In this paper, aiming to address the local perception and color perception limitations of current UIE methods, we proposed NUAM-Net for underwater image enhancement. Specifically, NUAM-Net models long-range spatial and channel interactions with a novel AUEM module, enabling both local and long-range receptive fields for large-scale degradation perception. Moreover, NUAM-Net extracts features from the RGB color space and an additional LAB color space to fully utilize the fine-grained color degradation clues of underwater images. Based on the probabilistic training framework, NUAM-Net achieves highly competitive results on the popular UIEBD benchmark compared to state-of-the-art model-free, prior-based, and learning-based UIE methods. In the future, we plan to extend our method to vision-based underwater systems, such as underwater visual SLAM and visual 3D reconstruction.

Figure 1. Illustration of the uncertainty issue in UIE learning. We show examples from the UIEBD dataset, i.e., the original image, (a) the selected reference, (b) the contrast adjustment result, (c) the saturation adjustment result, and (d) the gamma correction result. Reference selection is ambiguous because multiple solutions are plausible and different people might choose different labels as the reference.

Figure 2. The network architecture of NUAM-Net. It consists of the feature extractor, PAdaIN, AUEM, and the output blocks. The extractor's architecture is similar to U-Net.

Figure 3. Overview of the AUEM. It consists of a conv block and an AIEM block. In the AIEM block, we combine and enhance the probabilistic features. AIEM includes five types of convolution blocks: PConv, DWConv, LKA, SG, and IMAConv. HIE employs three parallel operations for feature extraction: LKA, SG, and CA. Large-Kernel Attention (LKA), shown in Figure 4a, decomposes feature extraction into three types of convolution: Depthwise Convolution (DW-Conv), Depthwise Dilated Convolution (DW-D-Conv), and Point Convolution. DW-Conv is a 5 × 5 convolution, DW-D-Conv is a 5 × 5 convolution with a dilation rate of 3, and Point Convolution is a 1 × 1 convolution. DW-Conv processes local structural information, DW-D-Conv captures long-range dependencies, and Point Convolution handles inter-channel interaction. The Simple Gate (SG), shown in Figure 4b, divides the features along the channel dimension into two parts, decomposing $f \in \mathbb{R}^{C \times H \times W}$ into $f_1 \in \mathbb{R}^{\frac{C}{2} \times H \times W}$ and $f_2 \in \mathbb{R}^{\frac{C}{2} \times H \times W}$. These two features are then multiplied element-wise (the values at corresponding positions in the two features are multiplied). In Channel Attention (CA), shown in Figure 4c, the feature $f \in \mathbb{R}^{C \times H \times W}$ passes through a channel attention module to obtain $f_1 \in \mathbb{R}^{1 \times H \times W}$. It then passes through a 1 × 1 convolution followed by a ReLU activation function and, subsequently, through another 1 × 1 convolution with a Sigmoid activation function to produce the feature $f_2 \in \mathbb{R}^{1 \times H \times W}$. Finally, $f_2$ is multiplied element-wise with the original feature $f$ to obtain the final feature.
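The Simple Gate and the kernel decomposition of Large-Kernel Attention described in this caption can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: features are plain nested lists of shape C × H × W, the function names `simple_gate`, `effective_kernel`, and `stacked_receptive_field` are hypothetical, and the kernel sizes are read from the caption as 5 × 5, 5 × 5 with dilation 3, and 1 × 1, all with stride 1.

```python
def simple_gate(feature):
    """Split a C x H x W feature (nested lists) into two halves along the
    channel axis and multiply them element-wise, giving C/2 x H x W."""
    channels = len(feature)
    if channels % 2 != 0:
        raise ValueError("channel count must be even")
    half = channels // 2
    first, second = feature[:half], feature[half:]
    return [
        [[a * b for a, b in zip(row1, row2)] for row1, row2 in zip(ch1, ch2)]
        for ch1, ch2 in zip(first, second)
    ]


def effective_kernel(kernel_size, dilation=1):
    """Spatial extent covered by one (possibly dilated) convolution kernel."""
    return dilation * (kernel_size - 1) + 1


def stacked_receptive_field(layers):
    """Receptive field of stride-1 convolutions applied in sequence.

    layers is an iterable of (kernel_size, dilation) pairs.
    """
    field = 1
    for kernel_size, dilation in layers:
        field += effective_kernel(kernel_size, dilation) - 1
    return field
```

With the kernel sizes taken as above, `stacked_receptive_field([(5, 1), (5, 3), (1, 1)])` evaluates to 17, so the three small convolutions together cover a 17 × 17 neighborhood. This is the sense in which the decomposition gives a long-range receptive field at the cost of small kernels.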

Figure 5. Structures of IMAConv used in our AUEM module.

Figure 6. Examples of the extended UIEBD dataset, including 4 labels. Label-1 denotes the manually selected label in the original UIEBD dataset, label-2 is the contrast adjustment result, label-3 is the saturation adjustment result, and label-4 is the gamma correction result.

Figure 8. Enhancement examples from our ablation studies. We show the enhanced images of the backbone, backbone+LAB, and backbone+LAB+AUEM on a subset of the UIEBD test data. The images show that the full network yields a visible improvement in enhancement quality.

Figure 9. Features extracted by our network and by the backbone network: (a) features extracted by our network; (b) features extracted by the backbone network.

Table 1. Structure of SE-ResNet50, consisting of three main blocks.

Table 2. Quantitative results on the UIEBD test dataset. We report PSNR, SSIM, and MOS values for evaluation. Higher values indicate better performance. The best results are highlighted in red. * denotes our network.

Table 3. Ablation experiments on the UIEBD test dataset.