Underwater Polarization Imaging Recovery Based on Polarimetric Residual Dense Network

Applying deep learning to polarization imaging for image restoration has led to many technological breakthroughs, especially in underwater image recovery and recognition. In this work, a four-input deep learning model based on the Polarimetric Residual Dense Network is proposed for underwater image recovery. Guided by the physical model of polarization dehazing imaging, the different polarization component images are handled by different processes in the network during training and testing for recognition and dehazing. Our study shows that the proposed method efficiently recovers hazed images and improves the quality of image restoration even in a high-turbidity, complex underwater environment.


I. INTRODUCTION
UNDERWATER imaging technologies have attracted immense attention for accessing underwater environments. However, light is severely scattered and absorbed by turbid particles in the water, resulting in poor underwater imaging quality and a lack of image details. To overcome this challenge, much research has been carried out to improve the quality of underwater imaging [1], [2], [3], [4]. Among these approaches, underwater polarization dehazing technology [5], [6], [7], [8] has been recognized as an effective method to obtain a clear dehazed image, because the backscattered light is partially polarized [9], [10]. Many attempts have been made to improve underwater polarization dehazing based on Schechner's proposal, which uses the relationship between the object radiance and two orthogonal polarization images to realize underwater image recovery [5], [9]. Examples include an underwater polarimetric dehazing imaging model that uses the Stokes vectors to obtain polarization information [11], and underwater image enhancement using wavelength compensation [12].
In the past decade, machine learning and neural networks have achieved a variety of successes in many areas, including distance prediction [13], image dehazing [14], [15], visible light positioning (VLP) [16], and image recognition [17]. Deep learning, a sub-field of machine learning, has been recognized as an effective approach that has led to many technological advancements [18] because of its powerful feature extraction and feature learning capabilities. Recently, Hu et al. proposed a learning-based polarimetric underwater image recovery method [19] to improve the dehazing effect, and enhanced the image quality in a low-light environment with a neural network [20]. However, most of this work only identifies the relation between the input and the ground truth of the target object with a densely connected neural network based on extracting image structures and features. The governing model of the actual physical process is usually ignored in the neural network, which results in wasted effort and inaccurate results. In our recent work [21], we demonstrated a lightweight convolutional neural network for underwater polarization dehazing imaging that combines the advantages of deep learning and polarization dehazing imaging technology and rapidly achieves a better dehazing effect than conventional dehazing methods. Nevertheless, further development of network models for deep learning-based underwater dehazing imaging remains an urgent topic, especially the design of neural networks whose learning is guided by the governing physical model of the particular case.
In this work, we propose a robust method for underwater dehazing that combines the polarization light dehazing technique with deep learning. Different from the previous work [21], this deep learning model does not estimate an analytical formula or any important parameters. The mapping relations between the input polarization component images and the ground truth are obtained by the four-input deep learning model. Our proposed method shows a good recovery effect on objects of different materials under turbid conditions. In this method, the four-input polarimetric residual dense network (PRDN) is first developed by combining polarization imaging technology with the Residual Dense Network (RDN) [22]. The recognition and recovery network model is then built by combining the constructed PRDN with the Residual Network (ResNet) [23]. The polarization image datasets are built from images of four polarization components (0°, 45°, 90° linear polarization and circular polarization) for training and testing in the neural network. The mapping relationship between the target and the polarization image information is constructed from the extracted polarization image information. Recognition and classification training is performed on the polarization image datasets in the neural network, followed by image restoration training. Considering the governing physical model of polarization light dehazing imaging, the linear polarization component images are trained and tested in a different process from the circular polarization image in the neural network. Moreover, we also conduct experiments on different materials in thick turbid water.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Inspired by the transfer learning technique [24], the PRDN is first pretrained on the entire dataset to obtain a preliminary model. The PRDN is then further trained on the classified datasets, starting from the preliminary model, to improve the efficiency of training and testing. The experimental results indicate that our proposed method restores hazed images more effectively than the other state-of-the-art methods.

A. The Network Structure Based on the Polarization Dehazing Model
According to the classic underwater degradation model [5], [7], [9], the intensity image obtained by polarization imaging includes the target signal and the background scattered light. The target signal is the target irradiance after being absorbed and scattered by suspended particles in turbid water. Therefore, underwater imaging degradation is attributed to the background scattered veiling light and to signal attenuation. Because the polarization information of the target signal differs from that of the scattered light, the polarization characteristics of a scattered light field can be used to remove the background scattered light and restore the object images. Polarization dehazing technology has shown clear advantages in achieving clear underwater images and aiding underwater target detection and recognition. The accurate estimation of the polarization characteristics and of the relationship between the target light and the background scattered light is the essential problem to be resolved in underwater dehazing technology. In this work, the core idea of the proposed neural network is to make full use of the polarization image information to perform image restoration based on polarization light dehazing technology by combining the recognition and the recovery networks. The different linear polarization images (the 0°, 45°, 90° linear polarization) can be adopted to map the relation between the veiling light and the target light by deeply mining the differences and uniqueness of the polarization information in a scattered light field. On the other hand, circularly polarized light possesses the memory property (i.e., the persistence of circular polarization in scattering environments) [25]. Thus, the circular polarization images assist in the enhancement of imaging recovery.
Therefore, the different polarization component images (the 0°, 45°, 90° linear polarization and circular polarization components) are handled by different processes for recognition and dehazing in the proposed network.
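For orientation, the conventional two-image polarization dehazing that this degradation model leads to can be sketched as follows. This is a minimal illustration of the classical baseline, not of the proposed network. It assumes that the target signal is unpolarized, that the degree of polarization of the backscatter (`p_scat`) and the backscatter at infinite distance (`b_inf`) are known, and all function and parameter names are hypothetical.

```python
import numpy as np


def polarization_dehaze(i_max, i_min, p_scat, b_inf, eps=1e-6):
    """Classical two-image polarization dehazing (illustrative sketch).

    i_max, i_min : images taken at the two orthogonal analyzer states that
                   maximize and minimize the backscatter.
    p_scat       : assumed degree of polarization of the backscatter (0 < p <= 1).
    b_inf        : assumed backscatter intensity at infinite distance.
    """
    i_max = np.asarray(i_max, dtype=np.float64)
    i_min = np.asarray(i_min, dtype=np.float64)

    total = i_max + i_min                      # total intensity image
    # Only the backscatter is assumed polarized, so the difference image
    # isolates it up to the factor p_scat.
    backscatter = np.clip((i_max - i_min) / p_scat, 0.0, None)
    # Transmission follows from backscatter = b_inf * (1 - transmission).
    transmission = np.clip(1.0 - backscatter / b_inf, eps, 1.0)
    # Remove the veiling light, then compensate the attenuation.
    radiance = (total - backscatter) / transmission
    return radiance, backscatter, transmission
```

The sketch makes explicit why the estimates of `p_scat` and `b_inf` are critical in the classical approach: both enter the recovery directly, so errors in either propagate to every pixel.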
The overall network framework is shown in Fig. 1. The entire network framework consists of the ResNet34 [23] model and the PRDN model. Here, we employ the ResNet34 model to identify and classify different categories of material targets, whereas the PRDN is used to recover the image of the targets. In this network framework, the ResNet34 model has four inputs, comprising the linearly polarized (0°, 45°, 90° components) and circularly polarized images, as shown in Fig. 1. The four inputs with different polarization component images are identified and classified into the corresponding categories by ResNet34 (three categories of material datasets, namely plastic, metal and plaster, are used in this work). The classification makes the subsequent training and testing through the PRDN more efficient because each PRDN works with a smaller dataset. To improve the efficiency of training and testing, the PRDN is first pretrained on the entire dataset to obtain a preliminary model and then further trained on the classified datasets to obtain a PRDN for each material type.
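The classify-then-restore flow described above can be sketched as a simple dispatch. This is only a structural illustration: `classify` stands in for the trained ResNet34 and each entry of `restorers` for a material-specific PRDN; the function name and signatures are hypothetical.

```python
from typing import Callable, Dict, Tuple, TypeVar

Inputs = TypeVar("Inputs")   # e.g. a tuple of the four polarization images
Output = TypeVar("Output")   # e.g. the restored image


def classify_then_restore(
    inputs: Inputs,
    classify: Callable[[Inputs], str],
    restorers: Dict[str, Callable[[Inputs], Output]],
) -> Tuple[str, Output]:
    """Route the four-image input to the restorer trained for its material."""
    label = classify(inputs)                 # stand-in for ResNet34
    try:
        restorer = restorers[label]          # stand-in for a per-material PRDN
    except KeyError:
        # A material without its own trained model cannot be restored.
        raise KeyError(f"no restoration model trained for material {label!r}") from None
    return label, restorer(inputs)
```

With three restorers registered under `"plastic"`, `"metal"` and `"plaster"`, an input classified as any other material raises an error rather than being silently sent to an unsuitable model.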

B. The Polarimetric Residual Dense Network
The Polarimetric Residual Dense Network (PRDN), which takes the four inputs after ResNet34, is developed to improve the dehazing imaging quality, as shown in Fig. 2. The PRDN is mainly composed of five parts: the shallow feature extraction (SFE), the residual dense blocks (RDB), the local polarization feature fusion (LPFF), the global polarization feature fusion (GPFF) and the dense polarization feature fusion (DPFF) [26]. In Fig. 2, $I_{0^\circ}$, $I_{45^\circ}$, $I_{90^\circ}$ and $I_{Cr}$ denote the images corresponding to the 0°, 45°, 90° linear polarization and circular polarization components, respectively. Here, the circular polarization image ($I_{Cr}$) information is extracted directly for the global residual learning, whereas the feature information of the three linear polarization images ($I_{0^\circ}$, $I_{45^\circ}$, $I_{90^\circ}$) is extracted as the input of the RDB for the fusion of the local polarization features. The different color lines in the figure show the data transmission between different layers, and the dotted lines show the data transmission between the RDBs. The SFE layer consists of two convolutional layers: the first layer extracts image features from the four polarization images, and the second layer uses the output of the first layer to further extract polarization feature information for the global residual learning and to provide the inputs to the RDB. The SFE stage therefore yields two quantities: $F_{GRL}$, the global residual learning feature of the PRDN for the circular polarization image, and $F_0$, which is used as the input to the RDB. Here $H_{SFE}(\cdot)$ denotes the operation of the SFE layer. The outputs of the RDBs are combined in the local polarization feature fusion (LPFF) process by the concatenation layers, as shown in Fig. 2. The output of the LPFF ($F_{LPFF}$) then undergoes noise reduction through two convolution operations to obtain $F_{GFF}$; here $H_{GFF}$, the composite operation of 1 × 1 and 3 × 3 convolutions, maps $F_{LPFF}$ to $F_{GFF}$.
Then $F_{GFF}$ is further fused with the global residual learning feature ($F_{GRL}$) from the circular polarization component images in the global polarization feature fusion (GPFF) to enhance the restoration of image details that may be lost during the signal processing. The output of this fusion of $F_{GFF}$ and $F_{GRL}$ is denoted $F_{DFF}$. Finally, $F_{DFF}$ is passed through two convolution operations to produce the final recovered image. The overall mapping from the four input images to the recovered image is denoted $H_{PRDN}$, the global function of our PRDN. Fig. 3 illustrates the architecture of the RDB [26], which is an important constituent part of the PRDN. The RDB is made up of multiple convolution layers and rectified linear units (ReLU) [27] that implement series and dense connections, feature fusion and residual learning. These layer-by-layer operations constitute a contiguous memory mechanism by passing the state of the previous layers to the current layer. By directly connecting the outputs of the previous layers to the next layer, the contiguous memory mechanism makes full use of the hierarchical features of the convolution layers [26], with both the feed-forward property and local dense features.
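The data flow described in this subsection can be summarized compactly. The following is a sketch under stated assumptions rather than the authors' exact formulation: it follows the conventions of the RDN [26], and the argument lists, the RDB index $d$, the number of RDBs $n$, the bracket notation for concatenation, the use of element-wise addition for the fusion with $F_{GRL}$, the symbol $I_{out}$ for the recovered image, and the symbol $H_{RDB,d}$ are all assumed.

```latex
\begin{align}
  (F_{GRL},\, F_0) &= H_{SFE}\!\left(I_{0^\circ}, I_{45^\circ}, I_{90^\circ}, I_{Cr}\right)
      && \text{shallow feature extraction} \\
  F_d &= H_{RDB,d}\!\left(F_{d-1}\right), \quad d = 1, \dots, n
      && \text{chain of residual dense blocks} \\
  F_{LPFF} &= \left[F_1, F_2, \dots, F_n\right]
      && \text{concatenation of RDB outputs} \\
  F_{GFF} &= H_{GFF}\!\left(F_{LPFF}\right)
      && 1 \times 1 \text{ and } 3 \times 3 \text{ convolutions} \\
  F_{DFF} &= F_{GFF} + F_{GRL}
      && \text{fusion with the circular branch} \\
  I_{out} &= H_{PRDN}\!\left(I_{0^\circ}, I_{45^\circ}, I_{90^\circ}, I_{Cr}\right)
      && \text{overall mapping}
\end{align}
```

Read this way, the linear polarization features pass through the full RDB chain, while the circular polarization feature bypasses it and re-enters only at the final fusion.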
The polarimetric image recovery method proposed by Schechner is based on the difference and relation of the polarization information between the direct transmission and the backscattered light, obtained from two orthogonal linear polarization component images. The object images can be restored by removing the effects of backscattering and absorption through the uniqueness and differences of the polarization information between the target signal and the scattered light [9]. Therefore, in the proposed network, the feature information of each of the three linear polarization images ($I_{0^\circ}$, $I_{45^\circ}$, $I_{90^\circ}$) is extracted as the input of the RDB for learning the mapping function related to the ground truth image. That is, the structure of the RDB and the local polarization feature fusion (LPFF) for the three linear polarization images ($I_{0^\circ}$, $I_{45^\circ}$, $I_{90^\circ}$) reflects the traditional polarimetric image restoration method based on linearly polarized light. On the other hand, because of the memory property of circularly polarized light [25], circularly polarized light tends to maintain its original polarization state better than linearly polarized light. Therefore, the circularly polarized image is fused in the global polarization feature fusion (GPFF) stage to enhance the restoration of image details that may be lost during the signal processing. In addition, if any of the three linear polarization images ($I_{0^\circ}$, $I_{45^\circ}$, $I_{90^\circ}$) is exchanged with the circularly polarized image as the input of the RDB in the proposed model, the experimental results indicate that the restored images are degraded correspondingly. These results further verify the validity of the proposed model.

C. Implementation Details
The active polarization imaging experiments are performed in a real underwater environment to obtain the polarization image dataset. The experimental setup is shown in Fig. 4. A 532 nm blue-green laser is used as the active light source, and a CMOS camera (MER2-301-125U3M) is employed to receive the light reflected from the target object. A polarizer and a quarter-wave plate are used to convert the laser beam to the 0°, 45°, 90° linear and circular polarization components. An azimuthal stepper motor is used to control the azimuthal angle of the polarizer for capturing images with different polarization components. The mean squared error (MSE) is adopted as the loss function during the training of the PRDN. An Adam optimizer [28] is adopted to accelerate the gradient descent, with the initial learning rate set to $1 \times 10^{-4}$ and the exponential decay rate set to 0.6. An NVIDIA RTX 3070 GPU is used to train the model. The number of RDBs is chosen as n = 4, which gives the best overall trade-off between the performance of the PRDN and the speed of training according to the experimental results. The dataset consists of 180 sets of polarized images with different turbidity, randomly divided into training, validation, and test sets in a ratio of 8:1:1. The scale of the dataset is about 8772, where the data size is 200 × 200 pixels.
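The random 8:1:1 division can be sketched as below. This is an illustrative helper, not the authors' code: the function name, the fixed seed, and the choice to split at the level of image sets (rather than individual images) are assumptions.

```python
import random


def split_dataset(items, ratios=(8, 1, 1), seed=0):
    """Shuffle items and split them into train/validation/test subsets."""
    items = list(items)
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    rng.shuffle(items)

    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total

    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test subset
    return train, val, test


# 180 image sets, identified here simply by index.
train_sets, val_sets, test_sets = split_dataset(range(180))
```

Splitting whole image sets keeps all polarization components of one scene in the same subset, which avoids leaking near-identical content between training and test data.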

III. RESULTS AND DISCUSSION
Here, three categories of material datasets (plastic, metal and plaster) are used for classification. The plastic is polyvinyl chloride (PVC), a stainless steel ruler made of iron-chromium alloy is taken as the metal material, and the plaster is mainly composed of CaSO$_4$. The target objects made of the same material are classified into one category by the ResNet34 model. Because the polarization information of the light reflected from objects of different materials differs, the convergence value and the end position of the gradient descent differ when different objects are trained by the PRDN, resulting in different numbers of training epochs for different types of objects. For example, PVC plastic is a high depolarization object compared to the stainless steel, so the PVC plastic needs more iterations to reach the convergence than that of stainless steel ruler during the training. In this work, the PRDN is first trained for 300 epochs with a batch size of 2 to determine the number of epochs at which the model converges for each material. The PRDN model for the plastic objects converges after 120 epochs, whereas at least 260 epochs are required for the metal objects and around 190 epochs for the plaster objects. Consequently, if these objects are trained together in one PRDN, the training needs at least 260 epochs (metal objects), which leads to overfitting for the plaster and plastic objects. To achieve good training efficiency and a good recovery effect, three PRDNs are trained separately on the different materials after classification. The number of training epochs is set to 150 for the plastic PRDN model, 220 for the plaster PRDN model, and 280 for the metal PRDN model. The training efficiency is greatly improved and overfitting is prevented when the three PRDNs are trained on the different types of materials.
Therefore, training on different types of objects separately helps to identify an optimal solution for the recovery of particular materials. Because the ResNet34 model identifies and classifies material targets based on the polarization characteristics of the light reflected from different materials, targets are classified into the same category if the polarization characteristics of their reflected light are similar. For example, the experimental results show that untrained materials such as copper and gold are also classified as "metal". However, wood cannot be classified into any of the three material categories (plastic, metal and plaster) because the polarization characteristics of the light reflected from wood are significantly different from those of the three categories. Therefore, in practice, an additional category such as a wood model can be added to the ResNet34 network, and the wood model should be trained with the corresponding dataset in the PRDN before being applied to the dehazing of such objects. In addition, objects of the same family of material with different shapes, such as plastic stickers, plastic badges, and plastic toys, have been used for training in the proposed method. Training on differently shaped objects of the same material prevents the model from using a single type of feature to classify the material of the target object and from having a restoration effect only on specific objects, thereby improving the generalization ability of the model.
The image recovery results for target objects of different materials (a plastic cube, a metal ruler and a plaster sculpture) in a turbid water environment with different dehazing methods are shown in Fig. 5. The conventional intensity images captured by the CMOS camera for the plastic, metal and plaster targets in the turbid water environment are indistinct, as shown in plots (I) in Fig. 5. However, the image details become clear and the outline of the entire image can be clearly identified after the image is restored by the PRDN network model, as shown in plots (V) in Fig. 5. In addition, compared with the image recovery results obtained with other dehazing methods, namely the polarization dehazing [29] (see plots (II) in Fig. 5), the CNN polarization dehazing [21] (see plots (III) in Fig. 5) and the dark channel [30] (see plots (IV) in Fig. 5), the image restored by our proposed PRDN model is clearer and closer to the ground truth image taken in a clear underwater environment (see plots (VI) in Fig. 5).
To further verify the efficiency of our proposed network, seven objective evaluation indicators for measuring the imaging quality, based on the "full-reference" (FR) and "no-reference" (NR) [31] models, are computed and compared across the different dehazing methods. The FR indicators are the Peak Signal-to-Noise Ratio (PSNR) [32], the Information Fidelity Criterion (IFC) [33], the Structural Similarity (SSIM) [34] and the Feature Similarity Index Measure (FSIM) [35]. The NR indicators are the Enhancement Measure Evaluation (EME) [36], the contrast (C) [37] and the information entropy (E) [38]. The values of the seven indicators for the images recovered by the different methods are compared in Table I. For every indicator, whether FR or NR, a higher value indicates better image quality. Unlike the NR indicators, the FR indicators reflect the quality of the restored image with reference to the ground truth image. Note that the FR indicators of the ground truth images cannot be provided in Table I, since an FR indicator is obtained by comparing the recovered target image with the ground truth image. On the other hand, the NR model does not rely on the ground truth image at all, since it makes a rough estimate based on the statistical characteristics of the image, so the NR indicators of any image, including the ground truth image, can be computed. Table I shows that the four FR indicators of the image restored by the PRDN have the highest values among the five methods and that its three NR indicators are closest to those of the ground truth (the image taken in clear water serves as the ground truth), which further demonstrates the advantage of the PRDN method.
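To make the FR/NR distinction concrete, the sketch below computes one indicator of each kind using their standard textbook definitions: PSNR needs a reference image, whereas the information entropy is computed from the image alone. It assumes 8-bit grayscale images; the function names are illustrative, and the exact variants used in [32] and [38] may differ in detail.

```python
import numpy as np


def psnr(reference, restored, max_value=255.0):
    """Full-reference metric: peak signal-to-noise ratio in dB."""
    reference = np.asarray(reference, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")            # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


def information_entropy(image, levels=256):
    """No-reference metric: Shannon entropy of the gray-level histogram, in bits."""
    image = np.asarray(image)
    hist = np.bincount(image.astype(np.int64).ravel(), minlength=levels)
    p = hist[hist > 0] / image.size    # probabilities of occupied gray levels
    return float(-np.sum(p * np.log2(p)))
```

Because `information_entropy` takes a single image, it can also be evaluated on the ground truth itself, which is why NR values for the ground truth can appear in a comparison table while FR values cannot.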
In addition, the FSIM and PSNR values as a function of training epochs on the polarization dataset are compared for different network models, namely the AOD [39], the RDN and the PRDN, as shown in Fig. 6. The FSIM and PSNR values of all methods tend to stabilize once the number of training epochs reaches a certain value. However, the PRDN method shows higher FSIM and PSNR values than the AOD and RDN methods, further confirming the superiority of the method.

IV. CONCLUSION
In this work, we combine underwater polarization imaging technology with deep learning to build a four-input Polarimetric Residual Dense Network (PRDN). An integrated model of recognition, classification and restoration is constructed by combining the PRDN with a residual image recognition network (ResNet34). The four inputs of different polarization component images are first identified and classified into their respective categories by the ResNet34. The mapping relationship between the polarization image information and the target object is then learned in the subsequent PRDN. Using the preliminary model greatly shortens the training convergence time and improves the accuracy of the restored image. The experimental results indicate that this method has a good recovery effect on objects of different materials in a turbid underwater environment.