Remote Sensing Sea Ice Image Classification Based on Multilevel Feature Fusion and Residual Network

Sea ice disasters are among the most serious marine disasters in the Bohai Sea region of China and have seriously affected coastal economic development and residents' lives. Sea ice classification is an important part of sea ice detection. Hyperspectral and multispectral imagery contain rich spectral and spatial information and provide important data support for sea ice classification. At present, most sea ice classification methods rely on shallow learning based on spectral features, while the good performance of deep learning in remote sensing image classification offers a new approach to sea ice classification. However, in sea ice image classification the depth of deep learning models is limited by the small input size, so the deep features in the image cannot be fully mined, which hinders further improvement of classification accuracy. Therefore, this paper proposes an image classification method based on multilevel feature fusion using a residual network. First, the PCA method is used to extract the first principal component of the original image, and the residual network is used to deepen the network. The FPN, PAN, and SPP modules then mine the features between layers and merge features from different layers to further improve the accuracy of sea ice classification. To verify the effectiveness of the method, sea ice classification experiments were performed on the hyperspectral image of Bohai Bay in 2008 and the multispectral image of Bohai Bay in 2020. The experimental results show that, compared with algorithms that use deep learning networks with fewer layers, the proposed method uses the residual network to deepen the network and carries out multilevel feature fusion through the FPN, PAN, and SPP modules, which effectively addresses the problem of insufficient deep feature extraction and yields better classification performance.


Introduction
Sea ice disasters are marine disasters that should not be underestimated. They mostly occur in polar regions and mid-to-high latitude regions. China's Bohai Bay is located in a mid-to-high latitude area. The region has a developed economy and heavy maritime traffic. However, the Bohai Bay area ices over to varying degrees every winter [1], which affects marine activities such as marine fisheries and marine oil and gas resource development. According to statistics, the economic loss caused by sea ice disasters has reached hundreds of millions, and the affected population has reached tens of thousands. Therefore, to avoid further economic losses and casualties, it is necessary to detect sea ice in the Bohai Bay area [2].
Sea ice image classification is an important part of sea ice detection. The remote sensing images currently used for sea ice classification mainly include SAR images and optical images, the latter comprising multispectral images (such as Landsat, Sentinel, and MODIS) and hyperspectral images. SAR images contain rich feature information, can penetrate clouds, and are less affected by the environment. However, due to sensor limitations, SAR data are relatively limited in type, which is not conducive to the detailed classification of multiple types of sea ice. Optical images have a high spectral resolution, contain rich spectral and spatial information, allow detailed features of different types of sea ice to be extracted, and provide effective data support for accurate sea ice classification. Current sea ice classification methods mostly use traditional supervised classification, including SVM, decision tree, maximum likelihood, and minimum distance. For example, Reference [3] uses an SVM-based algorithm that combines the backscatter coefficient, the gray-level co-occurrence matrix (GLCM), and sea ice density to classify sea ice images. Reference [4] improved the classification accuracy of sea ice images in Liaodong Bay by building decision trees with GLCM features. Reference [5] used SVM and the maximum likelihood method in comparative experiments on Landsat images, in which the SVM classification is closer to the actual situation. Reference [6] used a discriminant function to calculate the distance to the class center, and the results show that the minimum distance method can obtain higher classification accuracy when analyzing remote sensing images. The above supervised classification algorithms are all shallow models and cannot extract deep features from hyperspectral and multispectral images, which limits further improvement of classification accuracy.
In recent years, as deep learning has made continuous progress in image classification, it has gradually gained attention in remote sensing applications. Reference [7] feeds extracted texture features into a 3D-CNN model, makes full use of the spatial-spectral characteristics of remote sensing sea ice images, and achieves high classification accuracy. Reference [8] used CNN and DBN for sea ice classification to evaluate the performance of different types of deep learning in SAR image sea ice classification and its influencing factors. Research shows that deep learning can improve the classification accuracy to a certain extent, but in pixel-level classification the input size of hyperspectral and multispectral images is small, so the number of layers of deep learning models is limited and there is less room for model optimization. In 2015, He et al. [9] proposed the deep residual network, which won the image classification and object recognition tasks in the ImageNet Large Scale Visual Recognition Challenge, alleviating the vanishing gradient problem and solving the problem of insufficient network layers. Reference [10] proposed a model based on a spectral-spatial residual network, in which a spectral residual module and a spatial residual module extract spectral and spatial features, respectively; it achieves high classification accuracy on standard datasets, effectively alleviating network degradation and deepening the network. Reference [11] embeds two modules, skip connection and covariance pooling, in the residual network model to fuse feature information from different levels and achieves high classification accuracy on standard datasets. Reference [12] added top-down feature fusion to feature pyramid networks (FPNs) and improved the loss function; the improved Mask R-CNN algorithm increased the classification accuracy of image recognition. Reference [13] proposed the RP-SSD (residual and pyramid SSD) algorithm based on a residual network and an improved feature pyramid, which significantly improves the detection performance for small targets. Reference [14] fused SPP-net and PANet to improve the quality of feature fusion. The above literature shows that combining the residual network, the feature pyramid, and SPP-net has been one way to improve image recognition accuracy in recent years. Therefore, this paper applies the combination of feature pyramid and residual network to sea ice classification to improve classification accuracy.
Based on the above research, this paper proposes a multilevel feature fusion sea ice classification method based on the residual network. We use the residual network to deepen the network and address the difficulty of extracting deep sea ice features, which is caused by the small image input size in sea ice detection and the resulting limit on the number of network layers. FPN, PAN, and SPP are combined to perform multilevel fusion of the features extracted at different levels by multiple residual blocks, fully mining deep features at different levels, fusing multiscale feature information, capturing sea ice features at different scales in the image, and further improving the accuracy of sea ice classification. The rest of this article is organized as follows. Section 2 introduces the overall framework, theoretical methods, and algorithm ideas in detail. Section 3 introduces the datasets and experimental settings and discusses the experimental results and the influence of the experimental parameters on the results. Section 4 summarizes the work of this paper.

Theoretical Method
The overall framework of this article is shown in Figure 1. It is divided into three parts. The first part is data preprocessing, which mainly uses ENVI to make sample labels and stores them in the required format through MATLAB. The second part is the comparison between the algorithm framework of this paper and three other algorithms: SVM, CNN, and the traditional residual network. The third part is accuracy evaluation, which calculates the confusion matrix to obtain the overall classification accuracy and kappa coefficient. The main idea of the algorithm is to use the residual network to deepen the network, alleviate the accuracy drop caused by an excessive number of layers, and use the FPN, PAN, and SPP modules to perform multilevel and multiscale fusion of the extracted features. In this way the mined deep features are fully used, the types of sea ice are distinguished more effectively, and the problem of limited network layers due to the small input size in sea ice detection is addressed. First, principal component analysis is performed on the original image, and its first principal component is selected as the input to the convolutional layer. Three residual blocks are then added to deepen the network. FPN and PAN take features of different layers from the three residual blocks and fuse them at multiple scales through upsampling and downsampling, which improves the utilization of the features extracted by each layer. Finally, the SPP module applies pooling with three different kernel sizes, and the three pooled features are spliced with the features before pooling, making full use of the features extracted from each layer. Therefore, by deepening the network and performing multiscale fusion, this paper fully mines the deep features in the original image and improves the classification accuracy.
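The top-down and bottom-up fusion in this pipeline can be illustrated with a toy sketch. It is not the network of this paper: the helper names, the 8/16/32 channel counts, the 8 × 8, 4 × 4, and 2 × 2 feature sizes, and the choice of nearest-neighbour upsampling and 2 × 2 max pooling are all assumptions made so that the shapes line up. It only shows how upsampling, downsampling, and channel-wise splicing can combine the outputs of three residual blocks.

```python
from typing import List, Tuple

Channel = List[List[float]]
Feature = List[Channel]  # channels-first layout: feature[c][row][col]


def constant_feature(channels: int, size: int, value: float) -> Feature:
    """Build a dummy feature map (hypothetical stand-in for a residual block output)."""
    return [[[value] * size for _ in range(size)] for _ in range(channels)]


def shape(feature: Feature) -> Tuple[int, int, int]:
    return (len(feature), len(feature[0]), len(feature[0][0]))


def upsample2x(feature: Feature) -> Feature:
    """Nearest-neighbour upsampling: each value is repeated into a 2 x 2 block."""
    out = []
    for channel in feature:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]
            rows.append(wide)
            rows.append(list(wide))
        out.append(rows)
    return out


def downsample2x(feature: Feature) -> Feature:
    """2 x 2 max pooling with stride 2 (assumes even height and width)."""
    return [
        [
            [max(ch[r][c], ch[r][c + 1], ch[r + 1][c], ch[r + 1][c + 1])
             for c in range(0, len(ch[0]), 2)]
            for r in range(0, len(ch), 2)
        ]
        for ch in feature
    ]


def concat(a: Feature, b: Feature) -> Feature:
    """Splice two feature maps of equal spatial size along the channel axis."""
    if shape(a)[1:] != shape(b)[1:]:
        raise ValueError("spatial sizes must match before splicing")
    return a + b


def fuse(r1: Feature, r2: Feature, r3: Feature) -> Feature:
    """Top-down (FPN-style) pass followed by a bottom-up (PAN-style) pass."""
    top_down_mid = concat(upsample2x(r3), r2)             # block 3 -> block 2 scale
    top_down_low = concat(upsample2x(top_down_mid), r1)   # -> block 1 scale
    bottom_up_mid = concat(downsample2x(top_down_low), top_down_mid)
    return concat(downsample2x(bottom_up_mid), r3)        # back at block 3 scale


# Assumed block outputs: channels double while the spatial size halves.
r1 = constant_feature(8, 8, 1.0)
r2 = constant_feature(16, 4, 2.0)
r3 = constant_feature(32, 2, 3.0)
fused = fuse(r1, r2, r3)
print(shape(fused))  # (136, 2, 2)
```

With these assumed sizes, every residual block reaches the fused feature through more than one path, which is the intent of multilevel fusion.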

Principle of Residual Network.
Network depth is a hot topic in deep learning. In theory, the more layers a network has, the better the result it can achieve. However, the input size of the sea ice image is small, and convolution multiplies the kernel with an image window element by element and sums the products into a single value, so the feature map shrinks during convolution, which limits the number of network layers. This paper therefore improves the traditional residual network to deepen the network. The main idea of the residual network is shown in Figure 2. The identity mapping prevents the error from increasing. This is the key to how the residual network solves the vanishing gradient problem caused by increasing the number of network layers in deep learning. We pass the input x to the output as the initial result, and the output result is

$$Y = F(x) + x,$$

where F(x) denotes the mapping learned by the stacked layers. Therefore, the target value of the residual network is the difference between Y and x, that is,

$$F(x) = Y - x.$$
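The identity shortcut can be illustrated with a few lines of Python. This is a minimal sketch under the usual residual formulation, in which a unit outputs its input plus a learned residual; `residual_unit` and the two toy residual functions are hypothetical names for illustration, not the layers used in this paper.

```python
from typing import Callable, List

Vector = List[float]


def residual_unit(x: Vector, residual_fn: Callable[[Vector], Vector]) -> Vector:
    """Return residual_fn(x) + x: the learned residual plus the identity shortcut."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("the residual branch must preserve the input size")
    return [f + xi for f, xi in zip(fx, x)]


def zero_residual(x: Vector) -> Vector:
    """A branch that has learned nothing: the unit becomes an identity mapping."""
    return [0.0 for _ in x]


def small_residual(x: Vector) -> Vector:
    """A toy learned residual (assumed form, for illustration only)."""
    return [0.5 * v for v in x]


x = [1.0, 2.0, 3.0]
identity_out = residual_unit(x, zero_residual)
y = residual_unit(x, small_residual)
# The quantity the branch has to learn is the difference between output and input.
recovered = [yi - xi for yi, xi in zip(y, x)]
print(identity_out, y, recovered)
```

When the residual branch outputs zeros, the unit reduces to the identity, which suggests why adding such units need not degrade what earlier layers have already learned.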

Improved Residual Network.
This paper adds FPN, PAN, and SPP modules to the residual network. FPN is top-down: the high-level features are merged with the low-level features through upsampling to enhance the utilization of features. PAN is bottom-up: using the fused features of FPN, the lower-level features are passed up. The combination of FPN and PAN can fully exploit the features between different layers. The SPP module was proposed to solve the problem that the input of the fully connected layer requires a fixed data size. Generally, the feature map is divided into several equal parts through maximum pooling. Before being input to the fully connected layer, the pooled features are spliced in the channel dimension and flattened into a one-dimensional vector, so that local features and overall features are merged to enrich the expressive ability of the feature map. Combining the residual network with the FPN, PAN, and SPP modules not only solves the problem that the number of network layers is limited by the input size but also makes full use of the features extracted from the residual blocks, thereby improving the classification accuracy. Figure 3 shows the network structure of the algorithm in this paper. Conv denotes a convolutional layer and resx denotes a residual block; there are three residual blocks, each containing three convolutional layers and a pooling layer. Ups denotes upsampling, dws denotes downsampling, concat denotes splicing of features, and FC denotes the fully connected layer. We first input the data into the convolutional layer to extract low-level features and input these features into the residual blocks. The features extracted from the third residual block are then upsampled and spliced with the features extracted from the second residual block. The spliced feature is upsampled again and spliced with the feature extracted from the first residual block to obtain a new feature, which is downsampled and spliced with the feature of the corresponding size. Finally, the result is spliced with the features of the third residual block and input into the SPP module. The SPP module performs three maximum pooling operations, and the three resulting features are spliced with the input of the SPP module, flattened into a one-dimensional vector, and input to the fully connected layer, which classifies the image.
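The SPP step at the end of this structure can be sketched as follows. This is an illustration under stated assumptions, not the configuration of this paper: the pooling uses stride 1 with windows clipped at the border so that each pooled map keeps the input size, and the kernel sizes 3, 5, and 7 as well as the function names are hypothetical.

```python
from typing import List, Sequence

Channel = List[List[float]]
Feature = List[Channel]  # channels-first layout: feature[c][row][col]


def max_pool_same(channel: Channel, k: int) -> Channel:
    """k x k max pooling with stride 1; windows are clipped at the image border."""
    h, w = len(channel), len(channel[0])
    r = k // 2
    return [
        [
            max(channel[i][j]
                for i in range(max(0, y - r), min(h, y + r + 1))
                for j in range(max(0, x - r), min(w, x + r + 1)))
            for x in range(w)
        ]
        for y in range(h)
    ]


def spp(feature: Feature, kernel_sizes: Sequence[int] = (3, 5, 7)) -> Feature:
    """Splice the input with its pooled versions along the channel axis."""
    out = list(feature)  # keep the features from before pooling
    for k in kernel_sizes:
        out.extend(max_pool_same(ch, k) for ch in feature)
    return out


def flatten(feature: Feature) -> List[float]:
    """Unroll a feature map into the one-dimensional vector fed to the FC layer."""
    return [v for ch in feature for row in ch for v in row]


# Toy input: 2 channels of size 4 x 4.
feature = [[[float(c * 16 + r * 4 + col) for col in range(4)] for r in range(4)]
           for c in range(2)]
pooled = spp(feature)
vector = flatten(pooled)
print(len(pooled), len(vector))  # 8 channels, 128 values
```

Because every pooled map has the same spatial size as the input, the length of the flattened vector depends only on the number of channels and the feature size, not on the pooling kernel sizes.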

Algorithm Description.
Based on the above analysis, the proposed method is summarized in Algorithm 1.

Experimental Data Description.
Bohai Bay has always been an important area for studying sea ice conditions in China. The first experimental dataset in this paper is a hyperspectral sea ice image taken over the Bohai Sea on January 23, 2008. The image selected for the experiment is 442 × 212 pixels, and 176 bands remain after excluding some disturbed bands, with a resolution of 30 m, as shown in Figure 4. The second experimental dataset is a multispectral image downloaded from the official website of the European Space Agency (ESA) and taken over the Bohai Sea on February 11, 2018. The image selected for the experiment is 400 × 400 pixels with 13 bands, and the bands, which have different resolutions, are processed to a common resolution of 10 m. The processed image is shown in Figure 5.
The sample labels for both sets of experimental data are assigned pixel by pixel, based on a combination of spectral curves and Google Maps. The hyperspectral image is divided into three categories: white ice, sea water, and gray ice. There are 2363 sample labels in total, and the ratio of training samples to test samples is 1 : 9; the specific numbers are shown in Table 1.
The multispectral image is divided into four categories: white ice, sea water, gray ice, and land. There are 8025 sample labels in total, and the ratio of training samples to test samples is 1 : 9. The specific numbers are shown in Table 2. Figures 6 and 7 show the hyperspectral average spectral curves and the multispectral average spectral curves, respectively. The color of each curve corresponds to the label color in Figures 4 and 5. The vertical axis values of the two graphs are quite different, and the categories are easy to distinguish.

Preprocessing and Experimental Settings.
The data preprocessing part uses the PCA algorithm to reduce the dimensionality of the image. The main features of the different bands are concentrated in the first principal component, which is used as the input.
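Extracting a first principal component can be sketched in plain Python. This is a minimal illustration, assuming the image has been unrolled into a list of pixels with one value per band; the function name, the power-iteration solver, and the toy data are assumptions, and a real pipeline would more likely rely on a numerical library.

```python
import math
from typing import List, Tuple


def first_principal_component(
    pixels: List[List[float]], iterations: int = 200
) -> Tuple[List[float], List[float]]:
    """Project N pixels with B bands onto the leading eigenvector of the band covariance.

    Returns (scores, direction): one score per pixel and the unit-length direction.
    """
    n, b = len(pixels), len(pixels[0])
    mean = [sum(p[j] for p in pixels) / n for j in range(b)]
    centered = [[p[j] - mean[j] for j in range(b)] for p in pixels]
    # B x B sample covariance matrix of the bands.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(b)]
           for i in range(b)]
    # Power iteration; the uniform start vector is an arbitrary choice and would
    # fail only if it were exactly orthogonal to the leading eigenvector.
    v = [1.0 / math.sqrt(b)] * b
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(b)) for i in range(b)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            break
        v = [t / norm for t in w]
    scores = [sum(c[j] * v[j] for j in range(b)) for c in centered]
    return scores, v


# Toy data: 4 pixels, 2 perfectly correlated bands (band 2 = 2 x band 1).
toy_pixels = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
scores, direction = first_principal_component(toy_pixels)
print(direction, scores)
```

For an image with H × W pixels, the returned scores can be reshaped to H × W to give the single-band input described above.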
The training samples are selected at random, and the test samples are the remaining samples. Because of this random selection, the accuracy of the network model is not identical from run to run, so every experiment in this article is trained five times to ensure the stability of the experimental results and make them more convincing.
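A random 1 : 9 split repeated over several runs can be sketched as below. The function name and the use of seeds 0 to 4 are assumptions, and the sketch draws from all labeled pixels at once; whether the split in this paper is made per class is not stated, so the per-class counts in the tables may differ from what this produces.

```python
import random
from typing import List, Tuple


def split_samples(
    n_samples: int, train_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Randomly pick a training subset; the test set is everything that is left."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


# 2363 labeled pixels as in the hyperspectral dataset, split five times.
runs = [split_samples(2363, 0.1, seed) for seed in range(5)]
train_idx, test_idx = runs[0]
print(len(train_idx), len(test_idx))  # 236 2127
```

Averaging a metric over the five runs then gives a figure that is less sensitive to any single random draw.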
Deep learning involves a large number of training parameters, and different parameters affect the experimental results. Therefore, some parameters are fixed in the experiments of this article. The learning rate is 0.0005, the dropout rate is 0.5, the batch size is 20, and the number of iterations is 10,000. The other parameters and network settings of the algorithm in this paper are shown in Table 3. There are three comparison algorithms: SVM [15], CNN [16], and the traditional residual network [17]. The SVM uses the radial basis function as its kernel function, and the parameter g and penalty factor c are obtained by fivefold cross-validation. The other parameters of the CNN are shown in Table 4, and those of the traditional residual network are shown in Table 5. Table 6 shows the classification results of the different algorithms on the 2008 Bohai Bay hyperspectral image. The final classification results are obtained by training on the labeled samples. The algorithm in this paper achieves the best classification results, with an overall classification accuracy of 93.01% and a kappa coefficient of 88.87%. Table 7 shows the classification results of the different algorithms on the 2018 Bohai Bay multispectral image. The algorithm in this paper again achieves the best classification results, with an overall classification accuracy of 90.41% and a kappa coefficient of 87.52%. In the experiment, SVM is a traditional machine learning method and extracts only shallow features; the deep features in hyperspectral images are not fully utilized, so its classification accuracy is low. The number of layers of the CNN model is limited by the small input sample size of the hyperspectral image, so the CNN cannot fully extract the deep features of the hyperspectral image. Because the traditional residual network deepens the network, it can extract more of the deep features in the hyperspectral image and obtain higher accuracy; however, because it only passes the features extracted by each layer to the residual unit of the next layer, the features of different layers are not deeply fused and mined, which limits further improvement of the classification accuracy of hyperspectral images. The method in this paper uses the residual network to increase the depth of the network and uses its shortcut connections to address the vanishing gradient problem caused by deepening the network; at the same time, the FPN and PAN modules reuse the features between layers. Finally, the SPP module generates an output of fixed size. The method thus not only solves the problem of the limited number of network layers but also fully mines the deep feature information in the hyperspectral image, and it obtains the highest classification accuracy.
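Both reported metrics are derived from the confusion matrix. The following sketch uses the standard definitions of overall accuracy and Cohen's kappa; the function name and the 2 × 2 example matrix are made up for illustration and are not taken from the tables of this paper.

```python
from typing import List, Tuple


def overall_accuracy_and_kappa(confusion: List[List[int]]) -> Tuple[float, float]:
    """Compute overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / (total * total)
    if expected == 1.0:
        return observed, 1.0  # degenerate case: only one class is present
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class example.
example = [[45, 5],
           [10, 40]]
oa, kappa = overall_accuracy_and_kappa(example)
print(f"overall accuracy = {oa:.2%}, kappa = {kappa:.2%}")
```

Kappa discounts the agreement that would occur by chance, which is why it is lower than the overall accuracy on the same matrix.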

Analysis of Results.
In the experiments in this article, it can be seen from Figures 6 and 7 that the spectral curve of seawater has the lowest values among all categories, so the reflectance of seawater is clearly distinguished from that of the sea ice categories, and a higher classification accuracy is obtained for seawater. However, white ice and gray ice are divided according to the thickness of sea ice. As two different types of sea ice, they have to be separated within the same kind of material, which is affected by more factors, and misclassification is more serious. It can be seen from Tables 6 and 7 that the SVM method is shallow learning, and its classification accuracy for white ice and gray ice is low. The CNN can extract sea ice feature information in greater depth and achieves some improvement. The residual network further improves the classification accuracy of white ice and gray ice by deepening the network. The method proposed in this paper uses the improved residual network to address the vanishing gradient problem while deepening the network and makes full use of the multilevel and multiscale features of remote sensing sea ice through the FPN, PAN, and SPP modules to obtain the best classification effect. Compared with the SVM, CNN, and traditional residual network methods, in the hyperspectral image the classification accuracy of white ice increases by 1.47%, 3.47%, and 9.92%, respectively, and that of gray ice by 11.06%, 11.47%, and 12.63%. In the multispectral image, the classification accuracy of white ice increases by 8.18%, 8.36%, and 10.21%, and that of gray ice by 6.07%, 7.86%, and 11.49%.

Training phase:
(4) Randomly input the training samples into the first convolutional layer in batches; the size of the convolution kernel is k × k;
(5) Use the output of step (4) as the input of the first residual block. The residual block contains three residual units and a pooling layer. The size of the convolution kernel in each layer is k × k, and the number of convolution kernels is m;
(6) Take the output of each residual block as the input of the next residual block in turn; there are three residual blocks in total;
(7) Upsample the output of the third residual block in step (6) to expand the feature size (the result is called feature S), and splice it with the output of the second residual block to obtain a new feature ST;
(8) Upsample feature ST to expand the feature size (the result is called feature STR), and splice it with the output of the first residual block to obtain a new feature SF;
(9) Downsample feature SF to reduce the feature size, and splice it with feature STR to obtain a new feature SFI;
(10) Downsample feature SFI to reduce the size, and splice it with feature SS to obtain a new feature SSI;
(11) Input feature SSI into the SPP module, which contains pooling with different kernel sizes; the step size is set to f;
(12) Splice the outputs of the different pooling operations in SPP with feature SSI to obtain a new feature SSE;
(13) Input feature SSE to the fully connected layer;
(14) Iterate the model until convergence;
(15) Model training completed;
Testing phase:
(16) Input the test samples into the trained model, calculate the confusion matrix, and obtain the classification accuracy;
(17) The test is completed.
Output: overall classification accuracy, kappa coefficient, and confusion matrix
End
ALGORITHM 1: Improved residual network algorithm.

Mathematical Problems in Engineering

The Effect of Training Sample Size on Experimental Results.
Given the local characteristics of the distribution of sea ice categories, the pixels in the spatial neighborhood of a given pixel belong to the same category with high probability. Therefore, in this experiment, we take each pixel as the center and take its M × M neighborhood; all the pixels in the neighborhood form a data block of size M × M × B (where M × M is the spatial size and B is the number of bands). This block is the training sample of the pixel, and the category of the pixel is the category of the training sample. The training sample size affects the classification accuracy of sea ice, so its selection must take into account both the spatial information contained in the sample and the errors it introduces. The larger the training sample, the more spatial information it contains, which allows a deeper convolutional network and the mining of more feature information. However, because the surrounding pixels may not belong to the same category as the center pixel, a larger sample also introduces some errors. Taking these factors into consideration, an appropriate training sample size yields better classification results. This section therefore compares sea ice classification with three sample sizes: 29 × 29, 27 × 27, and 25 × 25. It can be seen from Tables 8 and 9 that when the training sample size is 27 × 27, the overall classification accuracy and kappa coefficient are the highest. Therefore, a training sample size of 27 × 27 is selected for the two different images in this article.
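Building an M × M × B sample around a pixel can be sketched as follows. The function name is hypothetical, and the treatment of pixels near the image border (indices clamped to the nearest valid row or column) is an assumption, since the text does not say how border pixels are handled.

```python
from typing import List

Cube = List[List[List[float]]]  # cube[row][col][band]


def extract_patch(cube: Cube, row: int, col: int, m: int) -> Cube:
    """Return the m x m x B neighborhood centered on (row, col).

    Out-of-range neighbors are replaced by the nearest edge pixel (assumed policy).
    """
    if m % 2 == 0:
        raise ValueError("m must be odd so that the pixel sits at the center")
    height, width = len(cube), len(cube[0])
    half = m // 2
    patch = []
    for dr in range(-half, half + 1):
        r = min(max(row + dr, 0), height - 1)
        patch_row = []
        for dc in range(-half, half + 1):
            c = min(max(col + dc, 0), width - 1)
            patch_row.append(list(cube[r][c]))
        patch.append(patch_row)
    return patch


# Toy 4 x 4 image with 2 bands; band 0 stores the pixel index, band 1 an offset copy.
toy_cube = [[[float(r * 4 + c), 100.0 + r * 4 + c] for c in range(4)] for r in range(4)]
center_patch = extract_patch(toy_cube, 1, 1, 3)
corner_patch = extract_patch(toy_cube, 0, 0, 3)
print([[p[0] for p in row] for row in corner_patch])
```

The label attached to each patch is the label of its center pixel, so one labeled pixel yields exactly one training sample.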

The Influence of the Number of Convolution Kernels on Experimental Results.
In deep learning, more convolution kernels mean more parameters and more extracted features, which improves the classification accuracy to a certain extent. However, too many convolution kernels cause overfitting and hinder the improvement of sea ice classification accuracy. In addition, more convolution kernels increase the number of parameters involved in the computation, the computational complexity, and the time cost, so choosing an appropriate number of convolution kernels is the focus of this section. In the experiment, we set the candidate values for the number of convolution kernels according to empirical values (generally, the number of convolution kernels is 16, so values around 16 are taken as candidates) and determine the optimal number through experimental analysis; 4, 8, 16, and 32 convolution kernels were tested. This section discusses the number of convolution kernels in the first convolutional layer. The number of kernels in each subsequent layer is increased by a factor of 2. The optimal parameters are selected for the hyperspectral image and the multispectral image of Bohai Bay. It can be seen from Table 10 that when the number of convolution kernels is 8, the overall classification accuracy and kappa coefficient are the highest. Therefore, the number of convolution kernels in the first convolutional layer for the hyperspectral image is set to 8. It can be seen from Table 11 that when the number of convolution kernels is 16, the overall classification accuracy and kappa coefficient are the highest, so the number of convolution kernels in the first convolutional layer for the multispectral image is set to 16.
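The growth in parameters with the number of kernels can be made concrete with a small count. The sketch assumes a stack of three plain 3 × 3 convolutional layers with biases, a single-channel input, and a kernel count that doubles per layer; the layer count and function names are hypothetical and do not reproduce the network of this paper.

```python
def conv_layer_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights plus one bias per output channel of a 2D convolutional layer."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def stack_params(first_layer_kernels: int, kernel_size: int = 3,
                 in_channels: int = 1, n_layers: int = 3) -> int:
    """Total parameters when the kernel count doubles from one layer to the next."""
    total = 0
    channels_in = in_channels
    channels_out = first_layer_kernels
    for _ in range(n_layers):
        total += conv_layer_params(kernel_size, channels_in, channels_out)
        channels_in, channels_out = channels_out, channels_out * 2
    return total


param_counts = {k: stack_params(k) for k in (4, 8, 16, 32)}
for kernels, count in param_counts.items():
    print(f"{kernels:>2} kernels in the first layer -> {count} parameters")
```

Under these assumptions, doubling the first-layer kernel count roughly quadruples the parameter count, which illustrates the trade-off between capacity, overfitting risk, and computation discussed above.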

The Effect of Convolution Kernel Size on Experimental Results.
The size of the convolution kernel in deep learning is also an important parameter that affects accuracy. In general, the larger the convolution kernel, the larger the receptive field and the more information it contains, so more features can be extracted to further improve the classification accuracy. However, a larger convolution kernel increases the amount of calculation and reduces the attainable depth of the model, which hinders the improvement of classification accuracy. This section discusses the influence of the convolution kernel size on the experimental results. Four convolution kernel sizes, 2 × 2, 3 × 3, 5 × 5, and 7 × 7, are selected for the experiments. It can be seen from Tables 12 and 13 that the classification accuracy and kappa coefficient are the highest when the convolution kernel is 3 × 3. Therefore, the size of the convolution kernel in the hyperspectral image and multispectral image experiments in this article is set to 3 × 3.
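How the kernel size limits depth for a small input can be checked with the standard output-size formula for convolution. The sketch assumes unpadded, stride-1 convolutions with no pooling, which is a simplification for illustration rather than the configuration of this paper; the function names are hypothetical.

```python
def conv_output_size(n: int, k: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size after a k x k convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - k) // stride + 1


def max_unpadded_layers(n: int, k: int) -> int:
    """Count how many stride-1, unpadded k x k convolutions fit before the map is too small."""
    layers = 0
    while n >= k:
        n = conv_output_size(n, k)
        layers += 1
    return layers


# Depth available for a 27 x 27 input with the kernel sizes compared in this section.
depth_by_kernel = {k: max_unpadded_layers(27, k) for k in (2, 3, 5, 7)}
print(depth_by_kernel)  # {2: 26, 3: 13, 5: 6, 7: 4}
```

Under these assumptions a 7 × 7 kernel exhausts a 27 × 27 input after four layers, whereas a 3 × 3 kernel allows thirteen, which is consistent with the observation that larger kernels reduce the attainable depth.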

Summary and Outlook
To fully mine the deep features in remotely sensed sea ice images, this paper proposes a multilevel feature fusion remote sensing sea ice image classification method based on the residual network. The residual units in the residual network are used for feature extraction, the network is deepened through the principle of identity mapping, and the FPN, PAN, and SPP modules fuse the features extracted from different residual blocks to fully mine the multilevel and multiscale deep feature information in the remotely sensed sea ice data and further improve the accuracy of sea ice classification. The experimental results show that, compared with the other learning methods, this method obtains the best classification results. The specific summary is as follows:
(1) Hyperspectral images and multispectral images contain rich spectral information and spatial information. Traditional machine learning can only extract shallow features and cannot make full use of the deep features, which limits the improvement of classification accuracy. Deep learning, by contrast, obtains better classification results owing to its ability to extract deep features.
(2) The residual network can use its identity mapping to deepen the network and solve the problem of limited network layers in image classification caused by the small input size of the sample. At the same time, the degradation problem caused by deepening the network is alleviated. Therefore, the residual network can extract more of the features of the sea ice image and improve the classification accuracy.
(3) The method in this paper first uses the advantages of the residual network in deepening the network and alleviating the degradation caused by too many layers. Second, the FPN and PAN modules are combined to connect low-level and high-level features, which improves the entire feature hierarchy, shortens the information path between low-level and high-level features, fully mines the features extracted by the residual network, and merges the features extracted by the residual blocks to different degrees so that the deep features of different layers complement each other. Finally, the SPP module is used to convert the features into a one-dimensional vector that is input to the fully connected layer, which fuses local and global features, enriches the information of the final feature map, and further improves the accuracy of image classification.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Authors' Contributions
Yanling Han and Yun Zhang conceived and designed the research framework. Pengxia Cui was responsible for data collection and processing. Yanling Han completed the algorithm design and data analysis and is the main author of the manuscript. Pengxia Cui, Yanling Han, and Shuhu Yang contributed to original draft preparation.