Research on Key Techniques for Super-resolution Reconstruction of Satellite Remote Sensing Images of Transmission Lines

Targets in satellite remote sensing images of transmission lines are constrained by the satellite's imaging equipment and transmission conditions, which makes it difficult to guarantee the clarity of the transmission line. Image super-resolution aims to recover a high-resolution image from a low-resolution one, enriching the detailed information of the transmission line itself, which is of great significance for intelligent satellite-based inspection of transmission lines and hidden-danger monitoring. To address the problems that traditional methods rely on multi-frame image sequences and produce overly smooth reconstructions, this paper proposes a single-frame remote sensing image super-resolution method based on the Boundary Equilibrium Generative Adversarial Network (BEGAN). Experimental results show that the method recovers more high-frequency information and that its reconstruction is closest to the real image. Compared with nearest-neighbor interpolation, bicubic interpolation, and other methods based on deep convolutional neural networks, the PSNR of our results is significantly improved, and the detailed information of the transmission line and its surrounding environment is effectively enhanced.


Introduction
After more than half a century of development, remote sensing image processing technology has been widely used in resource exploration, environmental monitoring, military reconnaissance and other fields [1][2][3]. The resolution of remote sensing images has an important impact on the quality of image interpretation. Compared with low-resolution images, high-resolution images have a higher pixel density and can provide more tone, shape and texture information. Hardware approaches to improving image resolution are often costly and technically demanding, and are difficult to apply effectively in large-scale deployments and real-time environments. A more feasible approach is therefore to use digital image processing to obtain high-resolution images from low-resolution ones. This technique is called image super-resolution (SR).
Image super-resolution is a typical ill-posed inverse problem, for which scholars in signal processing and computer vision have proposed a series of solutions; Su and Sun et al. [4][5] reviewed and analyzed this work. Existing remote sensing image super-resolution methods can generally be divided into single-frame and multi-frame super-resolution according to the type of input and output. Because single-frame super-resolution lacks the correlation information available between multiple frames, it is difficult to obtain prior knowledge of the image degradation from a single frame, which has made it the focus and difficulty of image super-resolution research.
In recent years, example-learning-based methods [6] have gradually become a research hotspot. Such methods learn a parametric representation of features from a training sample set and predict the missing information in the reconstructed image [4]. Specifically, they use deep learning to establish an end-to-end convolutional neural network that learns the mapping from low-resolution to high-resolution images. Dong et al. [7] introduced the convolutional neural network to the image super-resolution problem for the first time, interpreting the network's three convolutional layers as the three steps of image patch extraction, non-linear mapping and image reconstruction. Ledig et al. [8] argued that the reconstruction result should be close to the real image both in low-level pixel values and in high-level overall style, and that the peak signal-to-noise ratio (PSNR) alone is not a suitable evaluation criterion; they therefore proposed a new perceptual loss function and introduced the Generative Adversarial Network (GAN) into the image super-resolution task for the first time. Berthelot et al. [9] addressed the convergence difficulties of GAN training and proposed the Boundary Equilibrium Generative Adversarial Network (BEGAN).
Inspired by the above work, this paper proposes an improved single-frame remote sensing image super-resolution reconstruction method. The main work and innovations can be summarized as follows. First, to the best of our knowledge, this is the first time BEGAN has been adapted and improved for super-resolution reconstruction of remote sensing images. Second, an end-to-end autoencoder-style network is designed; both the generator and the discriminator use this structure, which simplifies the network design and facilitates code reuse. Finally, unlike traditional GANs that use random vectors as the generator input, the generator in this paper takes a low-resolution image as input to achieve single-frame remote sensing image super-resolution with 4x upsampling.

Generative adversarial network
The generative adversarial network was proposed by Ian Goodfellow and his students. It is a novel generative model, as shown in Figure 1, inspired by the two-player zero-sum game of game theory. The two players are the generator network (G) and the discriminator network (D). The generator G learns the distribution of the training data and converts a noise vector into a sample resembling the real training data; the higher the similarity, the better. Conversely, the discriminator D is a binary classifier that estimates the probability that a sample comes from the real training set: for a real sample, D should output a high probability, and for a fake sample generated by G, a low one. Training is unsupervised, with G and D trained against each other to improve their own capabilities: G tries to maximize the probability of D's misjudgment, and D tries to minimize its probability of error, until D can no longer judge the authenticity of the samples generated by G (its output probability is 1/2), at which point training ends.
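As a concrete illustration of this objective (our own sketch, not code from the paper), the following NumPy snippet evaluates the standard GAN losses from the discriminator's output probabilities. At the equilibrium described above, where D outputs 1/2 for every sample, the generator loss equals log 2.

```python
import numpy as np

def gan_losses(d_real, d_fake, eps=1e-8):
    """Standard GAN losses given discriminator outputs.

    d_real: D's probabilities for real samples (ideally near 1).
    d_fake: D's probabilities for generated samples (ideally near 0).
    D minimizes loss_d; G minimizes loss_g (non-saturating form).
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    loss_d = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    loss_g = -np.mean(np.log(d_fake + eps))
    return loss_d, loss_g

# At equilibrium D outputs 1/2 everywhere and can no longer tell
# real from fake: loss_d = 2*log(2), loss_g = log(2).
ld, lg = gan_losses([0.5, 0.5], [0.5, 0.5])
```

The non-saturating generator loss is the variant commonly used in practice because it keeps gradients alive when D is confident; the minimax form described in the text behaves identically at equilibrium.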

Boundary equilibrium generative adversarial network
In theory, through the adversarial training of a GAN, the generated samples can become arbitrarily close to the real samples and reach a Nash equilibrium [10]. In practice, however, training often becomes unbalanced: when the discriminator learns too well, the generator's gradient vanishes. With this in mind, the Boundary Equilibrium Generative Adversarial Network (BEGAN) introduces a balance variable k that keeps D and G in equilibrium and stabilizes training. In addition, unlike a standard GAN, BEGAN does not directly estimate the similarity between the data and model distributions; instead, it matches the distributions of the autoencoder reconstruction errors, on the grounds that if those error distributions are close, the underlying distributions should be close as well. Concretely, BEGAN is a GAN with an autoencoder structure: the generator acts as a decoder, and the discriminator is no longer a classification network but an autoencoder, with the loss function based on the discriminator's reconstruction error.
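The role of k can be made concrete with a small sketch (our own illustration following BEGAN's published update rule, not the authors' code): each training step nudges k by λ_k·(γ·L_real − L_fake), clipped to [0, 1], so the discriminator's attention shifts toward whichever side is currently winning.

```python
def update_k(k, loss_real, loss_fake, gamma=0.75, lambda_k=0.001):
    """One BEGAN update of the balance variable k.

    loss_real / loss_fake are the discriminator's autoencoder
    reconstruction errors on real and generated samples.
    gamma sets the target ratio loss_fake / loss_real; k stays in [0, 1].
    """
    k = k + lambda_k * (gamma * loss_real - loss_fake)
    return min(max(k, 0.0), 1.0)

# If generated samples are reconstructed "too well" relative to gamma,
# k grows so the discriminator focuses more on them, and vice versa.
k = update_k(0.0, loss_real=1.0, loss_fake=0.5)  # k rises to 0.00025
```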

Proposed method
Deep learning performs well on single-frame image super-resolution. Most existing methods use a deep convolutional neural network as the network structure and the mean square error as the loss function to be optimized. Two main problems arise when these methods are applied to single-frame remote sensing image super-resolution: the network is hard to train, with the vanishing-gradient problem occurring frequently, and the reconstruction result is too smooth. To address this, this paper proposes a single-frame remote sensing image super-resolution method based on BEGAN. The network structure and the objective function are introduced in detail below.

Network structure and parameter setting
Owing to the diversity and complexity of remote sensing images, this paper designs a dedicated super-resolution network structure, shown in Figure 2. Overall, the network is a conditional generative adversarial network in which the high-resolution remote sensing image serves as a conditional variable guiding the reconstruction of the low-resolution image. The generator and the discriminator are the same type of end-to-end autoencoder network. Unlike a traditional GAN, whose generator input is an arbitrary random vector, the generator input here is a low-resolution remote sensing image magnified 4x by bicubic interpolation. To make use of pre-trained network weights, the autoencoder structure follows SegNet [11]. Compared with the original BEGAN, the network designed in this paper has more convolutional groups and feature maps. To enhance the quality of the generated image and accelerate convergence, a skip connection links each convolution group to its corresponding deconvolution group. For the remaining hyperparameters, the batch size is 16, the number of epochs is 100, and the initial learning rate is 0.0001; the learning rate decays by a factor of 0.95 every 2000 iterations, and the initial value of γ is set to 0.75.
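The stated learning-rate decay can be read as a staircase schedule; the following sketch is our assumption about its exact form (the paper does not spell it out), multiplying the rate by 0.95 once every 2000 iterations.

```python
def learning_rate(step, base_lr=1e-4, decay=0.95, decay_every=2000):
    """Staircase schedule assumed from the paper's hyperparameters:
    the rate is multiplied by `decay` every `decay_every` iterations."""
    return base_lr * decay ** (step // decay_every)

# Steps 0..1999 use 1e-4, steps 2000..3999 use 0.95e-4, and so on.
```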

Objective function
In the network training, this paper uses a loss function based on the L1 norm and the reconstruction error of the discriminator. This loss function can not only balance the competitiveness of the generator and the discriminator, but also enhance the visual quality of the generated image.
Assuming that the pixel-wise reconstruction error follows a Laplacian distribution, the L1-norm loss function is

L_{L1} = || I^{HR} - G(I^{LR}) ||_1   (1)

where I^{HR} denotes the real high-resolution image and I^{LR} the corresponding low-resolution image. The losses to be optimized and the update rules of the generator and discriminator are

L_{D_r} = || x - D(x) ||_1   (2)
L_{D_f} = || y - D(y) ||_1   (3)
L_D = L_{D_r} - k_t L_{D_f}   (4)
L_G = L_{D_f} + λ L_{L1}   (5)
k_{t+1} = k_t + λ_k (γ L_{D_r} - L_{D_f})   (6)

In equations (2) to (6), x denotes a real high-resolution image, y = G(z) the generated high-resolution image, and z the corresponding low-resolution image. L_{D_r} and L_{D_f} are the discriminator losses with respect to x and y; θ_G and θ_D denote the parameters of the generator and the discriminator, updated by minimizing L_G and L_D respectively; λ weights the pixel-wise L1 term. λ_k is the proportional gain of k, and k_t is its value at the t-th update; the variable k balances the competitiveness of the generator and the discriminator. γ is the ratio of the expected loss on generated samples to the expected loss on real samples, and controls the trade-off between the diversity and the visual quality of the generated images.
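The objective described in equations (2) to (6) can be sketched numerically as follows. This is our own NumPy illustration, not the authors' code, and the weight `lam` on the L1 term is an assumed value, since the paper does not report it.

```python
import numpy as np

def began_sr_losses(x, y, d_x, d_y, k, lam=0.1):
    """Per-batch losses: x is the real HR image, y = G(z) the generated
    one, and d_x, d_y are the discriminator's reconstructions of them.
    lam (assumed) weights the pixel-wise L1 term in the generator loss."""
    l_d_r = float(np.mean(np.abs(x - d_x)))   # reconstruction error on real
    l_d_f = float(np.mean(np.abs(y - d_y)))   # reconstruction error on fake
    loss_d = l_d_r - k * l_d_f                # discriminator objective
    loss_g = l_d_f + lam * float(np.mean(np.abs(x - y)))  # adversarial + L1
    return loss_d, loss_g, l_d_r, l_d_f
```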

Experiment and analysis
The data set used in this experiment is derived from NWPU-RESISC45, a high-resolution remote sensing scene data set. NWPU-RESISC45 includes 45 scene categories such as airports, overpasses, railway stations and residential areas, with 700 images per category, ensuring the authenticity and diversity of the experimental data. In this experiment, 10 images of each category are selected, giving an experimental data set of 450 images in total. Following convention, the image samples are randomly shuffled and split at a ratio of 8:2, i.e. 80% of the samples are used for training and 20% for testing. The experiments use Python and the TensorFlow r1.4 framework; the main hardware configuration is two Quadro M4000 GPUs (8 GB) and 16 GB of RAM. Before the detailed quantitative evaluation of the reconstruction results, Figure 3 and Figure 4 first compare the reconstruction results of other representative methods with those of the proposed method; the red rectangles in the figures mark the regions of interest.
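The shuffle-and-split step can be sketched as follows (a generic stdlib illustration; the paper's exact preprocessing code is not given):

```python
import random

def split_dataset(samples, train_frac=0.8, seed=0):
    """Shuffle the samples reproducibly and split them at the
    given ratio into training and test subsets."""
    rng = random.Random(seed)
    samples = list(samples)
    rng.shuffle(samples)
    n_train = int(len(samples) * train_frac)
    return samples[:n_train], samples[n_train:]

# 450 images -> 360 for training, 90 for testing.
train, test = split_dataset(range(450))
```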
A total of 160,000 training iterations were performed. Figures 5 and 6 show the evolution of the network convergence measure M_global and the super-resolution evaluation indicator PSNR during training. The training process is fast and stable: after 80,000 iterations the PSNR gradually converges to 27.3, and M_global finally converges to 0.055. M_global measures the degree of network convergence:

M_global = L_{D_r} + |γ L_{D_r} - L_{D_f}|   (7)

From equation (7), the smaller the value of M_global, the better the network is trained.
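The convergence measure of equation (7) is a one-liner; the sketch below is our own illustration following the standard BEGAN formulation:

```python
def m_global(loss_real, loss_fake, gamma=0.75):
    """BEGAN convergence measure: reconstruction error on real samples
    plus the absolute deviation from the balance condition
    loss_fake = gamma * loss_real."""
    return loss_real + abs(gamma * loss_real - loss_fake)

# Exactly at the balance point the second term vanishes and
# m_global reduces to loss_real.
```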
The mean square error (MSE) is

MSE = (1 / (h w k)) Σ_{i=1}^{h} Σ_{j=1}^{w} Σ_{c=1}^{k} (x_{i,j,c} - y_{i,j,c})²   (8)

The peak signal-to-noise ratio (PSNR) is

PSNR = 10 log10((2^n - 1)² / MSE)   (9)

The structural similarity (SSIM) is

SSIM(x, y) = l(x, y) · c(x, y) · s(x, y)   (10)

In equation (8), h, w and k denote the height, width and number of channels of the image. In equation (9), n is the bit depth and equals 8. In equation (10), l(x, y), c(x, y) and s(x, y) denote the luminance, contrast and structure comparison terms, respectively. The value of SSIM lies in [0, 1]; the larger the value, the smaller the image distortion.
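Equations (8) and (9) translate directly into a few lines of NumPy (our illustration; the paper's evaluation code is not provided):

```python
import numpy as np

def mse(x, y):
    """Mean squared error over all pixels and channels."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return np.mean((x - y) ** 2)

def psnr(x, y, n_bits=8):
    """Peak signal-to-noise ratio in dB for n_bits-per-channel images."""
    err = mse(x, y)
    if err == 0:
        return float("inf")  # identical images
    peak = 2 ** n_bits - 1
    return 10.0 * np.log10(peak ** 2 / err)

# For 8-bit images an MSE of 1 gives roughly 48.13 dB.
```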
The three indicators in equations (8) to (10) are used to comprehensively evaluate the proposed super-resolution reconstruction method. Table 1 compares the PSNR and SSIM of each method on the sample images in Figure 3, and Table 2 compares the reconstruction results of each method on the test set. Figure 3, Figure 4 and Table 1 compare the reconstruction results of the proposed method and other methods on four images: transmission towers, reservoirs, roads and buildings. The results show that the proposed method overcomes the overall smoothness of traditional reconstruction results and restores more high-frequency detail. This is mainly because the network adopts a generative adversarial framework and no longer uses the mean square error as the loss function. Inspired by BEGAN, the network uses the hyperparameter γ to balance the diversity and visual quality of the generated images, which yields the best remote sensing image super-resolution results.

Conclusion
Generative adversarial networks can generate realistic images. This paper adapts and improves the boundary equilibrium generative adversarial network to achieve end-to-end single-frame remote sensing image super-resolution. The generator and discriminator are both autoencoder structures with skip connections, and the generator input is a low-resolution remote sensing image. In experiments on the NWPU-RESISC45 data set, with PSNR and SSIM as the reconstruction quality indicators, the proposed method is more stable, converges faster and reconstructs more detail than other convolutional-neural-network-based super-resolution methods, which demonstrates its effectiveness and advancement. It is also noted, however, that selecting hyperparameters for network training remains difficult. Future work will focus on super-resolution reconstruction for specific types of remote sensing image data, and on combining deep-learning-based methods with traditional methods to achieve better reconstruction.