Single Space Object Image Denoising and Super-Resolution Reconstructing Using Deep Convolutional Networks

Space object recognition is the basis of space attack and defense confrontation. High-quality space object images are very important for space object recognition. Because of the large number of cosmic rays in the space environment and the inadequacy of optical lenses and detectors on satellites to support high-resolution imaging, most of the images obtained are blurred and contain a lot of cosmic-ray noise. So, denoising methods and super-resolution methods are two effective ways to reconstruct high-quality space object images. However, most super-resolution methods could only reconstruct the lost details of low spatial resolution images, but could not remove noise. On the other hand, most denoising methods especially cosmic-ray denoising methods could not reconstruct high-resolution details. So in this paper, a deep convolutional neural network (CNN)-based single space object image denoising and super-resolution reconstruction method is presented. The noise is removed and the lost details of the low spatial resolution image are well reconstructed based on one very deep CNN-based network, which combines global residual learning and local residual learning. Based on a dataset of satellite images, experimental results demonstrate the feasibility of our proposed method in enhancing the spatial resolution and removing the noise of the space objects images.


Introduction
Space objects (SO) refer to objects in space, including satellites, space debris, cosmic stars, etc. Object recognition, orbit determination, position estimation and other researches based on SO are becoming increasingly important. These researches are the basis for entering space, understanding space and controlling space. These researches are also indispensable parts of space attack and defense. SO recognition, especially, satellite recognition and surveillance is the basis of space attack and defense confrontation. The morphological characteristics are one of the important features of SO. Therefore, the geometric shape and texture features are important for SO recognition, orbit estimation, satellite attitude, and state judgment [1,2]. That means high resolution (HR) space object images with less noise are very important to be obtained.
In SO research areas, software solutions used to obtain HR images for three reasons. First, due to the limitations on sensor technology and high costs, it is very difficult to obtain HR images easily. Second, updating imaging devices is very difficult when satellites are launched. Third, the space environment is very complicated. So, denoising and super-resolution (SR) reconstruction technique, are the high-frequency part of SO images while most of the other information between the LR image and HR image is the same, residual learning is a very good algorithm to implement to solve this problem [21]. We use one deep CNN-based network, which combines global residual learning and local residual learning to solve both denoising and SR problem. The HR results with cosmic-ray noise well removed could be finally generated by this network. By the way, our method could handle three scale factors in solving the SR problem. This could reduce the number of parameters three-fold.
In summary, our contribution is divided into the following two aspects: 1. We propose a method named SO-EVDSR. This method could remove cosmic-ray noise and enhance the spatial resolution of SO images in the meantime. It is the first time that a denoising and SR reconstruction method based on only one very deep convolutional network implemented in the SO images research field. 2. This method combines the global residual learning and local residual learning. Experimental results show that our method performs better than several typical methods including some state-of-the-art methods in both quantitative measurements and visual effect.
The rest of this paper is organized as follows: in Section 2, we introduce and analyze some related works in denoising and solving SR problems, In Section 3, we give detailed descriptions of the proposed method. In Section 4, the detail of the experiment and the results are reported. In Section 5, some discussions are shown. Conclusion are drawn in Section 6.

Cosmic-Ray and Denoising
As mentioned above, cosmic-ray noise could be regarded as salt and pepper noise, whose gray value is obviously higher or lower than surrounding pixels. But this type of noise is different from typical salt and pepper noise because its area is usually larger than one pixel, just like Figure 1 illustrated. The area whose gray value is obviously higher than surrounding pixels are cosmic-ray noise. The early method to remove cosmic-ray noise in SO images is to take multiple images in the same field of view, just like Windhorst R. A. et al. [23] proposed. Then the cosmic-ray noise would be located based on sequential image information. Finally, a correct gray value is used to replace the gray value of the pixel contaminated by cosmic-ray noise. This type of method requires multiple images from the same field of view, which could not handle cosmic-ray noise in a single image.
Median filtering is one of the widely used classical methods to remove the cosmic-ray noise in a single SO image. But this method is no longer a very good method because it can further blur the edge of the SO image, which is not good for SR problem.
There are also some other conventional methods to remove the cosmic-ray noise in a single SO image. Zhu Z. et al. [24] first, make the first-order difference in two directions of the image, and the difference result is compared with the threshold to distinguish noise and candidate noise. Then, a Bessel curve fitting method is used to calculate the deviation of candidate noise. Finally, cosmic-ray noise is identified. The drawback is this model is that it is so simple that the applicability of this method is not good. Dokkum P. G. [25] uses a Laplacian edge detection method to detect cosmic-ray noise. This method achieves a very good effect in removing cosmic-ray noise. However, it does not consider the presence of bright supersaturated space objects. To reduce the running time, Pych. W. [26] proposed a fast removing cosmic-ray noise method based on image histogram. This method does not need to model objects, nor does it need a high signal-to-noise ratio. However, the drawback of this method is spectral energy in some pixels would be eliminated incorrectly.
Recently, CNN-based methods have achieved better results than conventional methods in the computer vision area [27,28]. So, some CNN-based denoising methods are appearing constant. Because cosmic-ray noise belongs to a type of salt and pepper noise, let us discuss some CNN-based denoising methods in removing salt and pepper noise.
The first CNN-based method to solve the denoising problem was proposed by Jain et al. [29]. This method achieved similar or even better results compared with other conventional methods. Mao et al. [30] proposed an auto-encoders with symmetric skip connections network. This method implemented 10 pairs of symmetrical convolutional and deconvolutional layers, whose first five layers are coding layer and the last five layers are decoding layer. From this method, the CNN-based network for image denoising became deeper and deeper. Zhang et al. [31] proposed DnCNN combined batch-normalization and residual learning and achieved state-of-the-art results.

Single Image Super-Resolution
Like the denoising problem, CNN-based methods are becoming more popular and efficient. Due to the limited space, we only discuss some works on the most representative CNN-based SISR reconstruction methods.
SRCNN is the first CNN-based SISR method, which achieved better results than conventional methods. It was proposed by Dong et al. [3,20] SRCNN is the first SISR method based on CNN, and discussed the relationship between SRCNN and the spare-coding method, which is one of the typical conventional SR methods. This method was then further improved mainly by increasing network depth or sharing network weights. SRCNN consists of three layers, which are inspired by sparse-coding: feature extraction, non-linear mapping, and reconstruction. Filters of spatial sizes 9 × 9, 1 × 1 and 5 × 5 are used respectively. For validation, LR images are created by the downsampling of HR images, then are transformed through a HIS transform. The Intensity channel matrix is upscaled through the network. Finally, an HIS reverse transform is implemented to generate the final image.
VDSR [21] is the first CNN-based SISR method using residual learning. VDSR always consists of several layers, which could deepen the network. VDSR achieved very good results and became one of the state-of-the-art SISR methods. Normally, Kim et al. [21] uses a 20 layers network to train and test data in that paper. As the basic theory in VDSR, residual learning [22] is very important and affects many methods in deepening the convolutional network. Conventional convolutional networks or fully connected networks will more or less lose information when transmitting information. At the same time, gradient disappearance or gradient vanishment will cause deep networks hard to train. Residual learning could solve this problem. By directly transferring input information to output, the integrity of information is protected. The whole network only needs to learn the difference between input and output, which simplifies the learning goal and difficulty. So, the depth of the residual network is much deeper than the depth of conventional convolutional network.
DRCN [32] combines residual learning and recursive network and it was also proposed by Kim et al. for solving the SISR problem in the same year. DRCN is another CNN-based method, which combines residual learning and recursive neural network. This network takes an interpolated image as input and is divided into three modules. The first module is Embedding network, which could extract feature maps. This module is a recursive network. The second module is the inference network, which can do non-linear mapping. The last module is the reconstruction network, which could reconstruct the final result.
DMCN is another CNN-based method, which proposed by Xu et al. [33]. This method is used to handle various remote-sensing image restoration tasks such as SR and Gaussian denoising. This method also build local and global memory connections to combine image detail with global information. This method could not only solve SR problems but also achieve a very good denoised result.

Proposed Method
In this section, the problem definition is discussed first, then the details of our proposed method, SO-EVDSR, are given.

Problem Definition
Because of the lack of data in the remote-sensing area, a single image for denoising and super-resolution reconstruction is one of the best ways to solve this problem. So, our method could be defined as this: recover a HR image from its LR version. The relationship between HR image and its LR version could be achieved by the final trained output of the CNN-based network, which takes a lot of HR ground truth and its LR downsampling as inputs.
In summary, the purpose of solving the denoising and SISR problem using a CNN-based method is to establish a mapping F from the LR image to its HR reconstruction image, which is as similar as possible to the ground truth HR image. Let us denote the LR image as Y. Our goal is to recover from Y an image F(Y). In our proposed method, SO-EVDSR is the mapping F that could recover from Y an image F(Y).
By the way, image denoising problem and SR problem are similar because these two problems all need to be processed high-frequency parts while most other information preserved. Although the noise in obtained SO images is caused by cosmic-ray, it could still be regarded as a kind of salt and pepper noise. So only one deep CNN-based network is used to solve these two problems. That means there is only one mapping F in our method.
The training and testing procedure of our SO-EVDSR are illustrated in Figure 2. Pairs of LR images and HR images were sent to SO-EVDSR for training, then other LR images are sent to SO-EVDSR for testing.

Proposed Network
In this subsection, the structure of our SO-EVDSR is proposed first. Then the details of global residual learning and local residual learning in our method are also discussed.

Enhanced Very Deep Super-Resolution Network in Space Object Researching
The proposed SO-EVDSR is illustrated in Figure 3. SO-EVDSR takes an interpolated LR image (to the desired size) with simulated cosmic-ray noise (salt and pepper noise) as input. The HR image with noise well removed is reconstructed by SO-EVDSR. SO-EVDSR is a very deep convolutional network that contains 20 convolutional layers. These convolutional layers were divided into three types. The first convolutional layer with relu belongs to the first type. The last convolutional layer without relu belongs to the second type, The rest belong to the third type, which includes nine identical convolutional blocks. Each convolutional block contains two convolutional layers with one relu. SO-EVDSR combines one global residual learning (GRL) and nine local residual learning (LRL). Except for the first and the last, the rest of layers are in the same type: 64 filters of the size 3 × 3 × 64, where a filter handle on 3 × 3 spatial region through 64 feature maps. The first layer handles the input image. The function of the last layer, which consists of a single filter of the size 3 × 3 × 64, is reconstructing image.
By the way, our output image has the same size as the input image by padding zeros every layer during training.
The configuration of SO-EVDSR is shown in Table 1.

Residual Learning in SO-EVDSR
Residual learning means the input and output are largely similar, so in our method, a residual r = y − x is defined, where most values are likely to be zero or small. Using residual learning to solve SR problems had been demonstrated in a very effective way [21]. In SO-EVDSR, r is the residual pixel of x and y. So, we could make the whole network training easy and could further make the whole network deeper than normal. It could roughly be divided into two types, i.e., global residual learning and local residual learning.
Global residual learning. The global residual learning (GRL) in SO-EVDSR is a long skip connection that could make the whole network deeper and easier to train. As illustrated in Figure 4, it usually connected the input and the output of the last convolutional layer by the Equation (1).
F HR , F LR and F n+1 denote the global residual learning of mapping F whose function is reconstructing the final result, the input image and the last convolutional layer of mapping F respectively. Local residual learning. The local residual learning (LRL) was used to alleviate the degradation problem caused by ever-increasing network depths and improve the learning ability. It could further improve the information flow [34]. As illustrated in Figure 5, the Equation (2) could describe how LRL in our SO-EVDSR works.
F n , F n,2 and F n−1 denote the nth local residual learning of mapping F, the second convolutional layer of the n-th convolutional block and the (n − 1)-th local residual learning of mapping F.

Experiment
In this section, the dataset we used in this article is discussed first. Then training parameters we used are discussed and given the final parameters we used. The experimental results are given in the last part of this section.

Dataset
The dataset this paper used was BUAA-SID1.0 [35], which contains 9200 gray images of the size 320 × 240. Before this dataset is established, there were no public documents about SO and image simulation mentioned SO image database, which contained abundant geometric characteristics. But the geometric characteristics of SO are very important for detecting and recognizing. Therefore, this database, which contains different viewing angles of 20 satellites, has great significance. The names of satellites in the database are a2100, astrolink, cobe, dsp, early-bird, eo1, ers, esat, ets8, fengyun, gallieo, glonas, goms, helios2, irns, is-601, minisat-1, radarsat-2, timed and worldview, respectively.
We separated the whole database into a training set and testing set with a hold-out method. We used 80% to train and 20% to test, because in SO research field, images obtained are comparatively similar and the requirements of reconstruction precision are higher than those of normal image (like cat or dog). First, we randomly chose 16 satellites (7360 images) as a training set and the other four satellites (1840 images) as a testing set. In order to shorten the experiment time, we randomly chose 20 images of different satellite attitudes from each satellite in the training set we mentioned above. Similarly, we also chose 20 images of different satellite attitudes from each satellite in the testing set. All images used in our experiment were down-sampled into 1/2, 3/1 and 1/4, respectively. Then salt and pepper noise was added to these down-sampling images. This could be used to simulate the actual situation. Finally, all images were separated into pieces of the size 41 × 41.

Training Parameters
We now describe some details of our training model. Let Y denote an interpolated LR image and X a HR image. A training dataset x (i) , y (i) N i=i was given, and our goal was to learn an end-to-end relationship model F that predicts valuesX = F (X), whereX is the prediction from LR image. We used mean squared error and L2 regularization to train the parameters. The definition of L2 loss function was illustrated in Equation (3): As illustrated in Equation (3), M represents the number of samples, F(Y) represents reconstructed image, and X represents the ground truth.
Depth. The depth of the network is a very important parameter. In VDSR [21], the author has verified the training results of the network from five layers to 20 layers. The results show that when the network layers are 20, the best result can be achieved by considering the training time and training effect comprehensively. We verify this conclusion, and the results are consistent with VDSR. As Figure 6 shows, we further deepened the network to 25 layers and found that the Peak Signal-to-Noise Ratio (PSNR) was not significantly enhanced. So we use 20 layers to train and test. Learning rate. One possible reason that could affect the convergence of the model is learning rate. A basic rule of learning rate is that a high learning rate could boost training. But simply setting the learning rate higher could also lead to vanishing or exploding gradients, which could make the whole network difficult to converge. Figure 7 illustrates the comparing result of different initial learning rates. This result shows that the learning rate was very important for training and only a proper learning rate could achieve the best effect. Gradient clipping. Gradient clipping is used in our training. In essence, the chain derivative was used in the back-propagation method while training. When calculating the gradient of each layer, some multiplication operations will be involved. Therefore, if the network was too deep, most of the multiplication factors were greater than unity, then the final result may tend to be infinite, which is called gradient exploding. Gradient clipping is a method that is often used to resolve gradient exploding problem. By setting a threshold θ, the gradient will be limited to that range [−θ, θ] when it exceeds to the threshold.
Mini-batch gradient descent. Mini-batch gradient descent is a compromise method between the batch gradient descent method and the stochastic gradient descent method. This method not only ensures the accuracy, but also speeds up the convergence. So, this method was also used in our training.
Epochs. Epoch is another important parameter in training. Figure 8 illustrate the result of increasing epoch. The result shows that peak signal-to-noise ratio did not increase obviously after 120 epochs.  Finally, we used a network of depth 20. Momentum and weight decay parameters are set to 0.9 and 0.001, respectively. Training batches were set to 64. We trained our model over 120 epoch. The learning rate was initially set to 0.001 and then decreased by a factor of 10 every 20 epochs. We used NVIDIA Tesla P4 to run our experiment and the total time taken by training was almost eight hours.

Results
To evaluate the result of these algorithms, we used quantitative measures to compare these results. Normally, we use five methods to evaluate the results.

Quantitative Measurements
PSNR is the most commonly used measurement to evaluate the result of an image. It is defined as the ratio between the maximum possible value of a signal and the value of distorting noise that affects the quality of its representation [36]. The definition of PSNR is showed in Equation (4): The MSE witch is defined in Equation (5) is the mean square error. X and F(Y) are the ground truth and reconstructed image, respectively. M and N are number of rows and columns in the images X and F(Y), respectively.
Another most commonly used measurement is the structure similarity (SSIM), which is defined in Equation (6). In Equation (6), a and b refer to images X and F(Y), which are referenced the ground truth and the reconstructed HR image respectively. C 1 and C 2 are constants. µ x is the local mean of reference (X) image and µ y is the local mean of the reconstructed F(Y) image. σ x and σ y are the standard deviations and σ xy is the cross-covariance for these images. Finally, a higher SSIM is better.
However, in this article, SSIM was probably not a good metric as the numbers were so close in different methods up to the fifth or sixth decimal places (as illustrated in Table 2). So the human visual system (HVS) [37] and human visual system model (HVSm) [37] metrics are also used to compare the reconstruction performance among different methods.
The HVS metric is defined as Equation (7).
In Equation (8) The only difference between HVS and HVSm is the visual masking effects are taken into account by HVSm. First, DCT coefficients of size 8 × 8 original image block and 8 × 8 processed image block are calculated between pixel values, then a reduction based on the value of contrast masking was done. Finally, we calculated MSE H . More details could be found in [40]. Table 2 shows that based on BUAA-SID1.0 dataset, our SO-EVDSR model has the best performance (red text in Table 2) in MSE, PSNR, SSIM, HVS and HVSm.

Visual Comparison Results
Firstly, we applied some classical methods to deal with cosmic-ray noise. Figure 9 shows the result of median filtering with filter sizes 3 × 3, 5 × 5, 7 × 7 and 9 × 9. The result shows that filter size 3 × 3 and 5 × 5 could not remove cosmic-ray noise completely, while 7 × 7 and 9 × 9 could remove cosmic-ray noise, but also cause further loss of image details. Therefore, using median filtering only could not achieve ideal denoising effect, and it is not suitable for space object image processing because of the high-precision measurement. Then, Figure 10 shows the result of using an erosion operation to deal with cosmic-ray noise. We used the "disk" and "square" kernel type respectively. The result shows that when the kernel size was 3, cosmic-ray noise could not be removed, and when the size was 4, cosmic-ray noise could be removed, but at the same time, the satellite details were obviously lost, so the simple erosion operation could not achieve the desired denoising effect. Then the expansion operation of the same size was added, the combination of these two operations was called the opening operation. The results show that the image details will still be lost further. Therefore, only using the erosion operation or opening operation could not achieve ideal denoising effect, and it was not suitable for space object image processing because of the high-precision measurement.  Figure 11 shows the result of using the BM3D algorithm [41] to deal with cosmic-ray noise. The result shows that the ideal denoising effort is still not achieved and the image is blurred further. That is because BM3D could not regard cosmic-ray noise as typical noise, which it could deal with. The similar result happens on the algorithm, which was proposed in [42]. Visually, in Figures 12-19, we could see the difference and enhancement among bicubic, SRCNN [3,20], VDSR [21], DRCN [32], SO-EVDSR, DMCN and the ground truth. We could also see the details that SO-EVDSR reconstructed, which SRCNN, VDSR, DRCN and DMCN could not reconstruct that clearly.
Compared to VDSR, which also could provide three scale factors in one network, our method not only performed better in scale factor 2, but also performed better in scale factor 3 and factor 4. Figure 20 shows the result of the comparison.
Finally, we also experimented using SO-EVDSR to compare with a cascade method, which applied a BM3D filter first to deal with cosmic-ray noise, then VDSR is applied to solve the SR problem. Figure 21 is the result of comparison and the result shows that the performance of the cascade method we applied was not as good as SO-EVDSR.

Discussion
Our SO-EVDSR, which combines GRL and LRL, achieves a very good result. Table 3 shows that based on the same training parameters, the combination of GRL and LRL has better performance in PSNR than using GRL or LRL only. This is different from Table 2. Table 2 focuses on comparing the effects of each single-frame super-resolution reconstruction method, while Table 3 is based on SO-EVDSR, comparing the effects of using GRL only (like VDSR), using LRL only and using GRL and LRL (like our method) at the same time. The result shows that our SO-EVDSR has the best performance in PSNR (red text in Table 3). In this work, we show that a single space object image denoising and super-resolution problem could be well solved by a deep CNN-based method. This is the first time using CNN-based method to solve single SO image denoising and SR problem. The experiment result shows that cosmic-ray noise are well removed and textures, especially linear textures in a high-resolution image are recovered. However, future research needs to especially evaluate the improvement for images of a star, which is another important type of SO. The texture and detail in star images are very different from these in satellite images.
It also of interest to research residual learning. Because we have already seen that the effect of combining global residual learning and local residual learning. So, future research will try more combinatorial approaches of the two.

Conclusions
In this work, we have presented a CNN-based denoising and single-image super-resolution method named SO-EVDSR. This is the first time using a very deep CNN-based denoising and SISR method to reconstruct high quality SO images. We combine global residual learning and local residual learning to enhance the reconstruction effect. We have demonstrated that based on BAUU-SID1.0 dataset, which is a collection of satellite images affiliated to 20 satellites, our proposed SO-EVDSR not only could remove the cosmic-ray noise, but also has better performance than VDSR and DRCN do, which are two of the state-of-the-art SISR methods, in both quantitative measurements and visual effect.