A new blind image denoising method based on asymmetric generative adversarial network

Image denoising is a classical topic in computer vision. In recent years, with the development of deep learning, image denoising methods based on discriminative learning have received increasing attention. In this paper, a new blind image denoising method based on an asymmetric generative adversarial network (ID-AGAN) is proposed. In the new method, adversarial learning is used to optimise the denoising of high-dimensional image information, so as to balance noise removal and detail retention. In order to overcome the instability of GAN training and improve the ability of the discriminating model, an image downsampling layer is added between the generating model and the discriminating model. Moreover, a multi-scale feature downsampling layer is utilised to extract features of the entire image and reduce the effect of noise on training images. Extensive experiments are conducted to verify the performance of the ID-AGAN algorithm. The results demonstrate that the authors' method achieves high performance and flexibility.


INTRODUCTION
Image denoising is a classical topic in computer vision and has been studied extensively. In real life, noise corruption is inevitable in the image sensing process and may heavily degrade the quality of the acquired image. It then becomes difficult to observe the image content clearly, and the noisy image may adversely affect high-level vision tasks. Accordingly, removing the noise from the observed image is an essential step in various image processing and computer vision tasks [1,2].
In general, image denoising is a fundamental problem in the field of computer vision and image processing, and its goal is to estimate the original clean image from a noisy image. The noisy image y can be expressed as y = x + n, where x is the clean image corresponding to y and n is the noise. One common assumption is that n is additive white Gaussian noise (AWGN). In the past decades, a large number of denoising methods have been proposed to recover the clean image from the noisy observation. A good image denoising algorithm is supposed to have the following features: (1) It is flexible enough to recover clean images under unknown noise. In other words, different noisy images can be recovered with a single model. (2) It is efficient and should not take too long to recover noisy images. (3) It preserves the original image information while denoising, especially the high-dimensional image texture information that is easily mistaken for noise.
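The degradation model y = x + n above can be sketched numerically; the patch content, image size and noise level below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def add_awgn(x, sigma, rng=None):
    """Corrupt a clean image x (float intensities in [0, 255]) with
    additive white Gaussian noise of standard deviation sigma."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = rng.normal(0.0, sigma, size=x.shape)   # n ~ N(0, sigma^2)
    return x + n                               # y = x + n

# toy 8x8 "clean" image of constant intensity
x = np.full((8, 8), 128.0)
y = add_awgn(x, sigma=25)
print(float((y - x).std()))  # empirical noise std, close to 25
```

Blind denoising then means recovering x from y without knowing sigma in advance.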
In the past few decades, many methods have been proposed for image denoising, including image prior based denoising methods [3][4][5][6][7][8][9] and discriminative learning based image denoising methods [10][11][12][13][14]. Image prior based denoising methods model the image prior in various ways, including nonlocal self-similarity (NSS) models [3,15,16], sparse models [4,6,17] and low rank models [7,8,[18][19][20]. Since image prior based denoising methods rely on prior information rather than training data, they are flexible and can be used with unknown noise. One of the classic denoising methods is BM3D [3], which is a benchmark in image denoising. BM3D first searches for similar blocks in the image and groups them. After that, the similar blocks are denoised using collaborative filtering. Finally, these blocks are fused back into the original image with different weights. BM3D is faster than most image prior based methods, but its performance deteriorates when the noise is complex. The sparse model based method KSVD [4] builds on sparse and redundant representations over trained dictionaries. It trains a dictionary which is used to describe the image. Generally, the dictionary is supposed to be redundant so that it can represent different image structures. However, sparse coding with an over-complete dictionary is unstable in image restoration. The NCSR method [6] learns dictionaries from clustered image patches, which leads to a more stable and sparser representation. Consequently, it can achieve better image restoration results, but it also brings greater computational complexity. In many cases, the low rank based method WNNM [7] achieves the best performance among image prior based methods. In [7], the singular values in the weighted nuclear norm minimisation (WNNM) problem are assigned different weights, and the solutions of the WNNM problem are analysed under different weighting conditions to apply it to image denoising.
However, most of the existing image prior based denoising methods, including WNNM, are developed for gray-scale images. It is hard to extend them to colour image denoising because the noise statistics in the R, G and B channels can differ for real noisy images. In [8], a multi-channel optimisation model (MC-WNNM) for real colour image denoising under the weighted nuclear norm minimisation framework is proposed. MC-WNNM can handle noisy images with different noise statistics in the three channels. Despite their high denoising performance, most image prior based methods have two major drawbacks. First, these methods involve complex modelling processes, which make testing time-consuming. Second, the image priors adopted are defined mostly based on human knowledge of images. Nevertheless, human understanding of images is incomplete, which may lead to weakness in processing complex noisy images.
As an alternative, discriminative learning based image denoising methods aim to learn the underlying structure of images and discover the difference between noisy and clean images from training data. With the development of deep learning, the capacity of neural networks has been greatly exploited, and discriminative learning based denoising methods show striking performance. In particular, a trainable nonlinear reaction diffusion (TNRD) method was proposed which is based on learning optimal nonlinear reaction diffusion models [10]. The TNRD approach is applicable to a variety of image restoration tasks by incorporating an appropriate reaction force, and it is highly efficient in that it preserves the structural simplicity of diffusion models and takes only a small number of diffusion steps. Subsequent work combines convolutional neural networks with image denoising such that the denoising performance has been greatly improved. Mao et al. [11] proposed very deep residual encoder-decoder networks (RED-Net) for image restoration. The network consists of a chain of symmetric convolutional layers and deconvolutional layers, and skip connections are added between corresponding convolutional and deconvolutional layers. DnCNN [12] further explores the potential of convolutional neural networks. It utilises residual learning and batch normalisation to speed up the training process and boost the denoising performance. DnCNN can handle Gaussian denoising with an unknown noise level as well as Gaussian noise at a certain noise level. DnCNN greatly improves denoising performance, but it still tends to produce over-smooth textures. To improve on DnCNN, a fast and flexible denoising convolutional neural network (FFDNet) [14] was proposed. FFDNet works on downsampled sub-images, and a tunable noise level map is input together with the noisy image.
It improves the performance of blind image denoising, but the noise level map of a test image is generally unavailable. Most current discriminative learning based image denoising methods use deep convolutional networks alone. In this way, the denoising model can be easily trained and a high PSNR can be obtained, because the objective function is closely related to PSNR. However, such models may lose image details that are more easily perceived by the human visual system.
The generative adversarial network (GAN) [21], which has achieved state-of-the-art results in image generation [22], can effectively solve the problem of over-smooth textures. A generative adversarial network consists of two parts: the generating model and the discriminating model. The generating model is trained to generate images that are difficult to distinguish from real images, and the discriminating model determines whether a sample is a real image or an image from the generating model. The two networks are trained iteratively by competing with each other, and so can generate images that are closer to real images and contain more detailed information.
However, current image denoising methods based on the generative adversarial network do not combine the advantages of GANs and convolutional networks, which may deteriorate the denoising effect. A CNN architecture for blind image denoising was proposed in [23] which combines three multi-scale feature extraction layers, an l_p regulariser and a three-step adversarial training. It is an early attempt to apply GANs to image denoising, but the denoising performance was unsatisfactory. Then, a GAN-CNN based blind denoiser (GCBD) was proposed [24] which aims to tackle the lack of paired training data. In this method, a GAN is trained to generate noise samples which are used to construct a paired training dataset. The paired data are then used to train a convolutional neural network for denoising. It is a novel way to solve the lack of paired training data, but it is weak in handling Gaussian denoising.
To overcome the drawbacks of existing generative adversarial network based denoising methods, a blind image denoising method based on an asymmetric generative adversarial network (ID-AGAN) is proposed in this paper. A convolutional auto-encoder with symmetric skip connections is used in our denoising model. In order to extract image information of different sizes and dimensions, a multi-scale feature downsampling layer is added at the beginning of the denoising model. It also reduces the effect of noise and makes the blind image denoising model more flexible and robust. To optimise the information transfer between the denoising model and the discriminating model, an image downsampling structure is added before the discriminating model. The downsampling structure makes the training frequencies and feature sizes of the denoising model and the discriminating model asymmetric. It makes full use of GAN's ability to recover high-dimensional information while making the whole model easier to train. Through these structural improvements, the algorithm can balance denoising performance and image detail retention.
The remainder of this paper is organised as follows. Section 2 reviews image denoising methods based on convolutional neural networks and generative adversarial networks. A description of our algorithm is presented in Section 3. Experimental results, which demonstrate the effectiveness of the ID-AGAN denoising method, are provided in Section 4. Finally, conclusions are drawn in Section 5.

RELATED WORK
Before introducing the method proposed in this paper, convolutional neural networks for image denoising and GANs are briefly introduced in this section.

Convolutional neural networks for image denoising
There are many methods [11,12,14,24] using convolutional neural networks for image denoising. In [11], an encoder-decoder with skip connections is used to improve the denoising performance, and this design leads to massive feature reuse. Denoising performance is improved while the gradient explosion and gradient vanishing caused by network depth are avoided. In [12], residual learning [25] is used to boost the denoising performance. Residual learning in convolutional neural networks was originally proposed to alleviate gradient explosion and gradient vanishing, and it has achieved notable improvements in object detection and image classification. However, the residual learning used in DnCNN [12] is different from [25]: in [12], the whole network is a single residual block, that is, the output of the denoising network is the noise of the noisy image instead of the denoised image.
There are two advantages to residual learning. First, the Gaussian noise distribution is easy to fit. In addition, the effect of image content on training can be mitigated by training the network to generate the noise. Another reason convolutional neural network based methods improve performance is the mean squared error (MSE) loss function. PSNR is one of the most commonly used indicators for evaluating denoising performance, and there is a close relationship between MSE and PSNR. With the development of deep learning, the loss can be reduced to a very low level by large-scale neural networks, and the PSNR improves accordingly.
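The close relationship between MSE and PSNR mentioned above is PSNR = 10·log10(MAX²/MSE), where MAX is the peak intensity. A minimal sketch (the toy images are illustrative):

```python
import numpy as np

def psnr(x, y, peak=255.0):
    """PSNR in dB between a reference image x and an estimate y:
    PSNR = 10 * log10(peak^2 / MSE). Lower MSE => higher PSNR."""
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

x = np.zeros((4, 4))
y = x + 5.0                  # constant error of 5 -> MSE = 25
print(round(psnr(x, y), 2))  # 10*log10(255^2/25) ≈ 34.15
```

This is why minimising the MSE loss directly drives up PSNR, even when perceptual detail is lost.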

Generative adversarial network
Generally, a GAN consists of a generative model and a discriminative model. The discriminative model is trained to determine whether a sample comes from the data distribution or from the generative model. Simultaneously, the generative model is trained to generate samples close to the learned distribution to fool the discriminative model. In training, the two parts compete with each other to generate samples close to the data distribution. GANs show the potential to learn complex distributions in many applications [26][27][28]. However, it is generally known that GANs are unstable in training. To solve this problem, many optimisation methods and GAN variants have been proposed. Radford et al. [29] proposed a deep convolutional generative adversarial network (DCGAN) that improves the stability of GAN training by introducing convolutional networks. Mao et al. [30,31] proposed a least squares generative adversarial network (LSGAN) that adopts the least squares loss function for the discriminator, which overcomes the vanishing gradient problem during learning. Arjovsky et al. [32,33] proposed a Wasserstein generative adversarial network (WGAN), in which the Wasserstein distance is introduced. WGAN not only solves the problem of unstable training, but also provides a reliable training process indicator. If GANs can be employed for image denoising, the over-smoothing problem of discriminative learning based denoising methods can be alleviated.

THE PROPOSED ID-AGAN METHOD
In order to improve the performance of existing denoising methods using the generative adversarial network, a new blind image denoising method based on an asymmetric generative adversarial network (ID-AGAN) is proposed in this section. The overall model of the proposed ID-AGAN algorithm is illustrated in Figure 1. The ID-AGAN consists of two parts: a denoising model and a discriminating model. The denoising model is a deep convolutional auto-encoder with skip connections and multi-scale downsampling input layers, and the discriminating model is a five-layer fully convolutional network with an image downsampling layer. The specific network structure and training details are given in the following.

The denoising model
Our denoising model is modified from U-net [34]. On the basis of a deep convolutional auto-encoder with skip connections, a multi-scale downsampling input layer is added at the front. More precisely, the noisy image is first input into three parallel convolutional blocks (shown in Figure 1: Downsample 1-3) and sampled to different sizes. Each downsample block is composed of three convolutional layers, where the first two convolutional layers keep the same feature map size. The last convolutional layer in the three downsample blocks reduces the size of the feature maps by factors of two, four and eight, respectively. These three convolutional blocks are then connected to the auto-encoder. More content information from different dimensions of the image can be extracted by the multi-scale downsampling layer, and it noticeably alleviates the loss of image texture information. In our model, the noise information is first extracted from the noisy image using the combination of the multi-scale downsampling input layer and the encoder. Then, the noise is reconstructed using a combination of convolutional layers and transposed convolutional layers. Here, the corresponding sampling layers and reconstruction layers are connected, that is, the input of a certain reconstruction layer is the output of the previous reconstruction layer together with the output of the corresponding sampling layer. There are two reasons for the skip connections. First, gradients often vanish or explode in deeper networks, so it is difficult for the network to converge. Adding skip connections allows direct information exchange between the shallow and deep layers of the network, which can effectively solve this problem. In addition, reconstruction needs not only the information extracted by convolution but also much information from the original image.
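The multi-scale input above can be illustrated with a toy sketch. In the model the downsampling is performed by learned strided convolutions; here simple block averaging stands in for them, and the input size is an arbitrary assumption:

```python
import numpy as np

def downsample(img, factor):
    """Reduce an (H, W) image by `factor` via block averaging --
    a stand-in for the learned strided convolutions in each
    downsample block of the denoising model."""
    h, w = img.shape
    return img[:h - h % factor, :w - w % factor] \
        .reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

img = np.arange(64 * 64, dtype=np.float64).reshape(64, 64)
# three parallel branches, reducing the input by 2x, 4x and 8x
scales = [downsample(img, f) for f in (2, 4, 8)]
print([s.shape for s in scales])  # [(32, 32), (16, 16), (8, 8)]
```

Each branch sees the image at a different resolution, so coarse content survives even when fine-grained pixels are noisy.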

The discriminating model

GAN [21] can achieve excellent results in image generation. However, GAN alone is not good enough for image denoising. Image denoising aims to remove the noise of the image while preserving its original information. The content difference between the noisy image and the original image is small and only weakly correlated with the image content. The one-to-one alternate training between the denoising model and the discriminating model makes it difficult for the discriminating model to converge, and does not improve the detail retention of the denoising model. Therefore, an image downsampling layer is introduced before the discriminating model to reshape the input image and increase the number of times the discriminating model is trained in each iteration. During training, the denoised image of size h × w is downsampled into four sub-images of size ½h × ½w and then fed into the discriminating model. Hence the discriminating model is trained four times while the denoising model is trained once. Simultaneously, in order to balance the influence of each part of the image on training, the loss of each downsampled sub-image is calculated independently and the mean of the losses is fed back to the denoising model.
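The four-sub-image downsampling before the discriminating model can be sketched as follows. The even/odd sub-sampling pattern and the placeholder per-sub-image loss are illustrative assumptions, since the exact sampling scheme is not spelled out here:

```python
import numpy as np

def split_subimages(img):
    """Split an (h, w) image into four (h/2, w/2) sub-images by
    taking every other pixel -- one way to realise the downsampling
    layer placed before the discriminating model."""
    return [img[0::2, 0::2], img[0::2, 1::2],
            img[1::2, 0::2], img[1::2, 1::2]]

img = np.arange(16, dtype=np.float64).reshape(4, 4)
subs = split_subimages(img)
print([s.shape for s in subs])                   # four (2, 2) sub-images
# the mean of the per-sub-image losses is fed back to the denoising model
losses = [float(np.mean(s ** 2)) for s in subs]  # placeholder loss
print(sum(losses) / 4)
```

Because each forward pass yields four discriminator inputs, the discriminating model is updated four times per denoising-model update, which is what makes the adversarial training asymmetric.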
Our discriminating model is a fully convolutional network consisting of five convolutional layers. The feature map size is halved layer by layer starting from 64 × 64, and the number of feature maps is doubled layer by layer starting from 32. The output one-dimensional vector represents the probability that the input image is the original image rather than the denoised image.

Training
The input of our denoising network is y = x + n, where x is the clean image, y is the noisy image and n is the noise. By residual learning we estimate f(y) ≈ n, and the denoised image x′ = y − f(y) can then be obtained. The loss function of the denoising model is shown in Eq. (1) and consists of two parts: the mean square error (MSE) loss and the discriminating loss:

L(Θ_G) = L_C(Θ_G) + λ L_A(Θ_G),    (1)

In training, we set λ = 0.01.
where L_C(Θ_G) is the MSE loss between the noise estimate and the added noise, which can be written as

L_C(Θ_G) = (1/2N) Σ_{i=1}^{N} ‖f(y_i; Θ_G) − (y_i − x_i)‖²,    (2)

Here Θ_G refers to the trainable parameters in the denoising model and {(y_i, x_i)} refers to the pairs of training images.
In addition to the MSE loss, adversarial training is introduced to guide the denoising model. Adversarial training [21] aims to train a generator G to generate samples from the real data distribution. Some noise z is fed into the generator to learn a mapping to the data space, generating samples G(z). On the other hand, the discriminator D aims to distinguish between generated samples and original samples: it takes a data sample as input and outputs the probability that the sample comes from the real data. Here, our denoising model is regarded as the generator. Instead of generating samples from noise z, it takes noisy images and generates denoised images, and our discriminating network aims to distinguish between clean images and denoised images. If the input sample is a clean image, it aims to output '1'; a denoised image corresponds to '0'. Our model is trained to find optimum parameters Θ_G* and Θ_D* satisfying

min_G max_D  E_x[log D(x; Θ_D)] + E_y[log(1 − D(x′; Θ_D))],    (3)

Here Θ_D refers to the trainable parameters in the discriminating model and x′ = y − f(y; Θ_G) is the denoised image. For the denoising model, the discriminating loss can be written as

L_A(Θ_G) = −log D(x′; Θ_D),    (4)

and for the discriminating model, the discriminating loss is

L_D(Θ_D) = −log D(x; Θ_D) − log(1 − D(x′; Θ_D)),    (5)

The denoising model and the discriminating model are trained iteratively to optimise the parameters of both models. The result is better than a deep convolutional network optimised by the mean square error alone.
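As a numerical illustration of the two discriminating losses above, the snippet below evaluates them for made-up discriminator outputs (the probabilities `d_clean` and `d_denoised` are arbitrary assumptions, not measured values):

```python
import numpy as np

def bce(p, label):
    """Binary cross-entropy for a discriminator output p in (0, 1):
    -log(p) if the target label is 1, -log(1 - p) if it is 0."""
    return -np.log(p) if label == 1 else -np.log(1.0 - p)

# suppose the discriminator outputs these probabilities (illustrative)
d_clean = 0.9      # D(x): clean image, target '1'
d_denoised = 0.2   # D(x'): denoised image, target '0'

# discriminating-model loss: push D(x) toward 1 and D(x') toward 0
loss_d = bce(d_clean, 1) + bce(d_denoised, 0)
# adversarial loss for the denoising model: push D(x') toward 1
loss_g = bce(d_denoised, 1)
print(float(loss_d), float(loss_g))
```

In each iteration, loss_d updates the discriminating model's parameters while loss_g (weighted by λ and added to the MSE loss) updates the denoising model's parameters.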

EXPERIMENTS
In this section, the performance of the proposed method is compared with several state-of-the-art denoising methods, including BM3D [3] based on filtering, EPLL [5] based on an effective prior, NCSR [6] based on sparse coding, WNNM [7] and MC-WNNM [8] based on low rank, and two discriminative learning based methods, TNRD [10] and DnCNN [12]. Among them, TNRD and DnCNN are implemented on GPU. All methods are tested and evaluated with the parameter settings indicated in their authors' papers. The code of our ID-AGAN model can be downloaded at https://github.com/wangemm/ID-AGAN.

Experimental setting
In the experiments, the 400 training images of the Berkeley Segmentation Data Set and Benchmarks 500 (BSDS500) are used as the training set. For Gaussian denoising with an unknown noise level, we set the standard deviation range of the noise distribution as σ ∈ [0, 55]. The patch size is set as 256 × 256 and 55 × 400 patches are cropped to train the model. The ADAM algorithm is adopted to optimise ID-AGAN, and the mini-batch size is set as 8. We train our ID-AGAN model for 50 epochs. The learning rate starts at 1e-3 for the first 30 epochs, and a smaller learning rate of 1e-4 is adopted for the remaining 20 epochs to fine-tune the network parameters. The denoising model and the discriminating model are trained at the same learning rate. In order to evaluate the performance of our ID-AGAN method on gray images, two datasets are employed in the experiments. One is the 68 test images of the Berkeley Segmentation Data Set (BSD68), and the other is a dataset of 12 popular test images (SET12). It is worth noting that all these images are widely used for the evaluation of Gaussian denoising methods and none of them is included in the training dataset.
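The blind-training setup above can be sketched as follows; the uniform per-patch sampling of σ and the toy patch content are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def make_training_pair(clean_patch, sigma_max=55.0, rng=rng):
    """For blind Gaussian denoising, draw a noise level uniformly
    from [0, sigma_max] for each patch, so that a single model is
    exposed to many noise levels during training."""
    sigma = rng.uniform(0.0, sigma_max)
    noisy = clean_patch + rng.normal(0.0, sigma, size=clean_patch.shape)
    return noisy, clean_patch, sigma

patch = np.full((256, 256), 100.0)   # toy 256x256 "clean" patch
noisy, clean, sigma = make_training_pair(patch)
print(noisy.shape, 0.0 <= sigma <= 55.0)
```

Because σ varies per patch and is never given to the network, the trained model must handle any noise level in the range at test time.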
In addition to gray image denoising, we also train a blind colour image denoising model referred to as CID-AGAN. We use the 400 training images from the colour version of the BSDS500 dataset to train the CID-AGAN model. The noise levels are also set in the range [0, 55] and 55 × 400 patches of size 256 × 256 are cropped to train the model.
For colour image denoising, we train our CID-AGAN model for 20 epochs. The first 10 epochs are trained with learning rate 1e-3, and the learning rate is then reduced to 1e-4 for the next 10 epochs to fine-tune the network parameters. The denoising model and the discriminating model are trained at the same learning rate. In addition to the colour versions of the BSD68 and SET12 test sets, we also use the PASCAL VOC 2010 Segmentation dataset to evaluate the colour denoising performance. Moreover, to evaluate the robustness of the proposed ID-AGAN method, we also compare its performance on speckle denoising. For speckle denoising with an unknown noise level, we use the same dataset, training epochs and learning rate as for colour Gaussian denoising. The standard deviation range of the noise distribution is also set to [0, 55], and we use the colour version of the BSD68 test set to evaluate the speckle denoising performance.
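For speckle noise, a common convention is the multiplicative model y = x + x·n; the exact speckle model used is not spelled out here, so the snippet below is only a sketch under that assumption:

```python
import numpy as np

def add_speckle(x, sigma, rng=None):
    """Multiplicative speckle model y = x + x * n, with n zero-mean
    Gaussian of standard deviation sigma -- one common convention;
    other definitions of speckle noise exist."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = rng.normal(0.0, sigma, size=x.shape)
    return x + x * n

x = np.full((8, 8), 100.0)
y = add_speckle(x, sigma=0.2)
# unlike AWGN, the perturbation scales with the signal intensity
print(y.shape)
```

The signal-dependent nature of speckle is what makes it a useful robustness test: a model tuned only to additive Gaussian statistics tends to degrade on it.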

Evaluation
The PSNR and SSIM results of different methods on the BSD68 dataset are given in Tables 1 and 2, with the best results marked in bold. It can be seen that our ID-AGAN achieves the best PSNR and SSIM performance at all noise levels compared to the other methods. Compared with the benchmark method BM3D, methods such as WNNM can increase the PSNR by up to 0.22 dB and the SSIM by 0.0047 at these three noise levels. The discriminative learning based method TNRD achieves an average improvement of 0.36 dB in PSNR and 0.0118 in SSIM, while DnCNN improves by about 0.58 dB in PSNR and 0.0154 in SSIM. The proposed ID-AGAN method increases the PSNR by 0.66 dB, and it is the only one of these methods that exceeds 29 dB on average. It is worth mentioning that our ID-AGAN outperforms BM3D by about 0.71 dB when σ = 50, which reaches the estimated PSNR bound over BM3D (0.7 dB) given in [35]. Meanwhile, it increases the SSIM by about 0.0172 compared with BM3D. The PSNR and SSIM results of different methods on SET12 are given in Tables 3 and 4, with the best results marked in bold. From Tables 3 and 4, it can be seen that ID-AGAN achieves the best results on most images. More specifically, the average PSNR and SSIM obtained by ID-AGAN are better at all noise levels compared to the other methods. The best PSNR results are obtained on 6, 7 and 10 images and the best SSIM results on 5, 7 and 7 images at the three noise levels, respectively. Compared with the benchmark method BM3D, the discriminative learning based method TNRD achieves an average improvement of 0.1 dB in PSNR and 0.0057 in SSIM, and our ID-AGAN achieves a larger improvement. As shown in Figure 2, the proposed ID-AGAN method produces better PSNR and SSIM compared to the other methods. The noise removal by BM3D and TNRD is incomplete, and many noise points remain. DnCNN is cleaner in noise removal and works well in smooth areas, but the details of the image are partially lost.
As can be seen from the enlarged part of the results, the proposed ID-AGAN method does better in terms of both noise removal and detail retention. We also compare the effects on colour images with the benchmark method CBM3D, the low rank based method MC-WNNM and the discriminative learning based method CDnCNN. The PSNR and SSIM results on the colour image dataset CBSD68 are given in Table 5. From Table 5, we can see that CID-AGAN improves on the other methods at every noise level. Compared to the benchmark method CBM3D, the PSNR is increased by 0.69 dB while the SSIM is increased by 0.0184, and as the noise increases, the improvement of CID-AGAN over CBM3D becomes more notable. Moreover, compared to the deep learning based approach CDnCNN, the PSNR and SSIM on the CBSD68 test set are greatly improved. In order to further evaluate the effectiveness of the proposed CID-AGAN, the PASCAL VOC 2010 Segmentation dataset with 1928 colour images is used in the following. The PSNR and SSIM results on the PASCAL VOC 2010 Segmentation dataset are given in Table 6, where we also record the standard deviation. It can be seen clearly that CID-AGAN achieves the best performance among the competing methods on almost all measures. Although CDnCNN achieves a smaller standard deviation than CID-AGAN, it tends to produce over-smooth textures. Meanwhile, it can be seen from the background of the results by CBM3D and MC-WNNM that the noise has not been removed completely. As can be seen in Figure 5, CBM3D retains image texture but does not completely remove the noise, while CDnCNN removes noise thoroughly but also removes many image textures as noise, such as the creases of the butterfly's wings. Figure 6 shows the average PSNR improvement over BM3D/CBM3D and DnCNN/CDnCNN at different noise levels by the ID-AGAN/CID-AGAN method. The results are evaluated on the gray/colour BSD68 dataset.
It can be seen that our ID-AGAN/CID-AGAN models consistently outperform BM3D/CBM3D by a large margin over a wide range of noise levels, and the PSNR improvement increases with the noise level. Meanwhile, our ID-AGAN/CID-AGAN models consistently surpass DnCNN/CDnCNN by about 0.1 dB on gray images and 0.3 dB on colour images.
In the following, the performance of CBM3D, MC-WNNM, CDnCNN and CID-AGAN is compared on colour images with speckle noise. The PSNR and SSIM results on the CBSD68 dataset are given in Table 7. As one can see, CID-AGAN still outperforms CBM3D, MC-WNNM and CDnCNN for speckle noise. For all noise levels, it surpasses the benchmark method CBM3D by a large margin, and it outperforms CDnCNN by about 1.51 dB in PSNR and 0.0324 in SSIM. The visual effects of CBM3D, CDnCNN, MC-WNNM and CID-AGAN for speckle denoising are shown in Figure 7. It can be seen that CID-AGAN produces sharp edges and fine details, whereas MC-WNNM and CBM3D tend to generate blurred edges and CDnCNN tends to generate striations in the picture.
According to the above evaluation and observation, the proposed ID-AGAN method obtains similar or better results than current denoising methods. Moreover, it can better balance noise removal and detail retention.

Experiment on railway image denoising
In this section, we apply CBM3D, CDnCNN, MC-WNNM and CID-AGAN to railway image denoising. In the experiment, some railway images from the PLACE365 dataset are used to evaluate the effectiveness of the different methods. The 16 images shown in Figure 8 are selected as test images to evaluate the performance of CBM3D, CDnCNN, MC-WNNM and CID-AGAN, and all discriminative learning based methods denoise the railway images without re-training. The PSNR and SSIM results are shown in Table 8. As can be seen from Table 8, our CID-AGAN obtains the best overall results. The visual effects of CBM3D, CDnCNN, MC-WNNM and CID-AGAN for railway image denoising are shown in Figure 9. As shown in Figure 9, CBM3D and CDnCNN tend to produce over-smooth textures. Meanwhile, it can be seen from the result obtained by MC-WNNM that the noise has not been removed completely. In contrast, our method CID-AGAN recovers more details as well as removing noise more cleanly.

CONCLUSION
In this paper, we focus on improving the blind image denoising performance of deep learning methods, and a new image denoising method based on an asymmetric generative adversarial network (ID-AGAN) is proposed. The ID-AGAN consists of two parts: a denoising model and a discriminating model. Here, an asymmetric generative adversarial network is used to optimise the denoising of high-dimensional image information, hence it can balance noise removal and detail retention. Moreover, the denoising model is a deep convolutional auto-encoder with skip connections and a multi-scale downsampling input layer. The multi-scale feature downsampling layer used in the denoising model can extract image information of different dimensions and reduce the effect of noise on training images. The superiority of the ID-AGAN algorithm over BM3D, EPLL, NCSR, WNNM, MC-WNNM, TNRD and DnCNN has been demonstrated by the experiments. All the experimental results described in this paper show that our ID-AGAN method is effective because it better balances noise removal and image detail retention. Moreover, the ID-AGAN algorithm has been applied to railway image denoising, which also illustrates its effectiveness and superiority. In our future research, we will extend our model by introducing an attention mechanism as guidance.