Enhanced CNN for image denoising

Deep convolutional neural networks (CNNs), with their flexible architectures, have been successfully applied to image denoising. However, they suffer from the following drawbacks: (1) a deep network architecture is very difficult to train; (2) very deep networks face the challenge of performance saturation. In this paper, we propose a novel method called the enhanced convolutional neural denoising network (ECNDNet). Specifically, we use residual learning and batch normalisation (BN) techniques to address the problem of training difficulty and to accelerate the convergence of the network. In addition, dilated convolutions are used in our network to enlarge the context information and reduce the computational cost. Extensive experiments demonstrate that our ECNDNet outperforms state-of-the-art methods such as IRCNN for image denoising.


Introduction
Image denoising is a classical image restoration technique and has been successfully applied in many fields such as pathological analysis and human entertainment [1,2]. The degradation model is widely used in denoising to recover a clear image; it is expressed as y = x + m, where x is a clean image, y is a noisy image and m is additive Gaussian noise with standard deviation s. According to Bayesian theory, the prior is very important for image denoising [3]. For example, wavelet transformation with a Markov random field prior is used to suppress noise [4]. Combining self-similarities and sparse representation can improve the performance and reduce the storage cost of image denoising [5]. Block-matching and 3D filtering (BM3D) converts 2D image data into 3D data arrays and applies sparse methods to the obtained 3D arrays to remove noise [6]. Enforcing the gradient histogram of the denoised image to approximate the theoretical gradient histogram of the clean image is also effective for image denoising [7]. In addition, nonlocally centralised sparse representation (NCSR) [8], gradient methods [9,10], total-variation methods [11,12] and weighted nuclear norm minimisation (WNNM) [13] are also very effective for image denoising.
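The degradation model above is straightforward to simulate. The sketch below (pure Python; `degrade` is our own hypothetical helper, not from the paper) generates a noisy signal y = x + m together with the residual y − x that residual-learning denoisers are later trained to predict; `sigma` plays the role of the standard deviation s.

```python
import random

def degrade(x, sigma, seed=0):
    """Simulate the degradation model y = x + m, where m is additive
    Gaussian noise with standard deviation sigma (hypothetical helper)."""
    rng = random.Random(seed)
    return [px + rng.gauss(0.0, sigma) for px in x]

clean = [100.0, 120.0, 130.0]       # a toy "image" of three pixels
noisy = degrade(clean, sigma=25)
# the residual y - x is exactly the noise a residual network learns to predict
residual = [n - c for n, c in zip(noisy, clean)]
```

With a fixed seed the simulation is reproducible, which mirrors how synthetic noisy/clean training pairs are generated for denoising benchmarks.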
Although the above methods have achieved great denoising performance, they still face the following problems [3]: (i) they need to manually set parameters to obtain optimal results; (ii) they use complex optimisation methods to improve performance, which increases the computational cost.
Owing to the flexible connection fashion of deep network architectures and their strong learning ability, deep learning techniques have become the most effective way to address the above problems in image denoising. Specifically, deep convolutional neural networks (CNNs) have attracted increasing attention for image denoising [14]. For example, a CNN uses a residual learning method to improve denoising performance [3]; it is the first to use a single model to deal with multiple restoration tasks such as image denoising, image super-resolution and image deblocking. Fusing a CNN with the characteristics of the denoising task is useful for removing unknown noise [15]. Combining a CNN with the nature of images is very effective for obtaining a clean image; for example, a CNN utilises non-local similarity to deal with colour noisy images [16].
Discriminative learning methods embedded into an optimisation method achieve great performance on real noisy images [17]. A CNN consolidated with unsupervised learning is a good choice for image restoration [18]. Using signal-enhancement principles to design novel network architectures is also very popular for image recovery [19]. Integrating the spatial domain into a CNN can better filter noise [20]. The combination of traditional denoising methods and CNNs, such as BMCNN, is very competent at separating noise from a noisy image [21]. The fusion of multiple features is very beneficial for image denoising [22]. Deep CNNs also produce good visual effects on multiplicative noise [23] and are good tools for medical image denoising [24,25]. The recently proposed deep cascade convolutional residual denoising network (DCCRDN) repeatedly uses concatenation operations to train models for image denoising [26]. Although the above deep network methods have achieved great denoising performance, most of them suffer from vanishing or exploding gradients when the network architecture is very deep. In addition, they sacrifice computational cost to improve performance; for example, they apply multiple concatenation operations to train the denoising model.
In this paper, we propose a novel network referred to as enhanced convolutional neural denoising network (ECNDNet). ECNDNet utilises residual learning technique [27] to prevent vanishing and exploding gradient problems. Moreover, batch normalisation (BN) [28] is used to accelerate the convergence of the trained model and make the network easy to train. To decrease the computational burden, we use dilated convolution [29] to capture more context information. Extensive experiments demonstrate that our proposed ECNDNet method outperforms the popular image denoising methods such as fast and flexible denoising net (FFDNet) [15], image restoration CNN (IRCNN) [17] and BM3D [6].
The main contributions of this work are summarised as follows: (i) The depth of the proposed ECNDNet is set to only 17 layers, which effectively reduces the computational cost.
(ii) ECNDNet uses a residual learning mechanism to prevent vanishing and exploding gradient problems. In addition, it utilises the BN technique to normalise data and improve the efficiency of training the model.
(iii) ECNDNet uses dilated convolutions to enlarge the receptive field and improve the performance.
The remainder of this paper is organised as follows. Section 2 presents work related to the proposed method. Section 3 describes the proposed method. Section 4 shows the extensive experimental results. Section 5 offers the conclusion.

BN and residual learning
One of the reasons for CNN's success is its end-to-end architecture. An end-to-end CNN generally involves parameter initialisation [30], gradient optimisation methods [31,32] and the rectified linear unit (ReLU) [33]. Although such general network architectures have achieved good performance, they face vanishing/exploding gradient problems and are difficult to train when deep. In this paper, we use BN and residual learning to address these problems. Detailed information about BN [28] and residual learning [27] is given as follows. The distribution of sample data changes after it passes through a convolution layer; this phenomenon is called the internal covariate shift problem. It can be addressed by the BN technique: first, BN normalises the training data in every batch; then, it uses scale and shift operations to recover the distribution of the training data. These two BN parameters are updated during back-propagation of the trained network. BN is placed before the activation function of each layer. BN enjoys the following merits: (i) it accelerates the convergence of the training model and makes the network easier to train; (ii) it keeps different batches of training data uniformly distributed and improves the performance of the network; (iii) it has low sensitivity to initialisation.
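The normalise-then-rescale behaviour of BN described above can be sketched in a few lines. This is a toy forward pass over a single 1-D batch; in the real network the statistics are computed per channel over the mini-batch, and gamma and beta are the learnable parameters updated during back-propagation.

```python
def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise a 1-D batch to zero mean / unit variance, then apply the
    learnable scale (gamma) and shift (beta) that restore expressive power."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((v - mean) ** 2 for v in batch) / n
    # eps guards against division by zero for near-constant batches
    return [gamma * (v - mean) / (var + eps) ** 0.5 + beta for v in batch]

out = batch_norm([1.0, 2.0, 3.0, 4.0])
# the normalised activations have (approximately) zero mean
```

Because every batch is mapped to the same zero-mean, unit-variance distribution before scale and shift, the internal covariate shift between batches is removed, which is what accelerates convergence.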
To the best of our knowledge, although increasing the depth of a network can improve denoising performance, a deeper network may suffer from vanishing or exploding gradient problems. Residual learning is a good tool for solving this problem. It adds the input (original images) and the residual block (the output of several stacked layers) to form the input of the next layer, which guarantees the performance. As shown in Fig. 1, we assume that x and f(x) represent the input and the output of several stacked layers, respectively. The input of the layer following the stacked layers is f(x) + x.
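The identity shortcut is equally simple to express. In the sketch below, `stacked_layers` is a stand-in for the learned mapping f(x) (a hypothetical toy mapping, not the paper's layers); the block's output is f(x) + x, so gradients can always flow through the addition unchanged even when f(x) is poorly conditioned.

```python
def stacked_layers(x):
    # stand-in for f(x): in practice several Conv+BN+ReLU layers
    return [0.1 * v for v in x]

def residual_block(x):
    """Identity shortcut: the block outputs f(x) + x, as in Fig. 1."""
    fx = stacked_layers(x)
    return [a + b for a, b in zip(fx, x)]

y = residual_block([1.0, 2.0])
```

Even if `stacked_layers` collapsed to zero, the block would still pass its input through untouched, which is why deep residual networks do not lose the signal the way plain stacks can.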

Dilated convolution
As we know, more features can improve the performance of image processing [34][35][36]. Enlarging the receptive field of a CNN is very effective for extracting more features for image denoising [37]. There are two popular ways to enlarge the receptive field: (i) enlarging the width of the network (i.e. increasing the filter size); (ii) increasing the depth of the network. However, the first way may produce more parameters, which results in overfitting of the network and also increases the computational cost. The second way may lead to vanishing/exploding gradients when the depth of the network is large. Consequently, dilated convolution is a good choice to balance these two approaches. Dilated convolution uses a dilated filter with dilation factor f to increase the captured information; a dilated filter can be expressed as a filter of size (2f + 1) × (2f + 1). For example, when f is 1, the receptive field of the first layer is 3, and the receptive fields of the subsequent layers are 5, 7, 9, …, respectively. In addition, combining dilated filters with 3 × 3 convolutional kernels is very popular in image processing [29]. For more details on dilated convolution, please refer to [29].
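The receptive-field arithmetic above can be checked mechanically: a 3 × 3 convolution with dilation factor f behaves like a (2f + 1) × (2f + 1) filter, and each stacked layer grows the field by (effective kernel − 1). A small helper (our own, not from the paper) reproduces the 3, 5, 7, 9, … sequence:

```python
def receptive_field(dilations):
    """Receptive field (one side, in pixels) of stacked 3x3 convolutions,
    where a layer with dilation factor f acts as a (2f+1)x(2f+1) filter
    and therefore adds 2f pixels to the field."""
    rf = 1
    for f in dilations:
        rf += 2 * f
    return rf

print(receptive_field([1]))           # one plain 3x3 layer
print(receptive_field([1, 1, 1, 1]))  # four plain layers: 3, 5, 7, 9
print(receptive_field([1, 2]))        # a dilation-2 layer grows the field faster
```

This makes the trade-off concrete: inserting a few dilation-2 layers widens the context as much as extra plain layers would, without the added depth or parameters.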

Network architecture
According to the previous research, the denoising problem can be expressed as y = x + m. In this paper, the objective function for learning f(y) is

l(p) = 1/(2N) Σ_{j=1}^{N} ||f(y_j; p) − (y_j − x_j)||²,  (1)

where p represents the network parameters, y_j represents the jth noisy image patch and x_j represents the jth label (clean) image patch, so that f(y_j; p) predicts the residual image y_j − x_j. Using image patches can reduce the computational cost and allows the network to learn more features [38]; thus, dividing the image into patches is reasonable for image denoising. In addition, a very deep architecture is another non-negligible factor that can result in vanishing or exploding gradient problems. Motivated by these concerns, we propose a novel network called ECNDNet. ECNDNet consists of dilated convolution, residual learning, BN, convolution (Conv) and ReLU. We empirically find that placing dilated convolutions in the 2nd, 5th, 9th and 12th layers not only increases the captured information, but also reduces the computational cost compared with using dilated convolution in every layer. Moreover, the use of BN and residual learning makes the network more effective for image denoising. The architecture of the designed network is shown in Fig. 2. The merits of the proposed method are threefold: (i) it uses a 17-layer network and residual learning to prevent vanishing or exploding gradients; (ii) it uses the BN technique to accelerate convergence and make the network easier to train; (iii) it uses dilated convolutions to enhance the performance of the designed network and reduce the computational cost.
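Objective (1) can be written out directly. The sketch below treats each "patch" as a single scalar for brevity; `ecndnet_loss` is a hypothetical name, and the essential point is that the network output f(y_j; p) is regressed onto the residual y_j − x_j rather than onto the clean patch itself.

```python
def ecndnet_loss(pred_residuals, noisy, clean):
    """Mean-squared objective in the style of (1): the prediction for patch j
    is compared against the residual y_j - x_j (scalar patches for brevity)."""
    n = len(noisy)
    total = 0.0
    for f_y, y, x in zip(pred_residuals, noisy, clean):
        total += (f_y - (y - x)) ** 2
    return total / (2 * n)

# a perfect residual prediction gives zero loss
loss = ecndnet_loss([5.0], [105.0], [100.0])
```

Predicting the residual instead of the clean image is what lets the final clean estimate be recovered as y − f(y), and it keeps the regression target small and zero-centred, which eases training.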

Discussion
The proposed method relies on residual learning, BN and dilated convolution, which are complementary for image denoising. In this part, we verify the effectiveness of these techniques for image denoising. Here, CRNet, CRRNet and CRRBNet have the same number of network layers, convolutional kernel size and initial parameters. Specifically, CRNet consists of Conv and ReLU, as shown in Fig. 3, where Conv and ReLU denote convolution and rectified linear units, respectively. CRRNet consists of Conv, ReLU and the residual learning technique, as shown in Fig. 4. Figs. 1-4 are schematic diagrams. CRRBNet consists of Conv, ReLU, residual learning and BN, as shown in Fig. 5. First, we illustrate the peak signal-to-noise ratio (PSNR) of every training epoch for CRNet and CRRBNet. From Fig. 6, we know that the combination of BN and residual learning is effective for image denoising. Then, we show that dilated convolution is useful for image denoising, as shown in Fig. 7.

Experimental setting
We design a 17-layer network called ECNDNet. Its depth is the same as that of the denoising CNN (DnCNN). Its loss function (also referred to as the objective function) is shown in (1). We choose Adam [39] to optimise the model. The initial parameters are set as follows: (i) the learning rate, beta_1, beta_2 and epsilon are 1×10^−3, 0.9, 0.999 and 1×10^−8, respectively; (ii) the initial weights are set as in [40]; (iii) the batch size is 128; (iv) the number of epochs for the trained model is 180. In addition, the learning rate decays from 1×10^−3 to 1×10^−8 over the 180 epochs.
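The stated endpoints (1×10^−3 down to 1×10^−8 across 180 epochs) admit several schedules, and the paper does not specify which is used. One plausible choice is an exponential decay between the two values; the sketch below is our assumption, not the authors' exact schedule.

```python
def learning_rate(epoch, n_epochs=180, lr_start=1e-3, lr_end=1e-8):
    """Exponential decay from lr_start (epoch 0) to lr_end (last epoch).
    This is one plausible interpolation of the paper's stated endpoints."""
    ratio = lr_end / lr_start
    return lr_start * ratio ** (epoch / (n_epochs - 1))

# the rate shrinks monotonically by five orders of magnitude over training
rates = [learning_rate(e) for e in (0, 90, 179)]
```

Exponential decay spends many epochs at small rates near the end of training, which suits fine-tuning a nearly converged denoiser; a step schedule dropping the rate every few dozen epochs would be an equally reasonable reading.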
We choose the PyTorch tool [41] to train the denoising model. All the experiments are implemented under Ubuntu 16.04 with Python 2.7 and run on a PC with an Intel Core i7-7800X CPU, 16 GB of RAM and an Nvidia GeForce GTX 1080 Ti GPU. The versions of Nvidia CUDA and cuDNN are 9.0 and 7.5, respectively. In the result tables, the best and second-best performance are shown in italic and bold, respectively.

ECNDNet for grey image denoising
We use Fig. 8 to illustrate the performance of our method and the comparative methods with s = 50 on the BSD68 dataset. To show the performance of our proposed method on images of different categories, we validate it on the Set12 dataset.
From Table 2 and Fig. 9, it is known that our proposed method performs well on each image category. For example, the average PSNR of our method is 30.39 dB when the noise level is 25, which is higher than that of BM3D. In Table 2, the best PSNR is marked in italic and the second-best in bold. The detailed results of the comparative experiments are taken from [3,15,17]. Fig. 10 shows the denoising performance of different methods on an image.
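The PSNR values reported in Table 2 follow the standard definition over 8-bit images, PSNR = 10 log10(255² / MSE). A minimal implementation for images flattened to pixel lists:

```python
import math

def psnr(clean, denoised, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images,
    given as flattened pixel lists. Assumes the images differ (MSE > 0)."""
    mse = sum((a - b) ** 2 for a, b in zip(clean, denoised)) / len(clean)
    return 10 * math.log10(peak ** 2 / mse)

# two pixels, each off by 16 grey levels -> MSE = 256
print(round(psnr([0.0, 255.0], [16.0, 239.0]), 2))
```

Because the scale is logarithmic, a 1 dB gap between methods corresponds to roughly a 20% reduction in mean squared error, which is why fractions of a dB are treated as meaningful in Table 2.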

Run time
The PSNR and the run time for processing an image are two important factors in image denoising. The performance of the proposed method was demonstrated in Section 4.2; here, we test the run time for processing a single image in grey-image denoising. We use noisy images of sizes 256 × 256, 512 × 512 and 1024 × 1024 with s = 50 to test the speed of the different methods on an image. Specifically, we use PyTorch to test the run times of DnCNN-s and ECNDNet. From Table 3, we know that our ECNDNet is competitive in run time with popular methods such as BM3D, WNNM, EPLL, CSF, TNRD and DnCNN-s. In summary, our proposed method is robust for image denoising.

Conclusion
In this paper, a deep CNN called ECNDNet is proposed to solve the image denoising problem.
Specifically, BN, residual learning and dilated convolution are used to enhance network performance. BN deals with the internal covariate shift problem and makes the network easier to train. The residual learning technique addresses the problem of vanishing or exploding gradients and is used to obtain clean images from noisy images and the predicted residual images. Dilated convolution extracts more context information and reduces the computational cost. In addition, BN, residual learning and dilated convolution are complementary for image denoising. Extensive experiments show that ECNDNet is more effective than popular denoising methods such as IRCNN. In the future, we will combine model-based optimisation and discriminative learning methods to remove noise from real noisy images.

Acknowledgments
This paper was supported in part by the Guangdong Province high-level personnel of special support program under grant no. 2016TX03X164, and in part by the Shenzhen Municipal Science and Technology Innovation Council under grant no. JCYJ20170811155725434.