Article

Denoising Vanilla Autoencoder for RGB and GS Images with Gaussian Noise

by Armando Adrián Miranda-González 1, Alberto Jorge Rosales-Silva 1,*, Dante Mújica-Vargas 2, Ponciano Jorge Escamilla-Ambrosio 3, Francisco Javier Gallegos-Funes 1, Jean Marie Vianney-Kinani 2,4, Erick Velázquez-Lozada 1, Luis Manuel Pérez-Hernández 1 and Lucero Verónica Lozano-Vázquez 1
1 Escuela Superior de Ingeniería Mecánica y Eléctrica Unidad Zacatenco Sección de Estudios de Posgrado e Investigación, Instituto Politécnico Nacional, Mexico City 07738, Mexico
2 Departamento de Ciencias Computacionales, Tecnológico Nacional de México, Cuernavaca 62490, Mexico
3 Centro de Investigación en Computación, Instituto Politécnico Nacional, Mexico City 07738, Mexico
4 Unidad Profesional Interdisciplinaria de Ingeniería Campus Hidalgo, Instituto Politécnico Nacional, Pachuca de Soto 42162, Mexico
* Author to whom correspondence should be addressed.
Entropy 2023, 25(10), 1467; https://doi.org/10.3390/e25101467
Submission received: 16 May 2023 / Revised: 6 July 2023 / Accepted: 31 July 2023 / Published: 20 October 2023
(This article belongs to the Special Issue Pattern Recognition and Data Clustering in Information Theory)

Abstract:
Noise suppression algorithms are used in many tasks, including computer vision, industrial inspection, and video surveillance. Robust image processing systems must be fed with images as close as possible to the real scene; however, external factors sometimes alter the data that represent the captured image, which translates into a loss of information. Procedures are therefore required to recover the information closest to the real scene. This research project proposes a Denoising Vanilla Autoencoder (DVA) architecture based on unsupervised neural networks for Gaussian denoising in color and grayscale images. The methodology improves on other state-of-the-art architectures in terms of objective numerical results. Additionally, a validation set and a set of high-resolution noisy images are used, which show that our proposal outperforms other types of neural networks designed to suppress noise in images.

1. Introduction

Currently, there is growing interest in the use of artificial vision systems for daily tasks such as industrial processes, autonomous driving, telecommunication systems, surveillance systems, and medicine, among others [1]. Recent developments in artificial vision have created the need for increasingly robust systems that meet established quality requirements, and a principal reason systems fail to meet these requirements lies in data acquisition. In image acquisition systems, several factors can alter the result of the capture, including failures in the camera sensors, adverse lighting conditions, electromagnetic interference, and noise generated by the hardware [2]. All of these phenomena are described using distribution models and are known, generically, as noise. The procedure that attempts to diminish the effect of noise is known as the pre-processing stage of an image processing system. In recent years, various denoising algorithms have been developed, and a new field has recently attracted much interest from the scientific community: deep learning methods [3,4].
Deep learning methods have an inherent ability to overcome the deficiencies of some traditional algorithms [5]; however, despite their significant improvements over traditional filters, they have a practical limitation: high computational complexity. Although various methods have focused on noise suppression, this work proposes autoencoders, which are neural networks capable of replicating an unknown image by applying convolutions whose weights were adjusted in previous training [6,7,8]. This research project highlights the importance of autoencoders because they do not require high computational complexity while showing a noticeable improvement compared to other deep learning architectures, such as the Denoising Convolutional Neural Network (DnCNN) [9], the Nonlinear Activation Free Network for Image Restoration (NAFNET) [10], and the Efficient Transformer for High-Resolution Image Restoration (Restormer) [11].
The rest of this paper is structured as follows. In Section 2, the theoretical background work is described. The proposed model is described in Section 3. The experimental setup and results are discussed in Section 4. Finally, the conclusions of this research work are given in Section 5.

2. Background Work

In recent years, noise suppression has become a dynamic field within image processing, because as technological advances emerge, a greater understanding of the scene in which a vision system operates is required [12]. Several processing techniques, known as filters, have been proposed for noise suppression; they depend on the noise present in the image and are mainly classified into two types.

2.1. Spatial Domain Filtering

Spatial filtering is a traditional approach to noise suppression in images. These filters suppress noise by being applied directly to the corrupted image and can generally be classified as linear or non-linear. Among the most common filters are:
  • Mean Filter: For each pixel, samples with a neighborhood similar to that of the pixel are found, and the pixel value is updated with the weighted average of those samples [13].
  • Median Filter: The central pixel of a neighborhood is replaced by the median value of the corresponding window [14] (a minimal sketch of the mean and median filters is given after this list).
  • Fuzzy Methods: These filters differ from those above in that they are mainly built from fuzzy rules, which make it possible to preserve edges and fine details in an image. Fuzzy rules derive suitable weights for neighboring samples by considering local gradients and angle deviations; finally, directional processing is applied to improve the precision of the filter [15].
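For illustration only, the following sketch shows how the classical mean and median filters can be applied with SciPy; the window size and the synthetic test image are arbitrary choices, not parameters taken from this work.

```python
import numpy as np
from scipy.ndimage import uniform_filter, median_filter

# Hypothetical noisy grayscale image with values in [0, 1]
noisy = np.clip(0.5 + 0.1 * np.random.randn(256, 256), 0.0, 1.0)

# Mean filter: each pixel becomes the average of its 3x3 neighborhood
mean_denoised = uniform_filter(noisy, size=3)

# Median filter: each pixel becomes the median of its 3x3 neighborhood,
# which tends to preserve edges better than the mean filter
median_denoised = median_filter(noisy, size=3)
```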

2.2. Transform Domain Filtering

Transform domain filtering is a very useful tool for signal and image processing due to its support for multi-resolution analysis, sub-bands, and localization in the time and frequency domains. An example of this type of filtering is the Wavelet method, which operates in the frequency domain and attempts to separate the signal from the noise so that the signal is preserved during noise suppression. As a first step, a wavelet basis is selected to determine the decomposition of its layers; the decomposition level is then chosen, and a threshold is established in all the sub-bands for all levels [16]. A sketch of this procedure is shown below.
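As a rough illustration of wavelet-domain thresholding (the wavelet family, decomposition level, and threshold value are illustrative assumptions, not settings used in this work), one possible implementation with PyWavelets is:

```python
import numpy as np
import pywt

# Hypothetical noisy grayscale image with values in [0, 1]
noisy = np.random.rand(256, 256)

coeffs = pywt.wavedec2(noisy, wavelet="db4", level=2)    # decompose into sub-bands
threshold = 0.1
denoised_coeffs = [coeffs[0]]                            # keep the approximation band
for detail_level in coeffs[1:]:
    # soft-threshold the detail sub-bands of each level
    denoised_coeffs.append(tuple(
        pywt.threshold(band, threshold, mode="soft") for band in detail_level
    ))
denoised = pywt.waverec2(denoised_coeffs, wavelet="db4") # reconstruct the image
```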

2.3. Artificial Intelligence

A new approach to image processing has emerged: artificial intelligence. To address noise suppression, it is necessary to distinguish between artificial intelligence, machine learning, and deep learning, because these terms are often used synonymously even though there are subtle differences. Artificial intelligence involves machines that can perform tasks with characteristics of human intelligence, such as understanding language, recognizing objects, gestures, and sounds, and solving problems [17,18]. Machine learning is a subset of artificial intelligence whose purpose is to obtain better performance on a learning task; the algorithms used are mainly statistical and probabilistic, making machines improve with experience and allowing them to act and make decisions based on the input data [19]. Finally, deep learning is a subset of machine learning that uses learning techniques and algorithms with high performance on problems such as image and sound recognition, since the basic functioning and structure of the brain and the visual system of animals are imitated [20].
There are two types of deep learning: the first is supervised learning, which takes a direct approach, using labels on the training data to build a reasonable understanding of how machines make decisions; the second is unsupervised learning, which takes a very different approach, learning by itself how to make decisions or perform specific tasks without the need for labels in the database [21].

Autoencoders

Autoencoders are unsupervised neural networks whose defining property is that the input and the output are the same [22]. This is an advantage over other models because, in each training step, the output of the network is compared with the original image and, through the computed error, the weights in each of the layers that make up the autoencoder are adjusted. This adjustment is carried out by means of the backpropagation method. There are different types of autoencoders:
  • The Vanilla Autoencoder (VA) comprises only three layers: the encoding layer, in charge of reducing the dimensions of the input information; the hidden layer, better known as the latent space, which holds the representations of all characteristics learned by the network; and the decoding layer, which restores the information to its original input dimensions, as shown in Figure 1 [23].
  • The Convolutional Autoencoder (Conv AE) makes use of convolution operators and extracts useful representations from the input data, as shown in Figure 2. The input image is downsampled to obtain a latent representation, and the network is forced to learn that representation [24].
  • The Denoising Autoencoder (DA) is a robust modification of the Conv AE that changes the preparation of the input data. The information on which the autoencoder is trained is divided into two groups: original and corrupted. For the autoencoder to learn to denoise an image, the corrupted information is fed to the input of the network; once the result reaches the output, it is compared with the original [25]. This type of autoencoder can generate clean images from noisy images regardless of the type of noise present or the density with which the image was affected (a minimal training sketch is given after this list).
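As a rough illustration of the denoising training scheme described above (not the architecture of this work), the three-layer vanilla structure and the corrupted-input/clean-target pairing can be sketched in Keras as follows; the layer sizes and image dimensions are placeholders.

```python
import numpy as np
from tensorflow.keras import layers, Model

# Placeholder data: flattened 28x28 images in [0, 1]; sizes are illustrative.
clean = np.random.rand(1000, 784).astype("float32")
corrupted = np.clip(clean + np.random.normal(0.0, 0.2, clean.shape), 0.0, 1.0).astype("float32")

inputs = layers.Input(shape=(784,))
encoded = layers.Dense(128, activation="relu")(inputs)     # encoding layer
latent = layers.Dense(32, activation="relu")(encoded)      # latent space
decoded = layers.Dense(784, activation="sigmoid")(latent)  # decoding layer back to input size

autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")
# Denoising setup: corrupted images at the input, clean images as the target.
autoencoder.fit(corrupted, clean, epochs=5, batch_size=32)
```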

3. Proposed Model

The proposed model is based on the suppression of Gaussian noise in both RGB and grayscale (GS) images. Figure 3 shows the architecture of the proposed Denoising Vanilla Autoencoder (DVA) algorithm, which consists of a selection stage: if the image to be processed is RGB, a multimodal model is applied; if it is a GS image, a unimodal model is applied. This is described by Equation (1).
The advantage of combining two types of autoencoder architectures (VA and DA) is that, by having only one encoding layer and one decoding layer, the reconstructed pixels do not undergo many alterations, which could translate into a loss of information, while the network is still able to remove the noise present in the images. The use of the autoencoder also implies a lower computational load, which, in turn, improves both training time and processing time once the network models are generated.
$$X = \begin{cases} \text{unimodal}, & c = 1 \ \text{if} \ X \in \text{GS} \\ \text{multimodal}, & c = 3 \ \text{if} \ X \in \text{RGB}, \end{cases} \qquad (1)$$
where X is the image processed by the DVA, and c is the number of channels in the corrupted image.
$$X \in \mathbb{R}^{w \times h \times c}, \qquad W \in \mathbb{R}^{m \times n \times c \times k}, \qquad (2)$$
where X is the corrupted image with width w, height h, and channels c, and W is the weight matrix with width m, height n, channels c, and k kernels.
$$(X \ast W)(i,j,c) = \sum_{m}\sum_{n}\sum_{k} x\!\left(i+\tfrac{m}{2},\, j+\tfrac{n}{2},\, c\right) \cdot w(m,n,c,k) + b_c, \qquad (3)$$
where $(X \ast W)(i,j,c)$ is the intensity resulting from the k convolutions at position $(i,j,c)$, and b is the bias.
$$Y(i,j,c) = f\big((X \ast W)(i,j,c)\big), \qquad (4)$$
where $Y(i,j,c)$ is the result of the ReLU activation function f at position $(i,j,c)$.
$$f\big(Y(i,j,c)\big) = \begin{cases} 0 & \text{for} \ Y(i,j,c) < 0 \\ Y(i,j,c) & \text{for} \ Y(i,j,c) \geq 0, \end{cases} \qquad (5)$$
$$Z(i,j,c) = \max\big\{ Y(i+p, j+q, c),\ Y(i+1+p, j+q, c),\ Y(i+p, j+1+q, c),\ Y(i+1+p, j+1+q, c) \big\}, \qquad (6)$$
where Z is the image encoded by maxpooling, and $p = 0, 1, 2, \ldots, \tfrac{w}{2}-1$ and $q = 0, 1, 2, \ldots, \tfrac{h}{2}-1$ are the strides.
$$Z'(i,j,c) = f\big((Z \ast W')(i,j,c)\big), \qquad (7)$$
where $Z'(i,j,c)$ is the result of the second convolutional layer and activation function, and $W'$ is another weight matrix.
$$\big\{ Y'(i+p, j+q, c),\ Y'(i+1+p, j+q, c),\ Y'(i+p, j+1+q, c),\ Y'(i+1+p, j+1+q, c) \big\} = Z'(i,j,c), \qquad (8)$$
where $Y'$ is the image decoded by upsampling.
$$\hat{X}(i,j,c) = (Y' \ast W'')(i,j,c), \qquad (9)$$
where $\hat{X}$ is the final result of the processing, and $W''$ represents another weight matrix.
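A minimal sketch, assuming a Keras implementation, of the layer sequence described by Equations (2)–(9): convolution with ReLU, 2 × 2 maxpooling for encoding, a second convolution, 2 × 2 upsampling for decoding, and a final convolution. The number of filters and the kernel size are assumptions, not the exact values of the published model.

```python
from tensorflow.keras import layers, Model

def build_dva(height, width, channels, filters=32, kernel_size=3):
    """Sketch of the single-encoder/single-decoder structure of Eqs. (2)-(9)."""
    x_in = layers.Input(shape=(height, width, channels))             # Eq. (2): corrupted image X
    y = layers.Conv2D(filters, kernel_size, padding="same",
                      activation="relu")(x_in)                       # Eqs. (3)-(5): convolution + ReLU
    z = layers.MaxPooling2D(pool_size=2)(y)                          # Eq. (6): encoding by maxpooling
    z2 = layers.Conv2D(filters, kernel_size, padding="same",
                       activation="relu")(z)                         # Eq. (7): second convolution + ReLU
    y2 = layers.UpSampling2D(size=2)(z2)                             # Eq. (8): decoding by upsampling
    x_out = layers.Conv2D(channels, kernel_size, padding="same")(y2) # Eq. (9): final convolution
    return Model(x_in, x_out)

# Example: unimodal (grayscale) model at the image size used for training
unimodal_model = build_dva(420, 420, 1)
```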
For the multimodal model, the image is separated into its three components (red, green, and blue), and each component is processed independently with a model trained for that channel (Equations (2)–(9)); once the results are obtained, the three new components are concatenated to generate a new image in which the noise is smoothed. In the unimodal model, a single trained model is applied. The main reason a multimodal model was trained for RGB images is that the noise, being completely random and defined by a Gaussian probability, affects each channel differently; processing the three channels of the image in the same way can therefore prevent the final smoothing from being carried out properly and leave a greater number of corrupted pixels. Figure 4a shows the original histogram of the Lenna image, and Figure 4b shows how the image behaves when corrupted with Gaussian noise with density σ = 0.20. In this example, the red channel tends to increase the intensity of its pixels, while the intensities of both the green and blue channels tend to decrease. A sketch of the per-channel processing is shown below.
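As an illustration of the multimodal path (assuming NumPy arrays and three per-channel models such as the sketch above; the function and variable names are placeholders):

```python
import numpy as np

def denoise_rgb(noisy_rgb, model_r, model_g, model_b):
    """Process each RGB channel with its own model and concatenate the results."""
    channels = []
    for c, model in enumerate((model_r, model_g, model_b)):
        channel = noisy_rgb[..., c:c + 1][np.newaxis]  # shape (1, h, w, 1)
        restored = model.predict(channel)[0]           # per-channel DVA output, shape (h, w, 1)
        channels.append(restored)
    return np.concatenate(channels, axis=-1)           # rebuild the smoothed RGB image
```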
The DVA process is described in detail in Algorithm 1. Once the processing through the DVA is finished, we analyze the histogram of the resulting image, shown in Figure 5, which reveals how the DVA restores, to a certain extent, the intensities of the pixels contained in each of the channels. In this sense, the DVA is capable of restoring the image; however, the restoration is not optimal due to the nature of the noise, which causes a significant loss of information in the images. The DVA tries to bring the intensities of the corrupted pixels as close as possible to the ideal case.
Algorithm 1: Process image using DVA.

Network Training

For the multimodal model, the “1 million faces” database was used, of which only 7000 images were taken [26] and resized to 420 × 420 pixels. The same database was duplicated to generate the noisy database: the 7000 images were divided into batches of 700, and each batch was corrupted with a different noise density. The noise densities used were 0, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, and 0.5. Once the two databases were obtained, the DVA training was carried out, with the databases divided into 80% for the training phase and 20% for the validation phase. For the unimodal model, the original database was converted to GS, and the noisy database was created by repeating the above procedure. A sketch of this preparation is shown below.
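A rough sketch of the noisy-database construction described above, assuming the images are already loaded as a NumPy array scaled to [0, 1]; the array name is a placeholder.

```python
import numpy as np

def corrupt_in_batches(clean_images, batch_size=700,
                       densities=(0, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5)):
    """Corrupt each batch of 700 images with a different Gaussian noise density."""
    noisy = np.empty_like(clean_images)
    for b, sigma in enumerate(densities):
        start, stop = b * batch_size, (b + 1) * batch_size
        batch = clean_images[start:stop]
        noise = np.random.normal(0.0, sigma, batch.shape)
        noisy[start:stop] = np.clip(batch + noise, 0.0, 1.0)
    return noisy

# clean_faces: placeholder array of shape (7000, 420, 420, 3) with values in [0, 1]
# noisy_faces = corrupt_in_batches(clean_faces)
```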
The network was trained on an NVIDIA GeForce RTX 3070 (8 GB) GPU. The hyperparameters used were seed = 17, learning rate = 0.001, shuffle = true, optimizer = Adam, loss function = MSE, epochs = 100, batch size = 50, and validation split = 0.1 (a training sketch with these settings follows). Figure 6 shows the learning curves for the training and validation phases over the 100 epochs, showing that the proposed architecture did not suffer from overtraining for either the unimodal model (Figure 6a) or the multimodal model (Figure 6b).
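A minimal training sketch using the hyperparameters listed above; build_dva is the sketch given after Equations (2)–(9), and the training arrays are placeholders rather than the actual database.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.optimizers import Adam

# Reproducibility with the reported seed
np.random.seed(17)
tf.random.set_seed(17)

# Placeholder arrays standing in for the noisy/clean training databases
noisy_train = np.random.rand(100, 420, 420, 1).astype("float32")
clean_train = np.random.rand(100, 420, 420, 1).astype("float32")

model = build_dva(420, 420, 1)  # unimodal (GS) case from the earlier sketch
model.compile(optimizer=Adam(learning_rate=0.001), loss="mse")

history = model.fit(
    noisy_train, clean_train,
    epochs=100,
    batch_size=50,
    shuffle=True,
    validation_split=0.1,
)
```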

4. Experimental Results

The evaluation of the DVA was carried out using various images, both RGB and GS, of different dimensions. These images are unknown to the network in order to verify its proper functioning. The evaluation images are shown in Figure 7. Each evaluation image was corrupted with Gaussian noise with densities from 0 to 0.50 in intervals of 0.01.
To gain a better perspective on the proper functioning of the proposed algorithm, comparisons were made with three other neural networks that differ in structure but whose objective is noise smoothing. Table 1 shows the visual comparisons of the results obtained by the DVA and the other neural networks for the Lenna image in GS, and Table 2 shows the same comparisons for the Lenna image in RGB. A region of interest is enlarged to give a better perspective of the work of each network on the image in question. In addition to the visual comparisons, the following evaluation metrics were used:
  • Mean Square Error (MSE): computes the mean of the squared differences between the original and processed images:
    $$MSE = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \big(x(i,j) - y(i,j)\big)^2,$$
    where x and y are the images to compare, (i,j) are the coordinates of the pixel, and M and N are the dimensions of the images.
  • Root Mean Squared Error (RMSE): commonly used to compare the difference between the original and processed images by directly computing the variation in pixel values [27]:
    $$RMSE = \sqrt{\frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \big(x(i,j) - y(i,j)\big)^2},$$
  • Erreur Relative Globale Adimensionnelle de Synthèse (ERGAS): computes the quality of the processed images in terms of the normalized average error of each band of the processed image [28]:
    $$ERGAS = 100 \, \frac{d_h}{d_l} \sqrt{\frac{1}{n} \sum_{i=1}^{n} \frac{RMSE_i^2}{\mu_i^2}},$$
    where $d_h/d_l$ is the ratio between the pixel sizes of the high- and low-resolution images, n is the number of bands, and $\mu_i$ is the mean of the ith band.
  • Peak Signal-to-Noise Ratio (PSNR): a widely used metric computed from the ratio between the maximum possible intensity, determined by the number of gray levels in the image, and the MSE between the original and processed images [29]:
    $$PSNR = 10 \log_{10} \frac{(2^b - 1)^2}{MSE},$$
    where b is the number of bits in the image.
  • Relative Average Spectral Error (RASE): characterizes the average performance of a method in the considered spectral bands [30]:
    $$RASE = \frac{100}{\mu} \sqrt{\frac{1}{n} \sum_{i=1}^{n} RMSE^2(B_i)},$$
    where $\mu$ is the mean radiance of the n spectral bands, and $B_i$ represents the ith band of the image.
  • Spectral Angle Mapper (SAM): computes the spectral angle between the pixel vectors of the original and processed images [31]:
    $$SAM = \cos^{-1}\!\left( \frac{\sum_{i=1}^{n} x(i,j)\, y(i,j)}{\sqrt{\sum_{i=1}^{n} x(i,j)^2}\,\sqrt{\sum_{i=1}^{n} y(i,j)^2}} \right),$$
  • Structural Similarity Index (SSIM): compares the local patterns of pixel intensities between the original and processed images [32]:
    $$SSIM = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)},$$
    where $\mu_x$ and $\mu_y$ are the means of the images, $\sigma_{xy}$ is the covariance between the images to compare, $C_1 = (k_1 L)^2$ and $C_2 = (k_2 L)^2$ are two variables that stabilize the division with low denominators, L is the dynamic range of the pixel values, and $k_1 \ll 1$ and $k_2 \ll 1$.
  • Universal Quality Image Index (UQI): measures how much of the relevant information is transferred from the original image into the processed image [33] (a computation sketch for some of these metrics follows this list):
    $$UQI = \frac{4\sigma_{xy}\mu_x\mu_y}{(\sigma_x^2 + \sigma_y^2)(\mu_x^2 + \mu_y^2)},$$
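For illustration, a sketch of how some of these metrics can be obtained with scikit-image and NumPy; the test images are placeholders, and the data range follows the 8-bit convention rather than any value specific to this work.

```python
import numpy as np
from skimage.metrics import mean_squared_error, peak_signal_noise_ratio, structural_similarity

# Placeholder 8-bit grayscale images: original x and processed y
x = np.random.randint(0, 256, (512, 512), dtype=np.uint8)
y = np.random.randint(0, 256, (512, 512), dtype=np.uint8)

mse = mean_squared_error(x, y)
rmse = np.sqrt(mse)
psnr = peak_signal_noise_ratio(x, y, data_range=255)  # (2^b - 1) = 255 for 8-bit images
ssim = structural_similarity(x, y, data_range=255)
```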
Table 3 presents the PSNR results obtained by each neural network on the GS validation images, and Table 4 presents the corresponding PSNR results for the RGB images.
In order to better present all the results of the metrics calculated from the validation database images processed by each of the aforementioned networks, Box-and-Whisker plots were made. This type of plot summarizes a large amount of data in five descriptive measures, suggests the morphology and symmetry of the distribution, and allows outliers to be identified and distributions to be compared.
Figure 8 shows the Box-and-Whisker plots for each of the metrics applied to the results on the GS images, and Figure 9 shows the corresponding plots for the RGB results. In each of the diagrams, it can be seen that the DVA box is smaller than those of the other networks, which means that the results obtained oscillate within a smaller range, so the result of the processing is similar regardless of the density with which the image is corrupted. The median is also located near the center of the box, which indicates that the distribution is almost symmetrical. Another point to highlight is that the DVA shows fewer outliers than the other networks. A sketch of how such plots can be produced is given below.
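For reference, per-method comparisons of this kind can be drawn with Matplotlib's boxplot; the arrays below are random placeholders, not results from this work.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder metric values per method, one value per corrupted validation image
results = {name: np.random.uniform(10, 30, 50)
           for name in ("DVA", "DnCNN", "Restormer", "NAFNET")}

data = list(results.values())
plt.boxplot(data)                                   # one box per method
plt.xticks(range(1, len(data) + 1), list(results))  # label each box with the method name
plt.ylabel("PSNR (dB)")
plt.show()
```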
Recapitulating the previous results, the DVA obtained better results than the other neural networks. The differences reported by the metric calculations are not always visually appreciable, mainly because these metrics do not accurately reflect the perceptual quality of the human eye. One perceptual measure of image quality is the Mean Opinion Score (MOS) [34]; however, this type of measure is not objective, as it differs depending on the user in question [35].
Another point in favor of the DVA is that it can be used on images of any dimension. As an example, Table 5 shows the visual and quantitative results for high-definition images, in which good restoration results can be observed.
In addition, the negative of the difference between the analyzed image and the original image is shown, in which all white pixels represent pixels equal to those of the original image; from this, it can be deduced that the DVA achieves a good restoration of the image when it is corrupted with Gaussian noise.

5. Conclusions

In this research work, the importance of using filters in artificial vision systems was highlighted, along with the basic concepts of artificial intelligence and some types of unsupervised networks used today. On this basis, a methodology based on autoencoders was proposed that is capable of processing images of any size and type (RGB or GS). The analysis of the results shows that the DVA can efficiently smooth Gaussian noise in images through the deep learning techniques implemented in the proposed algorithm, regardless of the noise density present in the corrupted images. The DVA results, both visual and quantitative (computed with various metrics), show better noise suppression than the DnCNN, NAFNET, and Restormer algorithms, which, despite having different architectures, share the goal of smoothing noise in images.
One limitation observed during this research is that when the image presents a low noise density, the results are similar to those of the architectures with which the DVA was compared. It is therefore suggested, as a starting point for future work, to make improvements either through transfer learning or by combining this methodology with another such as the one proposed in [36], in order to improve both qualitative and quantitative results, since it is extremely important for vision systems to get as close as possible to the real scene in order to reduce errors.

Author Contributions

Conceptualization, A.A.M.-G., A.J.R.-S. and D.M.-V.; methodology, A.A.M.-G., A.J.R.-S. and D.M.-V.; software, A.A.M.-G., A.J.R.-S., D.M.-V. and J.M.V.-K.; validation, P.J.E.-A., F.J.G.-F., E.V.-L. and L.M.P.-H.; formal analysis, A.A.M.-G., A.J.R.-S., D.M.-V., P.J.E.-A., F.J.G.-F., E.V.-L., L.M.P.-H. and L.V.L.-V.; investigation, A.A.M.-G., A.J.R.-S., D.M.-V., P.J.E.-A. and J.M.V.-K.; writing—original draft preparation, A.A.M.-G.; writing—review and editing, A.A.M.-G., A.J.R.-S., D.M.-V., P.J.E.-A., F.J.G.-F., J.M.V.-K., E.V.-L., L.M.P.-H. and L.V.L.-V.; supervision, A.A.M.-G., A.J.R.-S., D.M.-V. and P.J.E.-A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors thank Instituto Politécnico Nacional and Consejo Nacional de Humanidades Ciencias y Tecnologías for their support in carrying out this research work.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Limshuebchuey, A.; Duangsoithong, R.; Saejia, M. Comparison of Image Denoising using Traditional Filter and Deep Learning Methods. In Proceedings of the 2020 17th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), Online, 24–27 June 2020; pp. 193–196. [Google Scholar]
  2. Ajay, K.B.; Brijendra, K.J. A Review Paper: Noise Models in Digital Image Processing. Comput. Res. Repos. 2015, 6, 63–75. [Google Scholar]
  3. Verma, R.; Ali, J. A comparative study of various types of image noise and efficient noise removal techniques. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 2013, 3, 617–622. [Google Scholar]
  4. Tian, C.; Fei, L.; Zheng, W.; Xu, Y.; Zuo, W.; Lin, C.W. Deep Learning on Image Denoising: An Overview; Elsevier: Amsterdam, The Netherlands, 2020; pp. 1–39. [Google Scholar]
  5. Liu, W.; Wang, Z.; Liu, X.; Zeng, N.; Liu, Y.; Alsaadi, F.E. A survey of deep neural network architectures and their applications. Neurocomputing 2017, 234, 11–26. [Google Scholar] [CrossRef]
  6. Agarwal, S.; Agarwal, A.; Deshmukh, M. Denoising Images with Varying Noises Using Autoencoders. In Proceedings of the Computer Vision and Image Processing: 4th International Conference, CVIP 2019, Jaipur, India, 27–29 September 2019; Volume 1148, pp. 3–14. [Google Scholar]
  7. Dong, L.F.; Gan, Y.Z.; Mao, X.L. Learning Deep Representations Using Convolutional Auto-Encoders with Symmetric Skip Connections. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 3006–3010. [Google Scholar]
  8. Holden, D.; Saito, J.; Komura, T. Learning Motion Manifolds with Convolutional Autoencoders. Assoc. Comput. Mach. 2015, 18, 1–4. [Google Scholar] [CrossRef]
  9. Zhang, K.; Zuo, W.; Chen, Y. Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising. IEEE Trans. Image Process. 2017, 26, 3142–3155. [Google Scholar] [CrossRef]
  10. Chen, L.; Chu, X.; Zhang, X.; Sun, J. Simple Baselines for Image Restoration. arXiv 2022, arXiv:2204.04676v4. [Google Scholar]
  11. Zamir, S.W.; Arora, A.; Khan, S. Restormer: Efficient Transformer for High-Resolution Image Restoration. arXiv 2021, arXiv:2111.09881. [Google Scholar]
  12. Xiaojun, C.; Ren, P.; Xu, P. A Comprehensive Survey of Scene Graphs: Generation and Application. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 22359232. [Google Scholar]
  13. Steffen, S.; Adrian, S.; Kendra, B. Image Processing of Multi-Phase Images Obtained via X-ray Microtomography: A Review; American Geophysical Union: Washington, DC, USA, 2014. [Google Scholar]
  14. Balafar, M.; Ramli, M. Review of brain mri image segmentation methods. Artif. Intell. 2010, 33, 261–274. [Google Scholar] [CrossRef]
  15. Mario, V.; Francesco, M.; Giovanni, A. Adaptive Image Contrast Enhancement by Computing Distances into a 4-Dimensional Fuzzy Unit Hypercube; IEEE: Washington, DC, USA, 2017; pp. 26922–26931. [Google Scholar]
  16. Diwakar, M.; Kumar, M. A review on ct image noise and its denoising. Biomed. Process. Control 2017, 42, 73–88. [Google Scholar] [CrossRef]
  17. Chollet, F. Deep Learning with Python; Simon & Schuster: New York, NY, USA, 2018; p. 384. [Google Scholar]
  18. Zhang, L.; Chang, X.; Liu, J.; Luo, M.; Li, Z.; Yao, L.; Hauptmann, A. TN-ZSTAD: Transferable Network for Zero-Shot Temporal Activity Detection; IEEE: Washington, DC, USA, 2023; pp. 3848–3861. [Google Scholar]
  19. Aurelien, G. Hands-On Machine Learning with Scikit-Learn and TensorFlow Concepts; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2017; p. 856. [Google Scholar]
  20. Gulli, A. Deep Learning with Keras; Packt Publishing Ltd.: Birmingham, UK, 2017; p. 318. [Google Scholar]
  21. Karatsiolis, S.; Schizas, C. Conditional Generative Denoising Autoencoder; IEEE: Washington, DC, USA, 2020; pp. 4117–4129. [Google Scholar]
  22. Majumdar, A. Blind Denoising Autoencoder. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 312–317. [Google Scholar] [CrossRef] [PubMed]
  23. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507. [Google Scholar] [CrossRef] [PubMed]
  24. Leonard, M. Deep Learning Nanodegree Foundation Course; LectureNotes in Autoencoders; Udacity: Emeryville, CA, USA, 2018. [Google Scholar]
  25. Vincent, P.; Larochelle, H.; Bengio, Y. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 5–9 July 2008; pp. 1096–1103. [Google Scholar]
  26. Bojan, T. “1 Million Faces”. Kaggle. Available online: https://www.kaggle.com/competitions/deepfake-detection-challenge/discussion/121173 (accessed on 20 February 2023).
  27. Zoran, L.F. Quality Evaluation of Multiresolution Remote Sensing Image Fusion. UPB Sci. Bull. 2009, 71, 38–52. [Google Scholar]
  28. Du, Q.; Younan, N.H.; King, R. On the performance evaluation of pan-sharpening techniques. IEEE Remote Sens. 2007, 4, 518–522. [Google Scholar] [CrossRef]
  29. Naidu, V.P.S. Discrete Cosine Transform-based Image Fusion. Navig. Signal Process. 2010, 60, 33–45. [Google Scholar]
  30. Shailesh, P.; Rajesh, T. Implementation and comparative quantitative assessment of different multispectral image pansharpening approaches. Signal Image Process. Int. J. 2015, 35–48. [Google Scholar] [CrossRef]
  31. Alparone, L.; Wald, L.; Chanussot, J. Comparison of Pansharpening Algorithms: Outcome of the 2006 GRS-S Data-Fusion Contest. IEEE Trans. Geosci. Remote Sens. 2007, 45, 3012–3021. [Google Scholar] [CrossRef]
  32. Wang, Z.; Bovik, A.C.; Sheikh, H.R. Image Quality Assessment from Error Visibility to Structural Similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
  33. Alparone, L.; Aiazzi, B.; Baronti, S. Multispectral and Panchromatic Data Fusion Assessment Without Reference. Photogramm. Eng. Remote Sens. 2008, 74, 193–200. [Google Scholar] [CrossRef]
  34. Zhang, K.; Ren, W.; Luo, W. Deep Image Deblurring: A Survey; Springer: Berlin/Heidelberg, Germany, 2022; pp. 2103–2130. [Google Scholar]
  35. Hoßfeld, T.; Heegaard, P.E.; Varela, M. Qoe beyond the Mos: An In-Depth Look at Qoe via Better Metrics and Their Relation to Mos; Springer: Berlin/Heidelberg, Germany, 2016. [Google Scholar]
  36. Yan, C.; Chang, X.; Li, Z. ZeroNAS: Differentiable Generative Adversarial Networks Search for Zero-Shot Learning; IEEE: Washington, DC, USA, 2022; pp. 9733–9740. [Google Scholar]
Figure 1. Architecture of the vanilla autoencoder.
Figure 2. Architecture of the convolutional autoencoder.
Figure 3. Architecture of the proposed denoising vanilla autoencoder.
Figure 4. Difference between histogram of original Lenna image and histogram of corrupted Lenna image.
Figure 5. Histogram of the result of the corrupted image of Lenna processed by DVA.
Figure 6. Learning curves obtained during the training of the DVA.
Figure 7. Testing images.
Figure 8. Box-and-Whisker plots of the quantitative results obtained on GS images.
Figure 9. Box-and-Whisker plots of the quantitative results obtained on RGB images.
Table 1. Comparative visual results for the GS image (Lenna): the original image, its noisy versions at σ = 0, 0.10, 0.15, 0.20, 0.30, 0.40, and 0.50, and the corresponding DVA, DnCNN, Restormer, and Nafnet results (shown as images in the original article).
Table 2. Comparative visual results for the RGB image (Lenna): the original image, its noisy versions at σ = 0, 0.10, 0.15, 0.20, 0.30, 0.40, and 0.50, and the corresponding DVA, DnCNN, Restormer, and Nafnet results (shown as images in the original article).
Table 3. Comparative results of PSNR in GS images.
GS Image | Density | Noisy Image | DVA | DnCNN | Restormer | Nafnet
Airplane GS | 0 | inf | 26.545 | 71.197 | 36.987 | 32.961
Airplane GS | 0.10 | 11.859 | 23.729 | 22.305 | 22.137 | 10.312
Airplane GS | 0.15 | 10.610 | 23.014 | 20.097 | 20.818 | 7.995
Airplane GS | 0.20 | 9.841 | 22.378 | 18.705 | 20.128 | 8.717
Airplane GS | 0.30 | 8.896 | 20.938 | 16.859 | 19.132 | 9.407
Airplane GS | 0.40 | 8.338 | 20.474 | 15.833 | 18.476 | 8.043
Airplane GS | 0.50 | 7.959 | 19.312 | 15.109 | 17.937 | 7.823
Baboon GS | 0 | inf | 17.478 | 33.966 | 26.414 | 10.021
Baboon GS | 0.10 | 11.298 | 19.010 | 20.203 | 17.560 | 8.926
Baboon GS | 0.15 | 10.221 | 18.103 | 19.277 | 16.634 | 9.159
Baboon GS | 0.20 | 9.592 | 18.596 | 18.676 | 16.003 | 8.892
Baboon GS | 0.30 | 8.824 | 18.222 | 17.654 | 15.294 | 8.761
Baboon GS | 0.40 | 8.377 | 17.913 | 16.975 | 14.827 | 8.840
Baboon GS | 0.50 | 8.066 | 17.702 | 16.480 | 14.476 | 8.861
Barbara GS | 0 | inf | 23.640 | 39.198 | 32.285 | 8.417
Barbara GS | 0.10 | 11.469 | 21.795 | 21.669 | 17.472 | 8.846
Barbara GS | 0.15 | 10.336 | 21.309 | 20.191 | 16.120 | 9.119
Barbara GS | 0.20 | 9.673 | 20.119 | 19.171 | 15.245 | 8.514
Barbara GS | 0.30 | 8.837 | 20.160 | 17.726 | 14.150 | 8.054
Barbara GS | 0.40 | 8.330 | 19.672 | 16.762 | 13.450 | 8.051
Barbara GS | 0.50 | 8.029 | 18.975 | 16.241 | 13.083 | 8.124
Cablecar GS | 0 | inf | 25.853 | 67.160 | 36.974 | 31.302
Cablecar GS | 0.10 | 12.069 | 22.686 | 20.751 | 17.113 | 7.295
Cablecar GS | 0.15 | 10.800 | 22.084 | 19.047 | 15.690 | 6.993
Cablecar GS | 0.20 | 9.951 | 21.032 | 17.636 | 14.640 | 7.270
Cablecar GS | 0.30 | 8.910 | 20.558 | 15.981 | 13.406 | 7.123
Cablecar GS | 0.40 | 8.290 | 19.643 | 14.945 | 12.686 | 6.887
Cablecar GS | 0.50 | 7.872 | 18.765 | 14.216 | 12.198 | 6.826
Goldhill GS | 0 | inf | 27.997 | 52.056 | 39.720 | 33.700
Goldhill GS | 0.10 | 11.595 | 24.867 | 22.684 | 17.541 | 7.818
Goldhill GS | 0.15 | 10.450 | 23.896 | 20.744 | 15.958 | 8.031
Goldhill GS | 0.20 | 9.722 | 23.346 | 19.390 | 14.898 | 7.954
Goldhill GS | 0.30 | 8.857 | 22.313 | 17.686 | 13.676 | 7.718
Goldhill GS | 0.40 | 8.335 | 21.505 | 16.637 | 12.948 | 7.716
Goldhill GS | 0.50 | 7.971 | 20.774 | 15.874 | 12.460 | 7.640
Lenna GS | 0 | inf | 30.196 | 72.566 | 38.527 | 35.414
Lenna GS | 0.10 | 11.383 | 24.344 | 23.652 | 18.997 | 8.645
Lenna GS | 0.15 | 10.284 | 23.743 | 21.720 | 17.578 | 9.051
Lenna GS | 0.20 | 9.619 | 22.941 | 20.332 | 16.749 | 8.815
Lenna GS | 0.30 | 8.825 | 21.901 | 18.565 | 15.609 | 8.394
Lenna GS | 0.40 | 8.350 | 21.074 | 17.501 | 14.968 | 8.531
Lenna GS | 0.50 | 8.049 | 20.650 | 16.899 | 14.571 | 8.566
Mondrian GS | 0 | inf | 20.117 | 59.524 | 31.921 | 30.121
Mondrian GS | 0.10 | 12.534 | 19.672 | 18.876 | 16.526 | 5.621
Mondrian GS | 0.15 | 11.070 | 20.003 | 17.094 | 14.994 | 5.678
Mondrian GS | 0.20 | 10.075 | 19.170 | 15.790 | 13.970 | 5.581
Mondrian GS | 0.30 | 8.842 | 18.086 | 14.121 | 12.713 | 5.426
Mondrian GS | 0.40 | 8.094 | 16.578 | 13.068 | 11.969 | 5.475
Mondrian GS | 0.50 | 7.581 | 16.204 | 12.323 | 11.446 | 5.447
Peppers GS | 0 | inf | 25.598 | 62.046 | 38.161 | 34.348
Peppers GS | 0.10 | 11.479 | 24.303 | 23.371 | 18.504 | 8.340
Peppers GS | 0.15 | 10.353 | 23.010 | 21.187 | 16.975 | 8.754
Peppers GS | 0.20 | 9.667 | 22.402 | 19.909 | 16.064 | 8.560
Peppers GS | 0.30 | 8.829 | 21.752 | 18.033 | 14.940 | 8.160
Peppers GS | 0.40 | 8.363 | 21.193 | 17.149 | 14.347 | 8.159
Peppers GS | 0.50 | 8.023 | 20.383 | 16.363 | 13.838 | 8.258
Table 4. Comparative results of PSNR in RGB images.
RGB Image | Density | Noisy Image | DVA | DnCNN | Restormer | Nafnet
Airplane RGB | 0 | inf | 26.215 | 55.638 | 36.502 | 32.961
Airplane RGB | 0.10 | 14.576 | 24.082 | 22.852 | 23.812 | 10.312
Airplane RGB | 0.15 | 13.342 | 23.365 | 20.843 | 22.569 | 7.995
Airplane RGB | 0.20 | 12.526 | 22.461 | 19.449 | 21.548 | 8.717
Airplane RGB | 0.30 | 11.525 | 21.899 | 17.694 | 20.237 | 9.407
Airplane RGB | 0.40 | 10.922 | 21.228 | 16.665 | 19.421 | 8.043
Airplane RGB | 0.50 | 10.503 | 19.762 | 15.926 | 18.798 | 7.823
Baboon RGB | 0 | inf | 21.614 | 25.291 | 23.442 | 10.021
Baboon RGB | 0.10 | 14.043 | 19.171 | 19.781 | 17.699 | 8.926
Baboon RGB | 0.15 | 12.981 | 18.895 | 18.917 | 16.758 | 9.159
Baboon RGB | 0.20 | 12.314 | 18.704 | 18.245 | 16.122 | 8.892
Baboon RGB | 0.30 | 11.488 | 18.475 | 17.324 | 15.377 | 8.761
Baboon RGB | 0.40 | 10.961 | 18.144 | 16.665 | 14.828 | 8.840
Baboon RGB | 0.50 | 10.653 | 17.850 | 16.297 | 14.521 | 8.861
Barbara RGB | 0 | inf | 27.412 | 39.115 | 31.285 | 29.037
Barbara RGB | 0.10 | 14.269 | 21.742 | 21.857 | 18.259 | 16.990
Barbara RGB | 0.15 | 13.134 | 21.271 | 20.416 | 17.002 | 8.152
Barbara RGB | 0.20 | 12.425 | 21.059 | 19.426 | 16.145 | 8.285
Barbara RGB | 0.30 | 11.553 | 20.518 | 18.128 | 15.050 | 7.846
Barbara RGB | 0.40 | 11.033 | 20.157 | 17.285 | 14.348 | 7.867
Barbara RGB | 0.50 | 10.663 | 19.707 | 16.726 | 13.854 | 8.348
Cablecar RGB | 0 | inf | 22.794 | 52.131 | 34.426 | 30.961
Cablecar RGB | 0.10 | 14.652 | 21.977 | 20.843 | 18.035 | 10.152
Cablecar RGB | 0.15 | 13.293 | 21.563 | 18.983 | 16.419 | 7.520
Cablecar RGB | 0.20 | 12.411 | 20.120 | 17.758 | 15.419 | 7.403
Cablecar RGB | 0.30 | 11.284 | 20.164 | 16.115 | 14.106 | 6.997
Cablecar RGB | 0.40 | 10.612 | 19.757 | 15.146 | 13.304 | 6.878
Cablecar RGB | 0.50 | 10.143 | 19.036 | 14.452 | 12.725 | 6.985
Goldhill RGB | 0 | inf | 32.649 | 51.974 | 36.456 | 32.535
Goldhill RGB | 0.10 | 14.323 | 23.988 | 22.748 | 19.003 | 8.023
Goldhill RGB | 0.15 | 13.149 | 23.362 | 20.968 | 17.287 | 8.134
Goldhill RGB | 0.20 | 12.392 | 23.037 | 19.680 | 16.187 | 7.666
Goldhill RGB | 0.30 | 11.501 | 22.456 | 18.193 | 14.890 | 7.438
Goldhill RGB | 0.40 | 10.927 | 21.856 | 17.201 | 14.020 | 7.585
Goldhill RGB | 0.50 | 10.558 | 21.181 | 16.556 | 13.482 | 7.853
Lenna RGB | 0 | inf | 28.446 | 33.758 | 32.538 | 31.828
Lenna RGB | 0.10 | 14.368 | 23.799 | 23.141 | 21.068 | 21.847
Lenna RGB | 0.15 | 13.249 | 23.332 | 21.434 | 19.475 | 10.198
Lenna RGB | 0.20 | 12.496 | 22.966 | 20.143 | 18.344 | 8.230
Lenna RGB | 0.30 | 11.611 | 22.467 | 18.691 | 17.022 | 8.185
Lenna RGB | 0.40 | 11.084 | 21.703 | 17.758 | 16.191 | 8.164
Lenna RGB | 0.50 | 10.707 | 21.152 | 17.063 | 15.629 | 8.189
Mondrian RGB | 0 | inf | 17.688 | 36.324 | 29.113 | 28.609
Mondrian RGB | 0.10 | 14.728 | 16.729 | 17.404 | 16.621 | 15.873
Mondrian RGB | 0.15 | 13.072 | 16.465 | 15.700 | 14.978 | 14.440
Mondrian RGB | 0.20 | 11.976 | 15.927 | 14.560 | 13.850 | 13.526
Mondrian RGB | 0.30 | 10.568 | 15.098 | 13.054 | 12.432 | 12.291
Mondrian RGB | 0.40 | 9.690 | 14.841 | 12.086 | 11.545 | 11.420
Mondrian RGB | 0.50 | 9.070 | 15.039 | 11.391 | 10.917 | 10.330
Peppers RGB | 0 | inf | 33.057 | 48.801 | 34.615 | 32.112
Peppers RGB | 0.10 | 14.519 | 24.496 | 22.653 | 19.361 | 19.103
Peppers RGB | 0.15 | 13.324 | 23.756 | 20.752 | 17.669 | 17.418
Peppers RGB | 0.20 | 12.540 | 23.349 | 19.468 | 16.594 | 16.102
Peppers RGB | 0.30 | 11.565 | 22.606 | 17.837 | 15.310 | 7.490
Peppers RGB | 0.40 | 10.974 | 21.553 | 16.868 | 14.491 | 7.657
Peppers RGB | 0.50 | 10.569 | 20.784 | 16.179 | 13.942 | 7.667
Table 5. Visual and quantitative results obtained by DVA in HD images.
Sun 2100 × 2034 (noisy inputs and DVA results are shown as images in the original article)
Metric | σ = 0 | σ = 0.10 | σ = 0.20 | σ = 0.30 | σ = 0.40 | σ = 0.50
ERGAS | 5169.806 | 10,965.422 | 13,395.159 | 15,276.500 | 17,736.873 | 18,296.674
MSE | 21.131 | 124.536 | 249.594 | 380.183 | 567.533 | 699.633
PSNR | 34.882 | 27.178 | 24.158 | 22.331 | 20.591 | 19.682
RASE | 0 | 1498.244 | 1902.722 | 2190.058 | 2530.487 | 2639.515
RMSE | 4.597 | 11.160 | 15.799 | 19.498 | 23.823 | 26.451
SAM | 0.072 | 0.273 | 0.390 | 0.448 | 0.489 | 0.523
SSIM | 0.994 | 0.964 | 0.926 | 0.896 | 0.867 | 0.842
UQI | 0.782 | 0.558 | 0.512 | 0.499 | 0.490 | 0.484
Dog 6000 × 2908 (noisy inputs and DVA results are shown as images in the original article)
Metric | σ = 0 | σ = 0.10 | σ = 0.20 | σ = 0.30 | σ = 0.40 | σ = 0.50
ERGAS | 5624.483 | 11,456.096 | 10,623.462 | 10,393.671 | 9919.464 | 10,406.266
MSE | 217.856 | 362.834 | 441.465 | 566.388 | 610.187 | 763.037
PSNR | 24.749 | 22.534 | 21.682 | 20.6 | 20.276 | 19.305
RASE | 806.958 | 1652.544 | 1530.997 | 1496.294 | 1427.232 | 1496.917
RMSE | 14.76 | 19.048 | 21.011 | 23.799 | 24.702 | 27.623
SAM | 0.022 | 0.078 | 0.089 | 0.099 | 0.113 | 0.131
SSIM | 0.936 | 0.773 | 0.711 | 0.665 | 0.623 | 0.588
UQI | 0.986 | 0.936 | 0.948 | 0.953 | 0.956 | 0.951