Enhanced Learning Enriched Features Mechanism Using Deep Convolutional Neural Network for Image Denoising and Super-Resolution

Abstract: Image denoising and super-resolution play vital roles in imaging systems, greatly reducing the preprocessing cost of many AI techniques for object detection, segmentation, and tracking. Various advancements have been made in this field, but progress is still needed. In this paper, we propose a novel technique, the Enhanced Learning Enriched Features (ELEF) mechanism, built on a deep convolutional neural network, which makes significant improvements over existing techniques. ELEF consists of two major processes: (1) denoising, which removes noise from images; and (2) super-resolution, which improves the clarity and detail of images. Features are learned through a deep CNN rather than traditional algorithms, so images can be better refined and enhanced. To capture features effectively, the network architecture adopts Dual Attention Units (DUs) aligned with Multi-Scale Residual Blocks (MSRBs) for robust feature extraction, working alongside Selective Kernel Feature Fusion (SKF) for feature matching. In addition, resolution-mismatch cases are handled in detail to produce high-quality images. The effectiveness of the ELEF model is highlighted by its performance metrics, achieving a Peak Signal-to-Noise Ratio (PSNR) of 42.99 and a Structural Similarity Index (SSIM) of 0.9889, which indicates its ability to carry out high-quality image restoration and enhancement.


Introduction
Images are used across many domains of today's digital world, including photography, digital entertainment, computer vision, remote sensing, medical diagnostics, microscopy, space science, and surveillance. Unfortunately, images tend to suffer from degradation during their formation and transmission. Such degradations come in many forms: noise, blur, intensity non-uniformity, missing pixels caused by electronic or sensor failures, and interference from neighboring electronic devices. These degradations not only affect the visual quality of images but also reduce their interpretability and the effectiveness of image analysis and processing algorithms. Consequently, restoring and enhancing degraded images is an important and demanding area that aims to reduce the influence of various degradations and improve the quality and interpretability of such images.
The purpose of enhancement and restoration techniques is to improve the visual appearance and perceptual quality of images, thereby increasing their effectiveness for desired applications such as image recognition [1], visual object detection [2], and semantic segmentation [3]. Enhancement techniques aim to improve particular visual qualities such as contrast, sharpness, or color balance. Image restoration techniques, on the other hand, target particular kinds of degradation in the image.
Image restoration techniques include denoising to remove noise, deblurring to recover sharpness, and inpainting to recover missing regions. Removing noise from images is essential for interpretation and further processing, since noisy images are inefficient to process. Downstream supervised and unsupervised tasks that depend on clean images include faceprint-based identification, hidden-object identification [4], object detection [5], and image segmentation.
Over time, significant developments have been made in image restoration and enhancement, powered by advances in computer vision, machine learning, and signal processing [6]. Traditional approaches depend on algorithms built on mathematical models and hand-crafted rules. However, these methods often struggle with complex degradation patterns and fail to generalize robustly across diverse image types.
In recent years, deep learning has changed the way images are processed and restored. CNNs and related deep learning models, whether encoder-decoder networks [7][8][9][10] or high-resolution models [11][12][13][14], perform remarkably well for image processing. Large amounts of data and large-scale training make it possible for deep learning models to learn complex mappings from the input (degraded image) to the output (pristine image) far better than traditional models, leading to more accurate and subjectively realistic image restoration and enhancement.
Our method provides a novel approach to image restoration and enhancement, addressing two specific problems in this area: denoising and super-resolution. Our main goal is to deeply explore up-to-date and forward-looking methods for image restoration and enhancement across these two challenges. Through this process, we aim to identify supervised deep CNN techniques that improve image restoration and enhancement based on generated low-resolution images. Ultimately, we contribute novel research to the development of image restoration and enhancement techniques built on supervised CNNs.
To start our discussion, we take up the topic of denoising a noisy image. Denoising is a classically difficult problem since noise reduction tends to remove the fine details that are important for understanding; classic denoising methods therefore either leave considerable noise behind along with the details or eradicate details along with the noise. Recent advances in deep learning-based denoising have revolutionized the domain of image noise reduction: convolutional neural networks help the model clear out as much noise as possible while preserving the minute details of the image. These techniques have significantly increased the effectiveness of image denoising, but much work remains in this domain.
In the next phase, we focus on the subfield of super-resolution. Super-resolution is a task that aims to restore image resolution beyond its original extent. In other words, this process "enhances" the resolution of an image such that extra details are revealed, resulting in a clearer visual appearance. We investigate state-of-the-art deep learning frameworks designed specifically for super-resolution. These frameworks use neural networks to build high-quality images from their low-quality counterparts, requiring intricate network designs and training schemes to improve image resolution while maintaining important structural attributes.
This approach aims to create a model that provides better visual results, with improved images and lower computational complexity. At each resolution stage, information is exchanged hierarchically across all scales; in contrast, traditional methods isolate each scale and process them in a top-down order, which distinguishes this approach from conventional methods. The information exchange is carried out via a kernel fusion process per stream. In addition, ELEF uses a self-attention mechanism after picking a useful set of kernels from each stream. The most significant part is the fusion of traits from varying receptive domains through a fusion block, while preserving distinctive features.

Related Work
The image restoration problem has been well examined, with techniques advancing over the past few years [15][16][17][18]. Several methods were introduced to address various threats and trials in different restoration fields [14,19,20]. In the modern era, trainable neural networks are replacing conventional techniques [21][22][23][24], even skipping pre-assumed degradation procedures. The introduction of transformers provides new ways to approach the restoration domain; some of them were originally designed for NLP tasks [25]. Vision Transformers break an image down into sequential patches, study their dependencies, and represent images by processing input data that relies entirely on self-attention [26]. They have since been used for denoising, super-resolution, de-raining, and image colorization tasks [27][28][29]. Advances in transformers that reduce complexity while developing sharper representations give more precise results [30][31][32]. In addition, low-rank factorization and approximation approaches [33][34][35][36] accelerate transformers but can lead to information loss, extra parameters, and task dependency.
Denoising. One crucial problem in image restoration is image denoising, which targets removing undesired noise while retaining important details. Traditional denoising approaches are based on filters such as the median filter, the Wiener filter, and the wavelet transform, and on transforming and masking coefficients. Several patch-based methods [37][38][39] were also introduced, which exploit redundancy in visual content. These techniques work in either the spatial or the frequency domain. They are also built on simple assumptions about the noise (uniformly Gaussian distributed or multiplicative white noise) and/or the image (e.g., Gaussian distribution of DCT coefficients, piecewise smooth behavior). Deep learning-based methods have made huge progress in image denoising [40][41][42][43][44][45]. Convolutional Neural Networks (CNNs) have yielded impressive denoising results by effectively learning noise patterns (filters) and how to handle them. The latest Deep Convolutional Neural Networks (DCNNs) have obtained the best performance across a wide range of denoising scenarios.
Super-Resolution. Super-resolution is the process of generating high-resolution (HR) images or videos from one or more low-resolution (LR) observations. Traditional methods are mainly devoted to statistical models or interpolation; to generate HR images with a natural appearance, including sharp edges and fine textures, many of them were based on sampling theory [46], edge-guided interpolation [47], natural image priors [48], and sparse representation [49]. Recently, the progress of learning techniques for classification and other tasks has also been remarkable. Deep learning-based super-resolution (SR) approaches, such as SR reconstruction based on a simple CNN (SRCNN) [11] and non-linear-mapping CNN-based HR reconstruction (EDSR) [39], have steadily improved model results. Deep learning techniques help to learn more about the inherent characteristics of high-resolution images, and different data-driven techniques have different design frameworks [50,51]. While traditional methods directly produce HR images from LR images [52][53][54][55], modern techniques introduce residual learning architectures [55] to process high-frequency image information. Apart from that, dense connections [56,57], multi-branch learning [58,59], progressive reconstruction [60], generative adversarial networks (GANs) [50,61,13,62], and recursive learning are also used for super-resolution tasks. The Fraunhofer Institute in Germany has developed methods based on recursive learning [63], Non-local Means Filtering based on variational models for SR, and exponential cross-diffusion. Conventional SR methods based on trained dictionaries show better image quality by using dictionary-based high-frequency energy for the details of HR images, and simulation results show that such estimators achieve more precise texture reconstruction at high resolution.
Denoising followed by super-resolution. Few studies have addressed super-resolution of noisy images. Singh et al. [64] performed the two tasks separately and then combined the noisy super-resolution and denoised super-resolution results. Laghrib et al. [65] combined the denoising task with a newly introduced filter-based algorithm for super-resolution. Hu et al. [66] performed super-resolution with simultaneous denoising derived from a multi-scale noise reduction method. These methods do not leverage neural networks and fail to fully utilize edge-information constraints. Chen et al. [67] used a GAN for super-resolution and denoising, utilizing a residual network to directly map the image to the original noise map; however, this approach also does not fully exploit edge-information constraints.

Method
We take a sequential approach, treating the most fundamental task, denoising, first and super-resolution as a subsequent task. We address denoising first because refining an image to remove noise lets the super-resolution step build a sharp image on a consistent, relatively noise-free foundation, in essence preserving character and improving visual form, as shown in Figure 1. Building on this, we propose a simple two-step restoration pipeline: we perform L1-principal component analysis (L1-PCA) denoising before any of the super-resolution tasks. We aim to obtain compelling reconstruction results that are visually pleasant and perceptually accurate across various image domains and noise types. By bringing the latest technologies into our pipeline and through careful consideration of various details, our study aims to push image restoration beyond current leading methods toward new restoration and enhancement methodologies.
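To illustrate the denoise-then-upscale ordering, the sketch below shows a toy patch-PCA denoiser in NumPy. It uses ordinary L2 PCA via the SVD as a stand-in for the L1-PCA step mentioned above (L1-PCA replaces the squared-error objective with an absolute-error one for robustness to outliers); the function name and the patch-matrix layout are our own illustrative choices, not the paper's implementation.

```python
import numpy as np

def pca_denoise_patches(patches, k):
    """Toy PCA denoising (standard L2 PCA as a stand-in for the L1-PCA
    step described in the text): project each patch vector onto its top-k
    principal components and reconstruct, discarding the low-variance
    directions that mostly carry noise.  patches: (n, d) array where each
    row is one flattened image patch."""
    mean = patches.mean(axis=0)
    centered = patches - mean
    # Principal directions come from the SVD of the centred patch matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]                         # top-k components, shape (k, d)
    return centered @ basis.T @ basis + mean
```

In a full pipeline, the denoised patches would be reassembled into an image before the super-resolution stage runs.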

Overall Pipeline
The overall pipeline of our proposed restoration method follows a sequential flow of operations that progressively enhances and restores the input image. The input image is first fed to a sequence of MSRB modules, as shown in Figure 2, which efficiently capture and learn multi-scale features at different levels of abstraction. This lets the model seize low-level to high-level information and hence makes the restoration process easier.

Restored = Resize(Input) + Residual(Input)
In this pipeline, we first process the original image with the restoration network to obtain the residual image, and then upscale the input with the Resize operation to match the recovered residual image's size. Finally, we add the resized input image to the restored residual image to obtain the restored output image. The idea behind this pipeline is that the restored image preserves the details and structure of the original input while incorporating the restoration enhancements provided by the network through the residual image. The pipeline aims to make the restoration result visually pleasing by effectively removing degradation artifacts and enhancing the overall quality of the degraded image.
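The resize-and-add logic of the pipeline can be sketched in a few lines of NumPy. This is a minimal illustration, not the actual network: `restoration_net` is a hypothetical placeholder for the deep restoration network, and nearest-neighbour resizing stands in for whatever Resize operator the real pipeline uses.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D image (stand-in for the Resize op)."""
    in_h, in_w = img.shape
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return img[rows][:, cols]

def restore(img, restoration_net, out_h, out_w):
    """Restored = Resize(Input) + Residual, where the residual image comes
    from the (hypothetical) restoration network at the output resolution."""
    residual = restoration_net(img, out_h, out_w)   # shape (out_h, out_w)
    return resize_nearest(img, out_h, out_w) + residual
```

With a network that outputs an all-zero residual, the pipeline reduces to plain resizing, which makes the role of the residual branch easy to see.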

Residual Resizing Modules (RRMs)
To mitigate the potential noise-induced discrepancy between input and output images, we introduce the Residual Resizing Modules, which can be viewed as revising patches of the image with respect to their noisy appearance. They help ensure that the final restored image retains the same structure as the clean one and does not discard important supporting details.

Multi-Scale Residual Block (MSRB)
The MSRB module is the basic component of our restoration network. It is composed of several convolutional layers with a residual connection, which helps optimize the restoration through so-called residual learning. Convolutional filters at different scales are integrated into each MSRB to capture multi-scale features, improving the network's feature learning ability when dealing with complex variations such as texture, edge, and structure changes within images.
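A toy 1-D analogue of this multi-scale idea can be written as follows. The real MSRB operates on 2-D multi-channel feature maps with learned convolutions; here two fixed 1-D kernels of different lengths stand in for the parallel branches, and all names are illustrative.

```python
import numpy as np

def conv1d_same(x, k):
    """'Same'-padded 1-D correlation of signal x with kernel k."""
    pad = len(k) // 2
    xp = np.pad(x, pad)
    return np.array([np.dot(xp[i:i + len(k)], k) for i in range(len(x))])

def msrb_1d(x, k3, k5):
    """Toy 1-D analogue of a Multi-Scale Residual Block: two parallel
    branches with different receptive fields (kernel lengths 3 and 5),
    fused by summation, plus a residual (skip) connection."""
    branch_small = conv1d_same(x, k3)   # small receptive field
    branch_large = conv1d_same(x, k5)   # larger receptive field
    fused = branch_small + branch_large
    return x + fused                    # residual connection
```

The skip connection means the block learns a *correction* to its input rather than the output from scratch, which is what makes residual learning easier to optimize.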

Selective Kernel Feature Fusion (SKF)
We further propose a Selective Kernel Feature Fusion (SKF) module, which allows the network to select features from all the convolutional features with different kernel sizes. This fusion gives the network the ability to capture effective local and global contextual information. By incorporating features from convolutional layers with different kernel sizes, in addition to their mean statistics, we further increase the feature representation capability of the network. As a result, the network can better restore details while inheriting more global context.
The fused output is computed as SKF = (Σᵢ wᵢ ⊙ xᵢ) / (Σᵢ wᵢ), i = 1, …, N, where SKF denotes the output feature map after the SKF module, xᵢ represents the input feature maps from the different convolutional kernels, wᵢ is the weight corresponding to each activation map i, and N is the number of activation maps. Here, ⊙ stands for element-wise multiplication, and normalization by the sum of the weights in the denominator stabilizes the fusion operation. The SKF module selectively fuses the features of the different convolutional kernels, allowing the network to effectively capture local and global context information; by adaptively weighting and combining the features, it enriches the network's representation potential and helps it restore and enhance images more effectively.
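The fusion rule described above can be sketched directly in NumPy. In this minimal illustration the attention weights are plain scalars for clarity; in the network they would be produced by the attention branch and broadcast over the feature maps. The function name is our own.

```python
import numpy as np

def skf_fuse(feature_maps, weights):
    """Selective Kernel Feature Fusion (sketch): adaptively weight and
    combine N feature maps x_i with attention weights w_i, normalising
    by the sum of the weights:

        SKF = (sum_i w_i * x_i) / (sum_i w_i)
    """
    num = sum(w * x for w, x in zip(weights, feature_maps))
    den = sum(weights)
    return num / den
```

With equal weights the fusion reduces to a plain average of the branches; putting all the weight on one branch selects that branch's features, which is exactly the "selective" behaviour the module is named for.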

Dual Attention Unit (DU)
The Dual Attention Unit (DU) module is intended to update feature representations by introducing channel attention (CA) [68] and spatial attention (SA) [69] mechanisms. Channel attention models the dependencies between different channels, allowing the network to reinforce more useful features and suppress less informative ones. Spatial attention models the correlation between different spatial locations in the feature maps, enabling the network to selectively emphasize important regions. Through the cooperation of CA and SA, the DU module improves the network's discriminative ability and achieves accurate restoration effectively.

Channel Attention (CA)
The CA [68] module calculates channel-wise attention weights to retain informative features and suppress noisy or irrelevant information by focusing on global statistics. Mathematically, the channel attention mechanism can be represented as CA(x) = x ⊙ σ(W2 δ(W1 GAP(x))), where x represents the input feature map, GAP denotes global average pooling, δ denotes the ReLU activation function, W1 and W2 are learnable weights, and σ represents the sigmoid function.

Spatial Attention (SA)
The SA mechanism extracts a spatial attention map from the feature maps, identifying locally important spatial regions and emphasizing interactions among neighboring spatial locations. The SA [69] mechanism models spatial dependencies. Mathematically, the spatial attention mechanism can be represented as SA(x) = x ⊙ σ(W4 δ(W3 x)), where x represents the input feature map, δ denotes the ReLU activation function, W3 and W4 are learnable weights, and σ represents the sigmoid function. Our model uses adjustable weights W1, W2, W3, and W4 in the range [0.5, 1.5]. Values below or above this range significantly degrade the model's performance on denoising and super-resolution tasks. The best performance was observed when all weights were set to 1.0, indicating that balanced attention mechanisms work well for these tasks.
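The two attention mechanisms can be sketched on a (C, H, W) feature array as follows. This is a simplified NumPy illustration, not the paper's module: the channel branch follows the standard squeeze-and-excitation pattern of [68], and the spatial branch mirrors the same ReLU-then-sigmoid form with scalar weights, whereas the real module uses learned convolutions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """SE-style channel attention (sketch): squeeze with global average
    pooling, excite through two small weight matrices with ReLU between,
    then rescale each channel.  x has shape (C, H, W); w1 is (R, C) and
    w2 is (C, R) for some reduction size R."""
    gap = x.mean(axis=(1, 2))              # (C,) global average pool
    hidden = np.maximum(0.0, w1 @ gap)     # ReLU(W1 . GAP(x))
    scale = sigmoid(w2 @ hidden)           # per-channel weights in (0, 1)
    return x * scale[:, None, None]

def spatial_attention(x, w3, w4):
    """Spatial attention (sketch, mirroring the CA form): pool across the
    channel axis, apply two scalar weightings with ReLU between, then
    sigmoid-gate every spatial position of every channel."""
    pooled = x.mean(axis=0)                # (H, W) channel-wise pool
    mask = sigmoid(w4 * np.maximum(0.0, w3 * pooled))
    return x * mask[None, :, :]
```

CA rescales whole channels while SA rescales individual spatial positions; applying both, as the DU does, lets the network gate features along both axes.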

Experiments and Results
To assess the performance of the proposed method for image restoration and enhancement using learning enriched features, experiments were conducted separately for the denoising and super-resolution tasks. Experiments are performed on the publicly available SIDD [70] and DND [71] datasets, which contain real-life degraded images captured under different types of environments.
Before enhancing an image, we denoise it first. Noise artifacts often cover many details of the image, and noise may also cause non-existent artifacts to appear; hence, we must filter out the noise first. By doing this, we keep only the important details and can then enhance them when raising the low-resolution image to a higher resolution. The datasets comprise a large number of real-world images with various forms of degradation, captured in noisy environments as in real applications, and exhibit various noise levels across different scenes and objects. Thus, they represent almost all the noise conditions likely to occur in the real world.

Datasets
The Smartphone Image Denoising Dataset (SIDD) [70] is a benchmark dataset of real-world noisy images photographed with smartphones under diverse lighting and ISO conditions, containing a variety of noise levels. We use a total of 1600 pairs, of which 320 image pairs are used for training and 1280 for validation. The data are prepared through a series of processing steps to handle camera shift alignment, exposure time adjustment, and intensity scaling. Sample images from the DND and SIDD datasets are shown in Figure 3. The Darmstadt Noise Dataset (DND) [71] is a benchmark dataset of 50 pairs, each consisting of a real noisy image and ground-truth data captured with various consumer-grade cameras. With respect to ISO, the noisy images are taken at higher levels and the reference images at base levels. As high-resolution images are used, 20 crops of size 512 × 512 are extracted per image, resulting in a total of 1000 patches. All of these are used for testing (DND provides no training or validation sets). The ground-truth noise-free images are not publicly released, so an online server is used to obtain the quantitative measures.

Training Dataset Setup
To train our enhancement model effectively, we require a dataset of paired low-quality images and their corresponding high-quality reference images. The reference images act as the ideal outputs and serve as the benchmark for the model's performance, providing the ground truth from which the network can learn the enhancement mapping accurately. The dataset must include a wide range of variations, such as different levels of degradation, different scenes, and different objects, because the category, scene, and objects may change from image to image. To make our model generalize and perform well on all types of images, the dataset must cover these variations. Our goal is for the model to work well on all types of real-world images; training on such a comprehensive dataset lets the network learn robust enhancement techniques that can be applied to real-world images.

Experimental Setup
Our deep convolutional neural network model was implemented using the PyTorch library in Google Colaboratory (Colab), which is well suited to deep learning frameworks, and run on a high-performance computing cluster with NVIDIA GPUs. The dataset was randomly split into training, validation, and testing sets in an 80:10:10 ratio. Table 1 shows the basic characteristics of the datasets used for comparison. The Dual Attention Unit (DU), Residual Resizing Modules (RRMs), and Selective Kernel Feature Fusion (SKF) are the same as in MIRNet [72].

Performance Measures
To quantitatively assess the effectiveness of our method, we use two evaluation metrics: Peak Signal-to-Noise Ratio (PSNR) [70] and the Structural Similarity Index Measure (SSIM) [71]. These metrics quantify the quality of, similarity to, and error against the ground-truth high-quality images for the restored/enhanced images.
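For reference, both metrics are straightforward to compute; below is a minimal NumPy version. PSNR follows the standard definition in dB, and `ssim_global` evaluates the SSIM formula over the whole image as a single window (the standard metric averages over local Gaussian windows, so values will differ slightly from library implementations).

```python
import numpy as np

def psnr(ref, out, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between a reference image and a
    restored image, for pixel values in [0, peak]."""
    mse = np.mean((ref.astype(np.float64) - out.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")                 # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

def ssim_global(x, y, peak=255.0):
    """Simplified single-window SSIM over the whole image, comparing
    luminance, contrast, and structure with the usual stabilisers."""
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Higher is better for both: PSNR is unbounded (infinite for a perfect reconstruction), while SSIM is at most 1, attained when the two images are identical.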

Analysis with Baseline Methods
We compared our proposed method with several state-of-the-art and well-established image restoration and enhancement techniques, including conventional approaches and deep learning frameworks, on the standard benchmark datasets SIDD [70] and DND [71]. Figures 4 and 5 show denoising and super-resolution results in terms of PSNR and SSIM on the SIDD benchmark for MIRNet [72], RIDNet [73], CBDNet [74], and our proposed ELEF mechanism. It is clear from Figures 4 and 5 that CBDNet still has issues with edge and detail preservation, whereas RIDNet and MIRNet preserve edges but lack detail preservation. Our proposed model outperforms the competing mechanisms in both edge and detail preservation, with better PSNR and SSIM values. Figure 6 displays denoising and super-resolution results in terms of PSNR and SSIM on the DND benchmark for MIRNet [72], RIDNet [73], CBDNet [74], VDN [75], and our proposed ELEF mechanism. It can be observed from Figure 6 that MIRNet, RIDNet, VDN, and CBDNet introduce some blur and smooth out parts of the edges, whereas our proposed model gives excellent results in detail and edge preservation without any deterioration. Furthermore, the proposed method achieves excellent PSNR and SSIM values and outperforms the other models.
The proposed model is also evaluated quantitatively in terms of PSNR and SSIM against other models in Tables 2 and 3, using the benchmark datasets mentioned above. It is evident from Tables 2 and 3 that the proposed model gives much better PSNR and SSIM values than other state-of-the-art models.

Qualitative Results
The proposed method also achieves significantly better restoration and enhancement results in terms of visual quality. To assess the quality of our results, we conducted a survey on our university campus, showing the results to 75 participants (students, faculty, and staff). Most participants (72) gave satisfactory remarks, while a few (3) did not find any differences, resulting in a 96% success rate. It can be clearly seen from the results in Figures 4-6 that, compared with the input images, noise is significantly reduced, details are enhanced, and the images are sharpened.

Ablation Studies
We investigated the impact of our architectural components and design choices on final performance through a series of ablation experiments on the image denoising and super-resolution tasks, as shown in Table 4. Table 4 highlights that the absence of skip connections leads to the most significant decline in performance: without these connections, the network faces convergence issues, resulting in higher training errors and lower PSNR. Additionally, the Selective Kernel Feature Fusion (SKF) mechanism, which facilitates information exchange among parallel convolution streams, proves advantageous and boosts performance. Likewise, the Dual Attention Units (DUs) contribute positively to overall image quality.
Table 5 compares feature combination with summation (SUM), concatenation (CAT), and Selective Kernel Feature Fusion (SKF). The proposed SKF outperforms SUM and CAT, using roughly 6× fewer parameters than CAT while producing better PSNR results. Specifically, SKF effectively enhances feature representation by adaptively selecting and combining informative features from parallel convolution streams, and the significant reduction in parameters highlights its efficiency and effectiveness in improving overall model performance.
In Table 6, we perform an experiment on the RealSR [8] dataset. For the denoising task, it does not show much difference, as the dataset contains little noise; still, the super-resolution task performs better visually and attains higher PSNR and SSIM values, which shows the significance of the method in real-world scenarios. Although running the denoising step on an already-clean dataset is computationally expensive, the denoising mechanism still improves image quality. Furthermore, in our ablation study, as shown in Figure 7, we found that each module within the ELEF model contributes significantly to its overall performance in image restoration and enhancement. When the MSRB (Multi-Scale Residual Block) was removed, the model struggled to capture fine details, resulting in slightly softer restored images, as shown in Figure 7. Excluding the SKF (Selective Kernel Feature Fusion) module affected the model's ability to integrate information across the image, leading to less natural-looking enhancements. Without the DU (Dual Attention Unit), the model had difficulty focusing on important image features, resulting in slightly noisier or less sharp restorations. Lastly, removing the RRM (Residual Resizing Module) meant the model could not refine images as effectively, leaving some minor imperfections. These findings highlight the importance of each module (MSRB, SKF, DU, RRM) in the ELEF model's performance and guide future improvements toward better restoration quality in various applications. In addition, we analyzed our results at different scaling factors, ×2, ×3, and ×4, as shown in Figures 8 and 9. The visual results show that quality deteriorates as the scaling factor increases, so it is better to use the proposed mechanism below ×3 for image processing and computer vision tasks.

Conclusions
The experimental results demonstrate the superiority of the proposed algorithm in various noise and blur removal tasks for image restoration and enhancement. It incorporates learning enriched features, multi-scale residual blocks, selective kernel feature fusion, dual attention units, and residual resizing modules in a deep learning framework to deal with different types of imaging noise and blurring degradation. The results are visually attractive and perceptually reasonably accurate. Substantial progress has been made in image restoration, particularly in noise removal followed by super-resolution, through the use of deep learning technologies. The algorithm developed in this paper makes full use of deep learning and other advanced techniques in this field, such as learning enriched features. Evaluated against competitive methods, the algorithm shows outstanding advantages in output image quality and performance. Across various tests, the proposed method adapts well to real scenes, with high definition and rich details and colors. In light of these factors, imaging systems, monitoring, remote sensing, and other applications can clearly and accurately display target images, which offers great advantages for precise analysis and judgment.

Figure 1.
Figure 1. Flow diagram. Our overall pipeline consists of a series of modules that progressively improve image quality at each stage. The Multi-Scale Residual Blocks (MSRBs) extract and represent features at different scales. The Selective Kernel Feature Fusion (SKF) strengthens feature representation by fusing salient features. The Dual Attention Units (DUs), with Channel Attention (CA) and Spatial Attention (SA) mechanisms, refine features and let each stage focus on its restoration task. The Residual Resizing Modules smoothly increase or decrease resolution without losing important image features.

Figure 5.
Figure 5. Proposed model results on the SIDD benchmark dataset.

Figure 8. Figure 9.
Figure 8. Results for different resolutions on the SIDD dataset.
editing, R.B., R.M.Y. and M.W.Y. All authors have read and agreed to the published version of the manuscript.

Table 1.
Components of benchmark datasets.

Table 2.
Comparison on the SIDD [2] dataset. ↑ indicates that methods are arranged in ascending order of PSNR.

Table 3.
Comparison on the DND [53] dataset. ↑ indicates that methods are arranged in ascending order of PSNR.

Table 4.
Impact of different components of MSRBs.

Table 6.
Denoising and super-resolution performed on the RealSR [8] dataset. ↑ indicates that methods are arranged in ascending order of PSNR.