Image Super-Resolution Via Wavelet Feature Extraction and Sparse Representation

This paper proposes a novel Super-Resolution (SR) technique based on wavelet feature extraction and sparse representation. First, the Low-Resolution (LR) image is interpolated using Lanczos interpolation. Then, the interpolated image is decomposed into sub-bands (LL, LH, HL and HH) via the Discrete Wavelet Transform (DWT). Next, the LH, HL and HH sub-bands are interpolated with the Lanczos interpolator. Principal Component Analysis (PCA) is used to reduce the dimensionality of the interpolated sub-bands and to retain their most relevant feature information. Overlapping patches are taken from the features obtained via PCA. For each patch, the sparse representation is computed using the Orthogonal Matching Pursuit (OMP) algorithm and the LR dictionary. Subsequently, this sparse representation is used to reconstruct a High-Resolution (HR) patch with the HR dictionary, and the result is added to the interpolated LR image. The novel technique was evaluated with the objective quality criteria PSNR and SSIM, which demonstrate its superiority over state-of-the-art techniques.


Introduction
Single-image Super-Resolution (SR) techniques play an important role in enhancing the resolution of acquired low-resolution (LR) images, and SR is one of the most active fields in image processing. The main task in image SR is to recover the HR image while minimizing the loss of fine details, contours and edges. Many applications, such as medicine, remote sensing, HDTV, engineering and video production, need HR images. Often, these HR images cannot be acquired directly because the sensors in the electronic devices that capture the images or videos do not have enough resolution. In other cases, the sensed environment is a limitation, e.g., atmospheric clutter, background noise or unfavorable weather. Finally, both factors can combine; for example, the acquisition of medical images is limited both by the physics of imaging and by how long a patient can be subjected to the magnetic field without it becoming a health hazard. A common measure of image quality is spatial resolution, i.e., the number of pixels per unit length or area [1].
In various applications, the quality of images and video frames depends on the spatial resolution, defined as the number of pixels per unit area of a camera sensor. Because improving the precision and stability of imaging systems through manufacturing techniques faces physical limitations and high costs, image/video processing applications [2] such as those mentioned above require post-processing techniques and algorithms that restore the resolution degraded by a sensor, permitting better observation of textures, edges and fine details. This step can be performed using SR procedures that generate HR images from one or several LR images or video frames. Thus, SR restoration is a popular research topic in computer vision [3].
To recover an HR image, the LR image is first modeled. It is assumed that the LR image was obtained by blurring (degradation) and down-sampling operators. The formation of the LR image is modeled as follows:
$$Y = SBX + \eta, \tag{1}$$
where X is the unknown HR image, Y is the LR image, S is a downsampling operator, B is a blurring filter, and η is zero-mean additive white Gaussian noise.
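The degradation model Y = SBX + η can be simulated directly, which is also how LR test images are usually generated from HR references. The following NumPy sketch is illustrative only: the function name `degrade`, the odd-sized kernel, and the symmetric border padding are assumptions, not details given in the paper.

```python
import numpy as np


def degrade(x, kernel, scale=2, noise_sigma=0.0, seed=0):
    """Simulate Y = S B X + eta for a 2-D grayscale image x (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    kh, kw = kernel.shape  # odd-sized kernel assumed, so it has a centre tap
    ph, pw = kh // 2, kw // 2
    # B: blur via 2-D correlation with symmetric border padding
    # (identical to convolution for symmetric kernels)
    padded = np.pad(x, ((ph, ph), (pw, pw)), mode="symmetric")
    blurred = np.zeros_like(x)
    for i in range(kh):
        for j in range(kw):
            blurred += kernel[i, j] * padded[i:i + x.shape[0], j:j + x.shape[1]]
    # S: keep every `scale`-th row and column
    down = blurred[::scale, ::scale]
    # eta: zero-mean additive white Gaussian noise
    rng = np.random.default_rng(seed)
    return down + rng.normal(0.0, noise_sigma, size=down.shape)


# Usage: a 3x3 averaging blur and a scale factor of 2
hr = np.full((8, 8), 100.0)
lr = degrade(hr, np.ones((3, 3)) / 9.0, scale=2, noise_sigma=0.0)
print(lr.shape)  # (4, 4)
```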
In SR applications, the degradation in (1) must be inverted; this inverse problem is ill-posed because many solutions satisfy the equation. The core idea is to use the LR image Y to obtain an estimate $\hat{X}$ that is similar or very close to the HR image X. Many approaches have recently been developed to obtain HR images [4]. These SR techniques can be classified into three categories: interpolation-based methods, reconstruction-based methods and learning-based methods.
Among interpolation-based methods, the bicubic method [5], [6] uses a cubic polynomial function to estimate the unknown pixels. Another state-of-the-art method is Lanczos interpolation [7], which uses a windowed sinc kernel: the Lanczos kernel is the product of the sinc function and a stretched sinc window. This interpolation has better approximation capabilities than classical bicubic interpolation. The literature contains many edge-directed interpolation methods. For example, New Edge-Directed Interpolation (NEDI) [8] estimates the local covariance coefficients from an LR image and then uses these covariances to adapt the linear interpolation between LR pixels, reconstructing the HR image via the geometric duality between the LR and HR covariances. The method in [9] proposes an edge-guided nonlinear interpolation technique based on directional filtering and data fusion: two observation sets are used for each pixel to be interpolated, each set produces an estimate of the pixel value, and the two estimates are fused by linear minimum mean square-error estimation (LMMSE). Medical images, such as magnetic resonance images, often need to be high-resolution so that fine details can be observed. Zheng et al. [10] proposed an approach to enhance multi-contrast brain magnetic resonance images that exploits statistical information estimated from another MRI contrast sharing similar anatomical structures. They assumed that some edge structures are shared between different images acquired from the same subject, and their proposal aimed to recover these structures to generate an HR image.
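The Lanczos kernel described above can be sketched as follows. This is a minimal illustration, assuming a kernel support of a = 3 (a common choice), pixel-centre alignment between input and output grids, border replication, and separable application along rows and then columns; the function names are hypothetical.

```python
import numpy as np


def lanczos_kernel(x, a=3):
    """Lanczos kernel: sinc(x) * sinc(x / a) for |x| < a, else 0 (np.sinc is the normalized sinc)."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < a, np.sinc(x) * np.sinc(x / a), 0.0)


def lanczos_upscale_1d(signal, scale=2, a=3):
    """Resample a 1-D signal by an integer factor with a Lanczos-a kernel."""
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    # Map each output sample to input coordinates, aligning pixel centres.
    coords = (np.arange(n * scale) + 0.5) / scale - 0.5
    out = np.empty(n * scale)
    for k, u in enumerate(coords):
        base = int(np.floor(u))
        taps = np.arange(base - a + 1, base + a + 1)  # the 2a nearest input samples
        w = lanczos_kernel(u - taps, a)
        idx = np.clip(taps, 0, n - 1)  # replicate border samples
        out[k] = np.dot(w, signal[idx]) / w.sum()  # normalize so weights sum to 1
    return out


def lanczos_upscale_2d(img, scale=2, a=3):
    """Separable 2-D Lanczos interpolation: rows first, then columns."""
    rows = np.apply_along_axis(lanczos_upscale_1d, 1, img, scale, a)
    return np.apply_along_axis(lanczos_upscale_1d, 0, rows, scale, a)


up = lanczos_upscale_2d(np.arange(16.0).reshape(4, 4))
print(up.shape)  # (8, 8)
```

Normalizing by the weight sum keeps flat regions flat, which the plain truncated kernel does not guarantee exactly.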
Reconstruction-based methods apply some type of transformation to the pixel intensities of an image; the typical mathematical tools are the Discrete Wavelet Transform (DWT) and the Discrete Fourier Transform (DFT). For example, in [11], the LR image was decomposed into four sub-bands using the DWT, and these sub-bands were processed simultaneously in the spatial and wavelet domains to produce an HR image with better-preserved edges. The SR technique proposed in [12] used the DWT to decompose the LR image into sub-bands, interpolated these sub-bands using bicubic interpolation, and finally applied the inverse DWT to the LR image and the interpolated sub-bands to obtain the SR image.
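The DWT-domain scheme attributed to [12] can be sketched in a few lines. To stay self-contained, this sketch swaps in the orthonormal Haar wavelet (not necessarily the wavelet used in the cited works) and nearest-neighbour growth of the detail sub-bands instead of bicubic interpolation; the function names are hypothetical, and the naming of the LH/HL bands varies between libraries.

```python
import numpy as np


def haar_dwt2(img):
    """One-level orthonormal 2-D Haar DWT (image sides must be even)."""
    x = np.asarray(img, dtype=float)
    a, b = x[0::2, 0::2], x[0::2, 1::2]  # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]  # bottom-left, bottom-right
    ll = (a + b + c + d) / 2  # approximation
    lh = (a + b - c - d) / 2  # row differences (responds to horizontal edges)
    hl = (a - b + c - d) / 2  # column differences (responds to vertical edges)
    hh = (a - b - c + d) / 2  # diagonal details
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2."""
    out = np.empty((2 * ll.shape[0], 2 * ll.shape[1]))
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2
    out[0::2, 1::2] = (ll + lh - hl - hh) / 2
    out[1::2, 0::2] = (ll - lh + hl - hh) / 2
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return out


def dwt_upscale_x2(lr):
    """x2 up-scaling in the spirit of [12]: grow the detail sub-bands back to the
    LR size and use the LR image itself as the LL band."""
    lr = np.asarray(lr, dtype=float)
    _, lh, hl, hh = haar_dwt2(lr)

    def grow(band):
        return np.kron(band, np.ones((2, 2)))  # nearest-neighbour stand-in for bicubic

    # The orthonormal Haar LL band equals twice the local mean, hence the factor 2.
    return haar_idwt2(2.0 * lr, grow(lh), grow(hl), grow(hh))


img = np.random.default_rng(0).random((8, 8))
print(np.allclose(haar_idwt2(*haar_dwt2(img)), img))  # True
```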
Learning-based methods attempt to predict the HR image from a priori knowledge, e.g., a sparse representation. They learn this prior knowledge from databases and incorporate it during the reconstruction process [13]. Among state-of-the-art methods, several pioneering works use sparse representation to solve the inverse problem in (1). For example, Yang et al. [14] proposed an SR technique in which the LR image is decomposed into patches, a sparse representation is computed for every patch, and the resulting coefficients are used to recover the HR image. A pair of dictionaries is trained jointly so that corresponding HR and LR patches share the same sparse representation. Mallat et al. [15] used mixtures of linear estimators adapted to the prior knowledge: for every patch, the mixing weights are calculated and, according to the signal regularity, an adaptive directional interpolation is performed. A common problem in learning-based methods is how to learn and train overcomplete dictionaries. He et al. [16] proposed a Bayesian method to learn overcomplete dictionaries. This method employs a beta process model and shows that the sparse representation can be decomposed into coefficient values and dictionary-atom indicators; the coupled dictionaries learned in this way are used to solve the single-image super-resolution problem. Sparse representation has been used to super-resolve different types of images. For example, in satellite imaging, the work [17] employed sparse representation to generate HR satellite images using features extracted from the LR image with Laplacian and gradient filters. In magnetic resonance imaging, images reconstructed from under-sampled data contain aliasing artifacts. The work [18] proposed a patch-based nonlocal operator, called PANO, that sparsifies magnetic resonance images by exploiting the similarity of image patches. The PANO operator incorporates prior information learned from under-sampled data or from another contrast image, which leads to an optimized sparse representation of the images to be reconstructed. Other medical approaches, for example [19], generate overcomplete dictionaries to enhance the quality of Magnetic Resonance (MR) images by coupling high- and low-frequency information, so that an HR version of an LR brain MR image is generated.
In computer vision, Convolutional Neural Networks (CNNs) are widely used to solve different tasks, such as image classification and denoising, and several CNN-based methods have been applied to the SR problem. Some recent works directly learn an end-to-end mapping that generates an HR image from an LR image. For example, the pioneering work of Dong et al. [20] presents a deep learning method for single-image SR. Their CNN employed four layers to obtain an HR image. In the first layer, the LR image was up-scaled to the desired HR size using bicubic interpolation. Then, a set of feature maps was extracted from the up-scaled LR image in the second layer. A non-linear mapping was performed in the third layer to map the LR feature maps to HR feature maps. Finally, the HR image was generated in the fourth layer. Alternatively, Kim et al. [21] proposed a cascaded network of d layers that generates a residual image, which is added to the up-scaled LR image to recover the HR image.
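The SRCNN-style pipeline described above (up-scaled input, feature extraction, non-linear mapping, reconstruction) can be sketched as a NumPy forward pass. This is illustrative only: the weights are random and untrained, the channel counts are chosen for speed, and the filter sizes 9, 1 and 5 follow a common SRCNN configuration. The `residual` flag merely mimics the residual learning described for [21]; it does not reproduce that network's depth. All function names are hypothetical.

```python
import numpy as np


def conv2d_same(x, w, b):
    """Zero-padded 'same' 2-D correlation.
    x: (C_in, H, W), w: (C_out, C_in, k, k) with odd k, b: (C_out,)."""
    c_out, _, k, _ = w.shape
    p = k // 2
    h, wd = x.shape[1:]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Accumulate the contribution of kernel tap (i, j) over all input channels.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out + b[:, None, None]


def srcnn_forward(y_up, params, residual=False):
    """Apply three conv layers to an already up-scaled image y_up of shape (1, H, W)."""
    f1 = np.maximum(conv2d_same(y_up, *params[0]), 0.0)  # feature maps (ReLU)
    f2 = np.maximum(conv2d_same(f1, *params[1]), 0.0)    # non-linear mapping (ReLU)
    out = conv2d_same(f2, *params[2])                    # HR reconstruction
    # residual=True adds the prediction to the input, as in residual learning
    return y_up + out if residual else out


def init_params(sizes=((16, 1, 9), (8, 16, 1), (1, 8, 5)), seed=0):
    """Random (untrained) weights; each entry is (C_out, C_in, kernel size)."""
    rng = np.random.default_rng(seed)
    return [(0.01 * rng.standard_normal((co, ci, k, k)), np.zeros(co)) for co, ci, k in sizes]


y_up = np.random.default_rng(1).random((1, 16, 16))
print(srcnn_forward(y_up, init_params()).shape)  # (1, 16, 16)
```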
Feature extraction is fundamental to methods that use sparse representation to super-resolve images. In this paper, a novel single-image SR method is proposed that employs the DWT to extract informative features in three directions (horizontal, vertical and diagonal). Additionally, the method incorporates prior information via two dictionaries obtained by combining the K-SVD algorithm with the features extracted in the wavelet domain. To demonstrate that the single-image resolution enhancement algorithm designed in this study, called Super-Resolution using Wavelet Feature Extraction and Sparse Representation (SR-WAFE-SR), has real advantages, we compared it with promising state-of-the-art techniques using objective and subjective criteria, which demonstrate its competitive performance. This paper is organized as follows: Sec. 2 describes the dictionary training process. The proposed SR technique is presented in Sec. 3. Section 4 describes the experimental results and discussion. Section 5 summarizes the principal contributions, and, finally, the conclusions are drawn in Sec. 6.

Dictionary Training
The novel SR method trains a pair of dictionaries with atoms from LR and HR patches. To train the pair of dictionaries [22], a database of 69 images ($i = 1, \dots, 69$) is used. These images were obtained from the code provided by Yang et al. [14] and Zeyde et al. [23]. In this database, all the images $X_i$ are treated as HR images, and each LR version is obtained with the blurring filter and the down-sampling operator as follows:
$$Y_i = S B X_i.$$
Each image $Y_i$ obtained from the training database was interpolated using the operator $L$, which performs Lanczos interpolation, for all $i$. This can be defined as:
$$Z_i = L Y_i = L Q X_i,$$
where $Z_i$ are the intermediate LR images obtained via the operator $L$, and $Q = SB$ transforms an HR image into an LR image. The HR training set $S_h$ is obtained by removing the Low-Frequency (LF) details from each HR image [24]:
$$F_i = X_i - Z_i,$$
where $F_i$ are the High-Frequency (HF) details of each HR image in the database. Patches of size $\sqrt{n} \times \sqrt{n}$ are extracted from each $F_i$, and each patch is vectorized by concatenating its columns, yielding the HR training matrix $S_h \in \mathbb{R}^{n \times N}$, whose $N$ columns are the vectorized patches.
The training set $S_l$ was obtained by feature extraction on the $Z_i$ images. Feature extraction was performed using the DWT, which decomposes each $Z_i$ into four sub-sampled sub-bands that provide the low-frequency (LL) and high-frequency (LH, HL and HH) details. Only the HF sub-bands (LH, HL and HH) were used here. These HF sub-bands were interpolated with the operator $L$ to recover the size of the LR image. Figure 2 shows the HF sub-bands obtained via DWT for a given image. Principal Component Analysis (PCA) [25] was then applied to the interpolated sub-bands to reduce the data and retain the most relevant features. This yields a training set $S_l$ of the same size as $S_h$.
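The construction of the training matrices can be sketched as below. The sketch assumes 5×5 patches (n = 25, matching the 25-row dictionaries reported in the experiments), a stride of 1, and that PCA fuses the three interpolated HF sub-bands by projecting each pixel's 3-vector onto the first principal component; the paper does not spell out this last detail, and all function names are hypothetical.

```python
import numpy as np


def hf_details(x_hr, z_interp):
    """F_i = X_i - Z_i: HF residual between an HR image and its interpolated LR version."""
    return np.asarray(x_hr, dtype=float) - np.asarray(z_interp, dtype=float)


def extract_patches(img, n_side=5, step=1):
    """Stack every overlapping n_side x n_side patch as a column (column-by-column vectorization)."""
    h, w = img.shape
    cols = [img[r:r + n_side, c:c + n_side].flatten(order="F")
            for r in range(0, h - n_side + 1, step)
            for c in range(0, w - n_side + 1, step)]
    return np.stack(cols, axis=1)  # shape (n_side**2, number_of_patches)


def pca_fuse(bands):
    """Fuse same-sized bands (e.g. interpolated LH, HL, HH) into one band by projecting
    each pixel's k-vector onto the first principal component."""
    data = np.stack([np.ravel(b) for b in bands], axis=1).astype(float)  # (pixels, k)
    centered = data - data.mean(axis=0)
    _, vecs = np.linalg.eigh(np.cov(centered, rowvar=False))  # eigenvalues ascending
    return (centered @ vecs[:, -1]).reshape(np.shape(bands[0]))


# Usage (shapes only): S_h from HF residuals, S_l from the PCA-fused feature band
img = np.arange(36.0).reshape(6, 6)
print(extract_patches(img).shape)  # (25, 4)
```

With these helpers, `S_h = extract_patches(hf_details(X, Z))` and `S_l = extract_patches(pca_fuse([lh_up, hl_up, hh_up]))` produce matrices with the same number of columns when all inputs share one size.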
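The training set $S_l$ was used as the input of the K-SVD algorithm [23], [26], which solves the optimization problem:
$$\{D_l, A\} = \arg\min_{D_l, A} \|S_l - D_l A\|_F^2 \quad \text{s.t.} \quad \|\alpha_k\|_0 \le T \;\; \forall k,$$
where $D_l$ is the dictionary containing atoms from the LR training set, $\alpha_k$ is the $k$-th column of the sparse coefficient matrix $A$, and $T$ is the sparsity level. As in Wang et al. [27] and Zeyde et al. [23], the dictionary $D_h$ can then be obtained from the representation coefficient matrix $A$:
$$D_h = S_h A^{T} (A A^{T})^{-1}.$$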
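The closed form D_h = S_h A^T (A A^T)^{-1} is the least-squares solution of D_h A ≈ S_h. A minimal sketch, assuming the sparse code matrix A has already been produced by K-SVD (K-SVD itself is not implemented here), computes it with a least-squares solver instead of an explicit inverse; the function name and the synthetic data are hypothetical.

```python
import numpy as np


def hr_dictionary(S_h, A):
    """D_h = S_h A^T (A A^T)^{-1}, computed as the least-squares fit of D_h A to S_h.
    S_h: (n, N) HR training vectors; A: (K, N) sparse codes (assumed to come from K-SVD)."""
    # lstsq solves A^T X = S_h^T for X = D_h^T without forming (A A^T)^{-1}.
    D_h_T, *_ = np.linalg.lstsq(A.T, S_h.T, rcond=None)
    return D_h_T.T


# Synthetic check: codes with 3 nonzeros per column, K = 40 atoms, n = 25
rng = np.random.default_rng(0)
K, N, n = 40, 2000, 25
A = np.zeros((K, N))
for col in range(N):
    rows = rng.choice(K, size=3, replace=False)
    A[rows, col] = rng.standard_normal(3)
D_true = rng.standard_normal((n, K))
D_h = hr_dictionary(D_true @ A, A)
print(np.allclose(D_h, D_true))  # True when A has full row rank
```

Solving the least-squares problem directly is numerically safer than inverting A A^T, which can be ill-conditioned when some atoms are rarely used.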

Proposed SR-WAFE-SR Technique
The LR image $Y$ was interpolated via the operator $L$ to obtain an initial SR image during the interpolation stage. This initial image is denoted $X_l$. Next, in the feature extraction stage, $X_l$ was decomposed into sub-bands (LL, LH, HL and HH) using the DWT. The four sub-bands have half the size of $X_l$ due to the decimation of the DWT. Each HF sub-band was interpolated using Lanczos interpolation to recover the size of the initial image. PCA was applied to the three interpolated HF sub-bands (LH, HL and HH) to reduce the dimensionality and obtain one band $X_l^r$ with the most relevant details. Next, overlapping patches of size $\sqrt{n} \times \sqrt{n}$ were obtained from $X_l^r$ as follows:
$$x_l^p = E_p X_l^r,$$
where $E_p$ is the extraction operator at position $p$ and $x_l^p$ is the vectorized patch extracted from $X_l^r$ at that position. Each patch $x_l^p$ can be approximated by a linear combination of a few atoms of a dictionary. The Orthogonal Matching Pursuit (OMP) algorithm [28] was employed with the previously trained dictionary $D_l$ to find the atoms that provide the best reconstruction of the patch. The sparse coefficients $\alpha_p$ were computed for each patch $x_l^p$ according to the following minimization problem:
$$\alpha_p = \arg\min_{\alpha} \|x_l^p - D_l \alpha\|_2^2 \quad \text{s.t.} \quad \|\alpha\|_0 \le T,$$
where $T$ is the maximum number of atoms used per patch. The SR patch $x_s^p$ was recovered from the dictionary $D_h$ and the sparse representation $\alpha_p$ by a simple product:
$$x_s^p = D_h \alpha_p.$$
Overlapping patches were also extracted from the image $X_l$ at the same positions $p$ by applying the same operator $E_p$:
$$\tilde{x}_l^p = E_p X_l.$$
Finally, in the reconstruction stage, each HR patch is obtained by adding $x_s^p$ and $\tilde{x}_l^p$:
$$x_h^p = x_s^p + \tilde{x}_l^p.$$
Each patch $x_h^p$ was placed back at the position $p$ from which it was taken to obtain the final HR image $\hat{X}_h$. Because the patches overlap, the overlapping pixels were averaged and the result was rounded to the nearest integer. As a result, the final image has better quality than the initial image $X_l$. A block diagram of the described technique is presented in Fig. 3.
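The sparse-coding and patch-reconstruction steps above can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes unit-norm dictionary atoms, 5×5 patches, a stride of 1 (so every pixel is covered), and an arbitrary sparsity level of 3; the function names are hypothetical.

```python
import numpy as np


def omp(D, x, sparsity):
    """Orthogonal Matching Pursuit: greedy approximation of
    argmin ||x - D a||_2 s.t. ||a||_0 <= sparsity (columns of D assumed unit-norm)."""
    x = np.asarray(x, dtype=float)
    residual, support = x.copy(), []
    coef = np.zeros(0)
    for _ in range(sparsity):
        k = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with the residual
        if k in support or np.linalg.norm(residual) < 1e-12:
            break
        support.append(k)
        # Re-fit all selected coefficients jointly (the "orthogonal" step).
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha


def reconstruct(x_feat, x_init, D_l, D_h, n_side=5, step=1, sparsity=3):
    """x_feat: PCA feature band X_l^r; x_init: interpolated image X_l (same size)."""
    h, w = x_init.shape
    acc, cnt = np.zeros((h, w)), np.zeros((h, w))
    for r in range(0, h - n_side + 1, step):
        for c in range(0, w - n_side + 1, step):
            win = (slice(r, r + n_side), slice(c, c + n_side))
            alpha = omp(D_l, x_feat[win].flatten(order="F"), sparsity)  # alpha_p from x_l^p
            x_s = D_h @ alpha                                           # x_s^p = D_h alpha_p
            x_h = x_s + x_init[win].flatten(order="F")                  # x_h^p = x_s^p + E_p X_l
            acc[win] += x_h.reshape((n_side, n_side), order="F")
            cnt[win] += 1
    # Average overlapping pixels, then round to the nearest integer (ties go to even).
    return np.rint(acc / np.maximum(cnt, 1))


x = np.zeros(25)
x[[2, 7, 19]] = [3.0, -1.5, 0.5]
print(np.allclose(omp(np.eye(25), x, 3), x))  # True
```

In practice `D_l` and `D_h` would be the trained 25×1024 dictionaries; the identity dictionary above only serves as a sanity check.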

Simulation Results and Discussion
The performance of the state-of-the-art techniques and of the proposed SR-WAFE-SR technique was evaluated using the following criteria: the Peak Signal-to-Noise Ratio (PSNR) [29], to measure the pixel-wise fidelity of the reconstruction, and the Structural Similarity Index Measure (SSIM) [29], [30], to estimate the visual quality.
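Both criteria can be sketched in NumPy, assuming 8-bit images (peak value 255). The SSIM function below is a single-window simplification computed over the whole image; the standard SSIM [30] averages the same statistic over local, typically Gaussian-weighted, windows, so its values will generally differ. The function names are hypothetical.

```python
import numpy as np


def psnr(reference, estimate, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(estimate, float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)


def ssim_global(x, y, peak=255.0):
    """Single-window SSIM over the whole image (a simplification of windowed SSIM)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2  # usual stabilizing constants
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = np.mean((x - mx) * (y - my))
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


print(round(psnr(np.zeros((4, 4)), np.ones((4, 4))), 2))  # 48.13
```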
To confirm the advantages of the proposed SR technique over state-of-the-art image resolution enhancement techniques, SR-WAFE-SR was compared with interpolation methods such as Bicubic interpolation [6], Lanczos interpolation [7] and New Edge-Directed Interpolation (NEDI) [8]. It was also compared with promising reconstruction methods that employ the wavelet transform (WT), such as Wavelet Zero Padding (WZP) [31] and Demirel-Anbarjafari Super Resolution (DASR) [12]. Finally, it was compared with state-of-the-art learning methods that use sparse representation and CNNs, such as Super-Resolution with Sparse Mixing Estimators (SME) of Mallat et al. [15], Sparse coding SR (ScSR) of Yang et al. [14], Beta Process Joint Dictionary Learning (BP-JDL) of He et al. [16], and the Super-Resolution Convolutional Neural Network (SRCNN) of Dong et al. [20].
In simulations, the proposed SR technique was tested with images of different types (standard, medical and aerial images).Figure 4 shows the test images used in this paper.
The LR versions of the grayscale test images were obtained under two scenarios:
1. A bicubic filter followed by downsampling by a scale factor of S = 2.
2. A Gaussian filter of size 3×3 with standard deviation 0.5 followed by downsampling by a scale factor of S = 2.
The first scenario was used to super-resolve the standard and aerial images, and the second scenario was used for the medical images, following the medical SR simulations presented in [13]. The DWT employed to extract features was the CDF 9/7 WT [32], and the trained dictionaries ($D_h$ and $D_l$) are of size 25×1024, i.e., 1024 atoms of dimension 25.
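The second degradation scenario can be sketched as follows. The sketch samples the Gaussian at integer offsets and normalizes it to sum to one; the symmetric border padding and the function names are assumptions, since the paper does not state how borders are handled.

```python
import numpy as np


def gaussian_kernel(size=3, sigma=0.5):
    """Normalized size x size Gaussian kernel (size assumed odd)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()


def scenario2_lr(hr, scale=2):
    """Scenario 2: 3x3 Gaussian blur (sigma = 0.5), then keep every `scale`-th pixel."""
    k = gaussian_kernel(3, 0.5)
    h, w = np.shape(hr)
    x = np.pad(np.asarray(hr, float), 1, mode="symmetric")  # border handling is an assumption
    blurred = sum(k[i, j] * x[i:i + h, j:j + w] for i in range(3) for j in range(3))
    return blurred[::scale, ::scale]


print(scenario2_lr(np.full((512, 512), 10.0)).shape)  # (256, 256)
```

With σ = 0.5 the kernel is strongly concentrated: the centre tap carries roughly 62% of the weight, so this scenario blurs much less than a wider kernel would.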
The inverted error image (I e ) was obtained as the absolute difference between the HR image X and the image X recovered by each state-of-the-art technique.Then, this image was inverted to identify the dark areas where the errors could be observed.The error image is presented as follows: where is the maximum range of values for a grayscale image and c is an error amplification constant.
For the Peppers image, Fig. 5 shows that the proposed algorithm achieves a better PSNR. The error images also show better results, especially along well-defined borders (see the zoomed portions, where the LR and HR images are also presented for visual comparison).
The Tiffany image (see Fig. 6) contains flat zones, fine details and textures. The SR algorithm was applied to a 256×256 LR image to obtain a 512×512 HR image. The error images suggest that the proposed algorithm performs better in terms of subjective perception by the human visual system; specifically, the edges are reconstructed better.
Medical images are commonly used in clinical analysis, treatment and disease prevention. In medical diagnostics, hidden structures such as tissues, bones and organs must be observed to obtain a visual representation of the body. The proposed SR process can be applied to medical imaging to reveal these fine structures. Several medical images were tested; for example, Fig. 7 compares the visual results obtained for the Medical-4 image. In the zoomed images, one can observe that the state-of-the-art SR methods cannot recover fine details (e.g., edges), whereas the proposed SR-WAFE-SR algorithm achieves a higher PSNR. In the error images, dark areas indicate larger differences, i.e., an inferior reconstruction; these areas coincide with edges, where the intensity changes. White areas indicate little or no reconstruction error.
The proposed SR-WAFE-SR technique was also applied to remote sensing images (Fig. 8). For the Aerial-B image, the proposed SR-WAFE-SR performs well in terms of the objective criteria PSNR and SSIM, as well as in subjective perception, compared with the state-of-the-art learning techniques.
Across experiments involving images of different databases and natures, the SR images produced by the novel framework show sharper edges, finer features and cleaner details, and they visually resemble the original HR images more closely than those produced by the state-of-the-art SR techniques.
Table 1 shows the objective criteria PSNR and SSIM for the standard, medical and aerial test images. The proposed SR enhancement scheme achieves better average performance according to the objective criteria, consistent with the subjective visual perception.

Principal Contributions
The principal contributions of the proposed SR technique, which appears to perform better than state-of-the-art techniques, are as follows:
1. A methodology for training coupled overcomplete dictionaries with the K-SVD algorithm, where the coupled dictionaries are built from WT feature extraction followed by PCA reduction that selects the most informative features.
2. In the reconstruction stage, fine details and edge zones are restored well using sparse coding with the trained overcomplete dictionaries and an initial interpolation based on the Lanczos algorithm.
3. The proposed SR algorithm can be used on images of different types (standard, medical and aerial), with better performance than state-of-the-art methods that use sparse representation and convolutional neural networks, as well as interpolation- or transform-based methods.

Conclusion
A novel resolution-enhancement technique based on sparse representation and wavelet feature extraction was proposed.
The proposed SR technique uses the DWT to extract details from the LR image, and these details are then used to reconstruct the HR image via sparse representation with the help of two dictionaries. Compared with state-of-the-art resolution-enhancement techniques, the proposed SR framework appears to perform better in terms of objective criteria (PSNR and SSIM), as well as in subjective perception by the human visual system.

Figure 1 shows the different images used for the training set $S_h$.

Fig. 1. The three images involved in the training of the set $S_h$.

Table 1. PSNR and SSIM comparison of the different methods on all test images for an upscaling factor of ×2.