Elsevier

Signal Processing

Volume 178, January 2021, 107771

Learning stacking regression for no-reference super-resolution image quality assessment

https://doi.org/10.1016/j.sigpro.2020.107771

Highlights

  • We propose to employ a pre-trained VGGNet model to extract deep visual features to quantify the quality of SR images.

  • The used features are more conducive to revealing the fundamental artifacts of SR images than traditional hand-crafted features.

  • We develop a novel stacking regression-based framework to learn a coarse-to-fine metric for NR-SRIQA.

  • The proposed NR-SRIQA metric can yield more accurate quality predictions on SR images than other state-of-the-art predecessors.

Abstract

No-reference super-resolution (SR) image quality assessment (NR-SRIQA) aims to evaluate the quality of SR images without relying on any reference image. Most previous methods utilize certain handcrafted perceptual statistical features to quantify the degradation of SR images and a simple regression model to learn the mapping from the features to the perceptual quality. Although these methods have achieved promising performance, they still have two limitations: 1) the handcrafted features cannot accurately quantify the degradation of SR images; 2) the complex mapping between the features and the quality scores cannot be well approximated by a simple regression model. To alleviate these problems, we propose a novel stacking regression framework for NR-SRIQA. In the proposed method, we use a pre-trained VGGNet to extract deep features that measure the degradation of SR images, and then develop a stacking regression framework to establish the relationship between the learned deep features and the quality scores. The stacking regression integrates two base regressors, namely Support Vector Regression (SVR) and K-Nearest Neighbor (K-NN) regression, with a simple linear regression as the meta-regressor. Thanks to the feature representation capability of deep neural networks (DNNs) and the complementarity of the two base regressors, the experimental results indicate that the proposed stacking regression framework yields higher consistency with human visual judgments on the quality of SR images than other state-of-the-art SRIQA methods.

Introduction

The objective of image super-resolution (SR) reconstruction is to generate a high-resolution (HR) image with more details from one or several low-resolution (LR) images of the same scene [1]. SR technology has great potential in many fields such as computer vision, medical image analysis, remote sensing imaging, and entertainment. In order to assess the quality of SR images and to further optimize the performance of SR algorithms, a key task is to evaluate the quality of the resultant SR images. Since human opinion is the ultimate arbiter of image quality, subjective quality assessment is regarded as the most direct and effective way to reflect the quality of SR images [2]. Nevertheless, subjective quality assessment is time-consuming and labor-intensive. As a result, such methods cannot easily be integrated into an SR application system for real-world scenarios.

In contrast to subjective quality assessment of SR images, objective quality assessment automatically evaluates the quality of SR images through a computational model. In general, objective methods fall into three major categories [3]: full-reference image quality assessment (FRIQA) [4], reduced-reference image quality assessment (RRIQA) [5], and no-reference image quality assessment (NRIQA) [6], [7]. When applying FRIQA metrics such as mean squared error (MSE), peak signal-to-noise ratio (PSNR), structural similarity (SSIM) [8], and the information fidelity criterion (IFC) [9] to SRIQA, the corresponding original HR image is required as a reference to calculate the quality. However, the results obtained by these FRIQA metrics are sometimes inconsistent with human perceptual quality. Moreover, the original HR images are usually unavailable in practice. Consequently, conventional FRIQA metrics are not well suited to evaluating the quality of SR images or the capability of SR algorithms. The second category, RRIQA, requires only a reduced amount of reference information, so it is more practical than FRIQA. Yeganeh et al. [10] quantified artifacts of SR images by distilling natural scene statistical features in the frequency and spatial domains. He et al. [11] exploited a quality-aware collection of local similarity features to predict the degradation of SR images. Fang et al. [12] measured the quality of SR images by means of energy change and texture variation. Although RRIQA methods are more flexible than FRIQA metrics, they still require partial information about the original HR images. The third category is NRIQA. These methods do not need any information about the original images, so they overcome the shortcomings of the two aforementioned categories and have attracted much attention in the literature.
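
To make the FRIQA baseline above concrete, the PSNR metric can be computed in a few lines. This is a minimal numpy sketch for illustration, not code from the paper:

```python
import numpy as np

def psnr(ref, img, peak=255.0):
    """Peak signal-to-noise ratio (dB) between a reference and a test image."""
    mse = np.mean((ref.astype(np.float64) - img.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.full((64, 64), 128, dtype=np.uint8)
distorted = ref + 16            # uniform offset of 16 gray levels -> MSE = 256
print(round(psnr(ref, distorted), 2))  # 10*log10(255^2/256) = 24.05
```

Note that PSNR depends only on pixel-wise differences, which is precisely why it can disagree with human perception of SR artifacts.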

To target NRIQA, most previous methods elaborate handcrafted perceptual statistical features to quantify the degradation of SR images. For example, Moorthy et al. [13] adopted natural statistical properties of images in the wavelet domain to characterize image quality. Observing that human visual mechanisms are more sensitive to the structural characteristics and contrast of images, Saad et al. [14] combined contrast, structural, and anisotropic features of DCT statistical coefficients to represent image distortion. Ma et al. [15] utilized local frequency, global frequency, and spatial-domain features to quantify the degradation of SR images. Although these approaches are promising for IQA, such handcrafted features cannot fully quantify image quality.
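
As an illustration of the kind of handcrafted spatial-domain statistics cited above, the following numpy sketch computes BRISQUE-style mean-subtracted contrast-normalized (MSCN) coefficients; the window size `k` and stabilizer `eps` are illustrative choices, not the exact feature sets of the cited works:

```python
import numpy as np

def mscn(img, k=7, eps=1.0):
    """Mean-subtracted contrast-normalized (MSCN) coefficients with a box window."""
    img = img.astype(np.float64)
    pad = k // 2
    p = np.pad(img, pad, mode="reflect")
    # Local mean and standard deviation via a k x k sliding window.
    win = np.lib.stride_tricks.sliding_window_view(p, (k, k))
    mu = win.mean(axis=(-1, -2))
    sigma = win.std(axis=(-1, -2))
    return (img - mu) / (sigma + eps)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
coeffs = mscn(img)
# For natural images the MSCN histogram is approximately Gaussian; its sample
# moments (variance, kurtosis, etc.) then serve as quality-aware features.
print(coeffs.shape, round(float(coeffs.mean()), 3))
```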

In contrast to handcrafted features, learning high-level features with deep neural networks (DNNs) has gained much attention and has been applied successfully in many fields such as computer vision, pattern recognition, and image processing. DNNs have powerful representation capability and can automatically learn high-level feature representations that capture rich intrinsic information about image quality. Recently, a number of DNNs have been used to address NRIQA. For instance, Sun et al. [16] exploited the AlexNet architecture to extract semantic features from the global image content, and utilized saliency detection and Gabor filters to capture low-level features related to local image content; the overall image quality was estimated by combining these features. Li et al. [17] employed the ResNet architecture to represent the deep features of each overlapping image block in a statistical manner, and evaluated image quality with a linear regression model. Gao et al. [18] utilized the VGGNet to extract image features from each layer and estimated the final image quality by averaging the predicted scores of multiple layers. Kang et al. [19] developed an IQA model based on a Convolutional Neural Network (CNN), where feature extraction and regression are integrated into one optimization process. Bosse et al. [20] employed a CNN to extract high-level features from unpreprocessed image patches for quality prediction. Bianco et al. [21] extracted deep features by fine-tuning a pre-trained network on an IQA dataset; with the obtained deep features, an SVR model is trained to evaluate image quality. Experimental results suggest that such deep perceptual features outperform many traditional handcrafted features on IQA owing to their powerful capability of quantifying image quality. In short, DNN models capture the high-level semantics of images, and the learned features are highly correlated with quality degradation, which makes them well suited to NRIQA.

Besides perceptual statistical features, the other key component of IQA is building an accurate computational model that maps the image features to quality scores. In the IQA literature, many predecessors learn a single model, such as SVR, for quality prediction. However, in most cases an individual model is insufficient to capture the complicated relationship between the perceptual statistical features and the quality scores. To overcome this bottleneck, an alternative is to introduce ensemble learning, in which multiple models, such as different regression methods, are strategically generated and combined to correct possible deviations in the quality estimate.

Stacking is an effective ensemble learning technique that builds a new model by combining the predictions of multiple models (e.g., decision trees, KNN, or SVM) for a particular task. In principle, it integrates different machine learning algorithms through holdout cross-validation [22]. Breiman [23] later improved stacking regression by replacing holdout cross-validation with k-fold cross-validation. Unlike bagging and boosting, stacking combines multiple diverse regression models via a meta-regressor. Previous works [24], [25] have shown that, as a heterogeneous ensemble approach, stacking regression can significantly boost prediction performance by exploiting the complementary merits of different models.
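
The two-layer scheme described above can be sketched with scikit-learn's `StackingRegressor`, mirroring the paper's choice of SVR and KNN base regressors with a linear meta-regressor trained on out-of-fold base predictions; the synthetic data and hyperparameters are placeholders, not those of the paper:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for (deep feature, quality score) pairs.
X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# First layer: SVR and KNN base regressors; second layer: linear meta-regressor.
# cv=5 means the meta-model is fitted on 5-fold out-of-fold base predictions,
# as in Breiman's k-fold variant of stacking.
stack = StackingRegressor(
    estimators=[
        ("svr", make_pipeline(StandardScaler(), SVR(C=100.0))),
        ("knn", KNeighborsRegressor(n_neighbors=5)),
    ],
    final_estimator=LinearRegression(),
    cv=5,
)
stack.fit(X_tr, y_tr)
print(round(stack.score(X_te, y_te), 3))  # R^2 on held-out data
```

In this setup the linear meta-regressor simply learns how to weight the two base predictions, which is where the complementarity of the heterogeneous models pays off.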

In this paper, inspired by the significant advantages of stacking learning, we propose a novel quality metric for NR-SRIQA. In this method, perceptual statistical features extracted from an off-the-shelf VGGNet model are used to quantify the degradation of SR images. An effective stacking framework, which employs SVR and KNN regression as the base regressors, is then developed to learn a mapping from the deep features to the perceived quality scores. With the stacking regression model, the NR-SRIQA metric can predict the quality of any given SR image. In summary, the contributions of this paper are twofold:

  • (1)

    We propose to employ a pre-trained VGGNet model to extract deep visual features, rather than hand-crafted statistical features, to quantify the quality of SR images. These features are more conducive to revealing the fundamental artifacts of SR images than traditional hand-crafted features.

  • (2)

    We develop a novel stacking regression-based framework to learn a coarse-to-fine metric mapping deep features to quality scores for NR-SRIQA. The proposed quality metric yields more accurate quality predictions on SR images than other state-of-the-art predecessors.

The remainder of the paper is organized as follows. Section 2 describes the VGG deep features used to measure the degradation of SR images and presents a two-layer stacking regression framework for NR-SRIQA. In Section 3, we evaluate the performance of the proposed method and experimentally compare it with state-of-the-art IQA metrics. Finally, Section 4 concludes the paper and outlines future work.

Section snippets

The proposed method

In this section, we first elaborate on the deep feature representation of SR images based on the VGG network. Next, a two-layer stacking regression model for NR-SRIQA is detailed.

Experimental results and analysis

In this section, we first introduce the SRIQA database used in the experiments. We then carry out a set of experiments to validate the effectiveness of the feature selection, the base regressor selection, and the heterogeneous ensemble regression. Next, we probe how the scale of the training set affects prediction performance. Finally, we further verify the superiority of the proposed method by comparing it with existing state-of-the-art IQA methods.

Conclusion

We have proposed an effective NR-SRIQA metric based on stacking regression for SR image quality evaluation. The proposed method first uses the VGGNet model to extract deep perceptual features that quantify the quality of SR images. Next, a stacking regression model is constructed to predict the quality of SR images. In the stacking model, SVR and KNN regression serve as the two base regressors at the first layer, and a linear regression serves as the meta-regressor at the second layer.

Declaration of Competing Interest

None.

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China under Grant 61971339, Grant 61971172, and Grant 61471161, in part by the National Key Research and Development Program of China under Grant 2016QY01W0200, in part by National High-Level Talents Special Support Program of China under Grant CS31117200001, in part by the Key Project of the Natural Science Foundation of Shaanxi Province under Grant 2018JZ6002, and in part by the Graduate Innovation Foundation of

References (42)

  • T. Ahmad et al.

    The full reference quality assessment metrics for super resolution of an image: shedding light or casting shadows?

    International Conference on Electronics and Information Engineering

    (2010)
  • S. Golestaneh et al.

    Reduced-reference quality assessment based on the entropy of DWT coefficients of locally weighted gradient magnitudes

    IEEE Trans. Image Process.

    (2016)
  • S. Xu et al.

    No-reference/blind image quality assessment: a survey

    IETE Tech. Rev.

    (2017)
  • Z. Wang et al.

    Image quality assessment: from error visibility to structural similarity

    IEEE Trans. Image Process.

    (2004)
  • H.R. Sheikh et al.

    An information fidelity criterion for image quality assessment using natural scene statistics

    IEEE Trans. Image Process.

    (2005)
  • H. Yeganeh et al.

    Objective quality assessment for image super-resolution: a natural scene statistics approach

    19th IEEE International Conference on Image Processing

    (2012)
  • H. Yuqing et al.

    Assessment method of image super resolution reconstruction based on local similarity

    Proceedings of the Fifth International Conference on Internet Multimedia Computing and Service

    (2013)
  • Y. Fang et al.

    Quality assessment for image super-resolution based on energy change and texture variation

    IEEE International Conference on Image Processing (ICIP)

    (2016)
  • A.K. Moorthy et al.

    Blind image quality assessment: from natural scene statistics to perceptual quality

    IEEE Trans. Image Process.

    (2011)
  • M.A. Saad et al.

    Blind image quality assessment: a natural scene statistics approach in the DCT domain

    IEEE Trans. Image Process.

    (2012)
  • C. Sun et al.

    No-reference image quality assessment based on global and local content perception

    Vis. Commun. Image Process. (VCIP)

    (2016)
  • Cited by (22)

    • Hierarchical discrepancy learning for image restoration quality assessment

      2022, Signal Processing
      Citation Excerpt :

      Thus, NR-IQA metrics are more applicable and particularly desirable. In the last few years, the research on NR quality evaluation shows an obvious upward trend [15–33]. Most of these NR-IQA metrics share a similar processing framework, namely quality-aware feature representation and quality regression module [18].

    • No-reference stereoscopic image quality assessment using quaternion wavelet transform and heterogeneous ensemble learning

      2021, Displays
      Citation Excerpt :

      Previous studies [30] have proved the superiority of heterogeneous ensemble, which can boost up the predicted ability by complementing the advantages and disadvantages of different models. For example, in [31], Zhang et al. exploited SVR and K-Nearest Neighbor (KNN) as the base regressors of stacking model to predict super-resolution image quality. Inspired by the above works, we propose a novel NR-SIQA model by using quaternion wavelet transform (QWT) and heterogeneous ensemble learning.


    Kaibing Zhang received the M.Sc. degree in Computer Software and Theory from Xihua University, Chengdu, China, in 2005 and the Ph.D. degree in Pattern Recognition and Intelligent System from Xidian University, Xi’an, China, in 2012, respectively. He is currently a Professor at the College of Electrics and Information, Xi’an Polytechnic University, Xi’an, China. His main research interests include pattern recognition, computer vision, and image super-resolution reconstruction. In these areas, he has published around 20 technical articles in refereed journals and proceedings including IEEE TIP, TNNLS, Signal Processing (Elsevier), Neurocomputing, CVPR, ICIP, etc.

    Dan’ni Zhu received the B.S. degree in Measurement Technology and Instrument from Xi’an Jiaotong University City College, Xi’an, China, in 2015. She is currently pursuing her M.Sc. degree in the School of Electronics and Information, Xi’an Polytechnic University, Xi’an, China. Her research interests include machine learning, deep learning, and super-resolution image quality assessment.

    Jie Li received the B.Sc. degree in electronic engineering, the M.Sc. degree in signal and information processing, and the Ph.D. degree in circuit and systems, from Xidian University, Xi’an, China, in 1995, 1998, and 2004, respectively. She is currently a Professor in the School of Electronic Engineering, Xidian University, China. Her research interests include image processing and machine learning. In these areas, she has published around 50 technical articles in refereed journals and proceedings including IEEE T-IP, T-CSVT, Information Sciences etc.

    Xinbo Gao received the B.Eng., M.Sc. and Ph.D. degrees in electronic engineering, signal and information processing from Xidian University, Xi’an, China, in 1994, 1997, and 1999, respectively. From 1997 to 1998, he was a research fellow at the Department of Computer Science, Shizuoka University, Shizuoka, Japan. From 2000 to 2001, he was a post-doctoral research fellow at the Department of Information Engineering, the Chinese University of Hong Kong, Hong Kong. Since 2001, he has been at the School of Electronic Engineering, Xidian University. He is currently a Cheung Kong Professor of Ministry of Education of P. R. China, a Professor of Pattern Recognition and Intelligent System of Xidian University and a Professor of Computer Science and Technology of Chongqing University of Posts and Telecommunications. His current research interests include Image processing, computer vision, multimedia analysis, machine learning and pattern recognition. He has published six books and around 300 technical articles in refereed journals and proceedings. Prof. Gao is on the Editorial Boards of several journals, including Signal Processing (Elsevier) and Neurocomputing (Elsevier). He served as the General Chair/Co-Chair, Program Committee Chair/Co-Chair, or PC Member for around 30 major international conferences. He is a Fellow of the Institute of Engineering and Technology and a Fellow of the Chinese Institute of Electronics.

    Fei Gao is currently with the School of Electronic Engineering, Xidian University; and the School of Computer Science and Technology, Hangzhou Dianzi University (HDU). He received his Bachelor Degree in Electronic Engineering and Ph.D. Degree in Information and Communication Engineering from Xidian University (Xi'an, China) in 2009 and 2015, respectively. From Oct. 2012 to Sep. 2013, he was a Visiting Ph.D. Candidate in University of Technology, Sydney (UTS) in Australia. He mainly applies machine learning techniques to computer vision problems. His research interests include visual quality assessment and enhancement, intelligent visual arts generation, biomedical image analysis, etc. His research results have expounded in 20 publications at prestigious journals and conferences. He served for a number of journals and conferences.

    Jian Lu received the M.Sc. degree in Control Science and Engineering from the Xi’an Jiaotong University, Xi’an, China, in 2007, and the Ph.D. degree in Weapon Science and Technology from Northwestern Polytechnical University, Xi’an, in 2015. Since 2001, he has been with the College of Electrics and Information, Xi’an Polytechnic University, Xi’an. His main research interests include underwater robot location, cooperative localization, person re-identification, and small target detection.
