Elsevier

Applied Soft Computing

Volume 9, Issue 2, March 2009, Pages 541-552

No-reference image quality assessment using modified extreme learning machine classifier

https://doi.org/10.1016/j.asoc.2008.07.005

Abstract

In this paper, we present a machine learning approach to measure the visual quality of JPEG-coded images. The features for predicting the perceived image quality are extracted by considering key human visual sensitivity (HVS) factors such as edge amplitude, edge length, background activity and background luminance. Image quality assessment involves estimating the functional relationship between the HVS features and subjective test scores. The quality of the compressed images is obtained without referring to their original images (a ‘no-reference’ metric). Here, the problem of quality estimation is transformed into a classification problem and solved using the extreme learning machine (ELM) algorithm. In ELM, the input weights and the bias values are chosen randomly and the output weights are calculated analytically. For classification problems with an imbalance in the number of samples per quality class, the generalization performance of the ELM algorithm depends critically on the input weights and the bias values. Hence, we propose two schemes, namely the k-fold selection scheme (KS-ELM) and the real-coded genetic algorithm (RCGA-ELM), to select the input weights and the bias values such that the generalization performance of the classifier is maximized. Results indicate that the proposed schemes significantly improve the performance of the ELM classifier under class imbalance for image quality assessment. The experimental results show that the visual quality estimated by the proposed RCGA-ELM closely emulates the mean opinion score. The experimental results are compared with the existing JPEG no-reference image quality metric and the full-reference structural similarity image quality metric.

Introduction

The main objective of image/video quality assessment metrics is to evaluate the visual quality of a compressed image/video with or without referring to its original form. It is imperative that these measures correlate well with perception by the human visual system (HVS). The most widely used objective image quality metrics, the mean square error (MSE) and the peak signal-to-noise ratio (PSNR), do not correlate well with human perception [1], and they also require the original reference image to compute the distortion. Most images on the Internet and in multimedia databases are available only in compressed form, so the original reference image is often inaccessible, which makes it difficult to measure the image quality. Therefore, there is a clear need for metrics that correlate closely with human perception without needing the reference image.
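To make concrete why PSNR needs the reference image, here is a minimal Python sketch of MSE and PSNR for 8-bit grayscale images. The function names and the default peak value of 255 are our assumptions for 8-bit data; neither function can be evaluated without both the reference and the distorted image.

```python
import numpy as np


def mse(reference: np.ndarray, distorted: np.ndarray) -> float:
    """Mean squared error between two equally sized images."""
    ref = reference.astype(np.float64)
    dst = distorted.astype(np.float64)
    return float(np.mean((ref - dst) ** 2))


def psnr(reference: np.ndarray, distorted: np.ndarray, peak: float = 255.0) -> float:
    """PSNR in dB; it needs the reference, so it is a full-reference metric."""
    err = mse(reference, distorted)
    if err == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / err)


if __name__ == "__main__":
    ref = np.full((8, 8), 100, dtype=np.uint8)
    dst = ref.copy()
    dst[0, 0] = 116              # one pixel off by 16, so MSE = 256 / 64
    print(mse(ref, dst))         # 4.0
    print(psnr(ref, dst))
```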

A considerable volume of research has gone into developing objective image/video quality metrics that incorporate perceived quality with due consideration for HVS characteristics. However, most of the proposed metrics based on HVS characteristics require the original image as a reference [2], [3], [4], [5]. Although a human observer can easily assess image quality without any reference, developing a no-reference (NR) quality metric is a difficult task. To develop NR metrics, it is essential to have a priori knowledge about the nature of the artifacts. NR quality metrics are currently receiving considerable attention from the research community, as evidenced by the emergence of the Video Quality Experts Group (VQEG) [6], which is in the process of standardizing NR and reduced-reference (RR) video quality assessment methods.

The most popular and widely used image format on the Internet as well as in digital cameras happens to be JPEG [7]. Since JPEG uses a block-based DCT to achieve compression, the major artifact that JPEG-compressed images suffer from is blockiness. The compression rate (bit-rate) and image quality are mainly determined by the degree of quantization of the DCT coefficients. The undesirable consequences of quantization manifest as blockiness, ringing and blurring artifacts in the JPEG-coded image. It turns out that the subjective data for all these artifacts are highly correlated [8]. Hence, measuring the blockiness with reference to HVS criteria also indicates the overall image quality. Since image quality is a subjective phenomenon, manual inspection plays an important role. A subjective test is concerned with how an image is perceived by a viewer and records his/her opinion of a particular image (opinion score). The mean opinion score (MOS) is the average of the opinion scores over all subjects. Here, the objective is to find the functional relationship between the extracted HVS features and the MOS in order to quantify the quality of the image.
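The link between coefficient quantization and blockiness can be illustrated with a toy JPEG-like coder. This is a minimal sketch, not the JPEG standard: it assumes a grayscale image whose sides are multiples of 8 and a single uniform quantization step in place of JPEG's quantization tables, zig-zag scan and entropy coding. The helper names `dct_matrix`, `quantize_block` and `jpeg_like` are hypothetical.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis C, so that C @ x is the 1-D DCT of x."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def quantize_block(block: np.ndarray, step: float) -> np.ndarray:
    """Level-shift, 2-D DCT, uniform quantization, then inverse DCT of one block."""
    c = dct_matrix(block.shape[0])
    coeffs = c @ (block.astype(np.float64) - 128.0) @ c.T  # forward 2-D DCT
    q = np.round(coeffs / step) * step                     # the lossy step
    return c.T @ q @ c + 128.0                             # inverse 2-D DCT


def jpeg_like(image: np.ndarray, step: float) -> np.ndarray:
    """Code non-overlapping 8x8 blocks independently (sides must be multiples of 8)."""
    out = np.empty(image.shape, dtype=np.float64)
    h, w = image.shape
    for r in range(0, h, 8):
        for s in range(0, w, 8):
            out[r:r + 8, s:s + 8] = quantize_block(image[r:r + 8, s:s + 8], step)
    return np.clip(np.round(out), 0, 255).astype(np.uint8)


if __name__ == "__main__":
    ramp = np.tile(np.arange(0, 256, 16, dtype=np.uint8), (16, 1))  # smooth 16x16 ramp
    coded = jpeg_like(ramp, step=200.0)
    print("row 0 before:", ramp[0].tolist())
    print("row 0 after: ", coded[0].tolist())
```

Because each 8×8 block is quantized independently, a coarse `step` discards detail differently in neighbouring blocks, so intensity jumps tend to appear at block boundaries; comparing columns 7 and 8 of the printed rows is one way to see this.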

Existing blockiness measures use a variety of methods. Wang and Bovik proposed an algorithm based on computing the FFT along the rows and columns to estimate the strength of the block edges of the image [9]. Further, they proposed a nonlinear model for NR quality assessment of JPEG images, where the parameters of the model were determined from subjective test data [10]. Vlachos used the cross-correlation of subsampled images to compute a blockiness metric [11]. Wu and Yuen proposed a metric based on computing gradients along block boundaries while tempering the result with a weighting function based on the HVS [3]; here, the block edge strength was computed for each frame. Similar ideas about the HVS were utilized by Suthaharan [4] and Gao et al. [5]. The general idea behind these metrics is to temper the block edge gradient with the masking activity measured around it. These approaches exploit the fact that the gradient at a block edge can be masked by more spatially active areas around it, or by regions of extreme illumination (very dark or bright regions) [12]. Jung et al. [13] proposed an NR metric that uses a neural network to emulate the full-reference metric proposed by Karunasekera and Kingsbury [2]. They used general image features to train the neural network, and their results were not compared against subjective test scores. More recently, Gastaldo et al. [14], [15] proposed a circular back-propagation (CBP)-based image quality evaluation method that uses general pixel-based image features, such as higher-order moments, without considering the HVS. In such approaches, extracting a large number of general image features is computationally too complex for real-time implementation. Also, the functional relationship between the HVS features and the MOS is nonlinear and difficult to model mathematically. Under these circumstances, neural networks are well suited to solving such problems.
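The general idea of tempering a block-edge gradient with the surrounding activity can be sketched as follows. This is an illustrative toy measure, not a reimplementation of [3], [4], [5] or [9]: the division by (1 + activity) is our assumed masking rule, luminance masking is omitted, and `masked_blockiness` is a hypothetical name.

```python
import numpy as np


def masked_blockiness(image: np.ndarray, block: int = 8) -> float:
    """Average block-edge gradient, each edge divided by (1 + local activity).

    Assumes a grayscale image whose height and width are multiples of `block`.
    """
    def vertical_edges(img: np.ndarray) -> list:
        scores = []
        for c in range(block, img.shape[1], block):
            # Gradient magnitude across the boundary between columns c-1 and c.
            edge = np.abs(img[:, c] - img[:, c - 1])
            # Activity: mean absolute horizontal difference inside each adjacent block.
            left = np.abs(np.diff(img[:, c - block:c], axis=1)).mean(axis=1)
            right = np.abs(np.diff(img[:, c:c + block], axis=1)).mean(axis=1)
            # Busy neighbourhoods mask the edge, so they reduce its contribution.
            scores.extend((edge / (1.0 + 0.5 * (left + right))).tolist())
        return scores

    x = image.astype(np.float64)
    # Transposing turns horizontal block boundaries into vertical ones.
    scores = vertical_edges(x) + vertical_edges(x.T)
    return float(np.mean(scores)) if scores else 0.0


if __name__ == "__main__":
    blocky = np.zeros((16, 16))
    blocky[:, 8:] = 40.0                       # one sharp step on a block boundary
    checker = 10.0 * (np.add.outer(np.arange(16), np.arange(16)) % 2)
    textured = blocky + checker                # same average step plus busy texture
    print(masked_blockiness(blocky))           # 20.0
    print(masked_blockiness(textured))
```

In the demo, the textured image has the same average step across the block boundary, but the surrounding activity lowers its score, which mirrors the masking effect described above.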

In the last few decades, extensive research has been carried out on the theory and applications of artificial neural networks (ANNs). With an inherent structure suited to mapping complex characteristics, learning and optimization, ANNs have emerged as a powerful mathematical tool for solving various practical problems such as pattern classification and recognition, medical imaging, speech recognition and control [16], [17], [18], [19]. Furthermore, from a practical perspective, the massive parallelism and fast adaptability of neural network implementations provide further incentive to investigate problems involving complex mappings with uncertainties. Of the many neural network architectures proposed, single hidden layer feedforward networks (SLFNs) with sigmoidal or radial basis function neurons have been found to be effective for solving a number of real-world problems. The free parameters of such a network are conventionally learned from the training samples using gradient descent algorithms, which are relatively slow and have many issues with error convergence.

Recently, it has been shown that an SLFN with randomly chosen input weights and hidden bias values can approximate any continuous function to any desired accuracy [20]. Here, the output weights are calculated analytically using the Moore-Penrose generalized pseudo-inverse [21]. Because this learning algorithm is much faster than gradient-based training and generalizes well, it is called the ‘extreme learning machine’ (ELM). The ELM algorithm avoids many issues of traditional gradient-based algorithms, such as the choice of stopping criterion, learning rate and number of epochs, and local minima. In fact, the performance of the ELM algorithm on many real-world problems has been compared with that of other neural network approaches [22] and found to be better.
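A minimal ELM classifier can be sketched in a few lines of NumPy, showing that only the output weights are computed, in closed form, with the pseudo-inverse. The sigmoid activation, the uniform range [-1, 1] for the random weights and biases, the +1/-1 target coding and the class name `ELMClassifier` are our assumptions for illustration, not details taken from this paper.

```python
from typing import Optional

import numpy as np


class ELMClassifier:
    """Sketch of a basic ELM classifier with a sigmoid hidden layer."""

    def __init__(self, n_hidden: int, seed: Optional[int] = None):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        # Hidden-layer output matrix H = g(XW + b) with a sigmoid g.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ELMClassifier":
        self.classes_ = np.unique(y)
        # One output neuron per class: +1 for the true class, -1 otherwise.
        T = np.where(y[:, None] == self.classes_[None, :], 1.0, -1.0)
        # Input weights and biases are drawn once at random and never trained.
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        # Output weights: least-squares solution via the Moore-Penrose pseudo-inverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def decision_function(self, X: np.ndarray) -> np.ndarray:
        return self._hidden(X) @ self.beta

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-2.0, 0.5, (50, 2)), rng.normal(2.0, 0.5, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)
    model = ELMClassifier(n_hidden=20, seed=0).fit(X, y)
    print("training accuracy:", np.mean(model.predict(X) == y))
```

There is no iterative loop, learning rate or epoch count: training is a single pseudo-inverse, which is the source of ELM's speed.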

In this paper, we present image quality estimation using the ELM algorithm. In general, quality estimation is the process of finding the functional relationship between the feature inputs and the MOS values. However, the MOS values depend on the number of opinion scores per image. If only a few opinions are available for a given image and they differ widely, the performance of the image quality estimator will suffer. Hence, in this study, the problem is circumvented by converting the MOS values into quality classes. The functional relationship between the HVS features and the quality class label is approximated using the ELM classifier. The image quality metric is then calculated from the predicted class label and the posterior probability.
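The MOS-to-class conversion can be illustrated as follows. This is a hypothetical sketch: the 0 to 100 score scale, the bin edges and the class centres are assumptions, and `expected_quality` (a posterior-weighted average of class centres) is only one possible way to combine a class prediction with posterior probabilities; this excerpt does not give the paper's actual formula.

```python
import numpy as np


def mos(opinion_scores: np.ndarray) -> np.ndarray:
    """MOS per image: rows are images, columns are subjects; NaN marks a missing rating."""
    return np.nanmean(opinion_scores, axis=1)


def to_quality_class(mos_values: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """Class k means edges[k-1] <= MOS < edges[k]; class 0 lies below edges[0]."""
    return np.digitize(mos_values, edges)


def expected_quality(posterior: np.ndarray, class_centres: np.ndarray) -> np.ndarray:
    """Hypothetical continuous score: posterior-weighted average of class centres."""
    return posterior @ class_centres


if __name__ == "__main__":
    scores = np.array([[80.0, 90.0, np.nan],    # image A: two ratings
                       [20.0, 30.0, 40.0]])     # image B: three ratings
    m = mos(scores)                              # MOS 85 and 30
    edges = np.array([20.0, 40.0, 60.0, 80.0])   # five classes on an assumed 0-100 scale
    print(m, to_quality_class(m, edges))         # classes 4 and 1
    centres = np.array([10.0, 30.0, 50.0, 70.0, 90.0])
    print(expected_quality(np.array([0.0, 0.0, 0.0, 0.5, 0.5]), centres))  # 80.0
```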

Here, the quality classification problem has few training samples per class and a high imbalance in the number of samples per class. In such cases, the generalization performance of the ELM algorithm depends on the proper selection of the input weights and hidden bias values, which remain fixed once chosen. The number of hidden neurons also affects the generalization performance. Hence, in this paper, we present k-fold cross-validation (KS-ELM) and real-coded genetic algorithm (RCGA-ELM) approaches to select appropriate values for these parameters in the extreme learning machine classifier. In the RCGA-ELM approach, the minimal number of hidden neurons and the corresponding input weights and bias values are selected automatically, whereas the KS-ELM approach requires an exhaustive search to determine the number of hidden neurons. The proposed RCGA-ELM differs from the existing ‘evolutionary ELM’ (E-ELM) algorithm [23]. In E-ELM, the genetic algorithm searches only for the best input weights and bias values for a given number of hidden neurons, so that the network has better generalization performance, and the optimal number of hidden neurons is obtained by exhaustive search. In RCGA-ELM, by contrast, new genetic operators are defined to find the minimal number of hidden neurons together with their input weights and bias values. First, we evaluate the performance of the KS-ELM, RCGA-ELM and ELM algorithms on classification problems from the UCI machine learning repository [24] to validate the proposed schemes. The results clearly indicate that the proposed KS-ELM and RCGA-ELM provide better generalization performance than the conventional ELM algorithm on these classification problems.
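This excerpt does not spell out the KS-ELM procedure, so the following is only one plausible reading of a k-fold selection scheme: draw several random sets of input weights and biases, score each by k-fold cross-validation, and keep the best. Scoring with balanced accuracy (mean per-class recall) is our choice for imbalanced classes; the number of hidden neurons is fixed per call, matching the statement that KS-ELM searches over it exhaustively. All function names and default values are hypothetical.

```python
import numpy as np


def hidden(X, W, b):
    # Sigmoid hidden-layer output H = g(XW + b).
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def targets(y, classes):
    # +1 for the true class, -1 elsewhere (one output neuron per class).
    return np.where(y[:, None] == classes[None, :], 1.0, -1.0)


def balanced_accuracy(y_true, y_pred, classes):
    # Mean per-class recall, so the majority class cannot dominate the score.
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes if np.any(y_true == c)]
    return float(np.mean(recalls))


def kfold_select(X, y, n_hidden, n_candidates=20, k=5, seed=0):
    """Keep the random (W, b) with the best k-fold balanced accuracy, then refit beta."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    folds = np.array_split(rng.permutation(len(y)), k)
    best_score, best_W, best_b = -1.0, None, None
    for _ in range(n_candidates):
        W = rng.uniform(-1.0, 1.0, (X.shape[1], n_hidden))
        b = rng.uniform(-1.0, 1.0, n_hidden)
        fold_scores = []
        for i in range(k):
            val = folds[i]
            trn = np.concatenate([folds[j] for j in range(k) if j != i])
            beta = np.linalg.pinv(hidden(X[trn], W, b)) @ targets(y[trn], classes)
            pred = classes[np.argmax(hidden(X[val], W, b) @ beta, axis=1)]
            fold_scores.append(balanced_accuracy(y[val], pred, classes))
        score = float(np.mean(fold_scores))
        if score > best_score:
            best_score, best_W, best_b = score, W, b
    # Refit the output weights of the selected candidate on all training data.
    beta = np.linalg.pinv(hidden(X, best_W, best_b)) @ targets(y, classes)
    return best_W, best_b, beta, best_score


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    # Imbalanced toy data: 90 samples of class 0, 10 of class 1.
    X = np.vstack([rng.normal(-1.5, 0.5, (90, 2)), rng.normal(1.5, 0.5, (10, 2))])
    y = np.array([0] * 90 + [1] * 10)
    W, b, beta, cv_score = kfold_select(X, y, n_hidden=10, n_candidates=5)
    print("best cross-validated balanced accuracy:", cv_score)
```

An exhaustive search over the number of hidden neurons would wrap `kfold_select` in a loop over candidate `n_hidden` values and keep the best cross-validated score.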

For image quality estimation, experiments are carried out using two disjoint sets of original images and their compressed versions from the JPEG LIVE image quality database [25]. Of the 29 original images, 20 original images and their compressed versions are used for developing the image quality model, and the remaining nine original images and their compressed versions are used for evaluating performance. Finally, the performance of the proposed image quality estimators is compared with that of the available NR image quality metric [10] and the full-reference (FR) structural similarity (SSIM) image quality metric [26].
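Splitting by reference image, rather than by individual compressed image, keeps the content of the test images unseen during model development. A minimal sketch, assuming each compressed image is tagged with the index of its reference image; the random choice of the 20 training references, the six versions per reference and the name `split_by_reference` are hypothetical and may not match the split used in the paper.

```python
import numpy as np


def split_by_reference(ref_ids, n_train_refs: int, seed: int = 0):
    """Boolean train/test masks over compressed images such that each reference
    image, and hence all of its compressed versions, falls in exactly one set."""
    ref_ids = np.asarray(ref_ids)
    rng = np.random.default_rng(seed)
    train_refs = rng.choice(np.unique(ref_ids), size=n_train_refs, replace=False)
    train = np.isin(ref_ids, train_refs)
    return train, ~train


if __name__ == "__main__":
    # Hypothetical layout: 29 references, 6 JPEG versions each (real counts may differ).
    ref_ids = np.repeat(np.arange(29), 6)
    train, test = split_by_reference(ref_ids, n_train_refs=20)
    print(train.sum(), test.sum())                     # 120 54
    print(set(ref_ids[train]) & set(ref_ids[test]))    # set(): no shared content
```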

The organization of this paper is as follows: Section 2 describes the HVS-based feature extraction technique. In Section 3, we briefly present the recently developed ELM algorithm and issues related to the ELM algorithm for classification problems with a high imbalance in the samples. Section 4 presents the k-fold ELM and RCGA-ELM classifiers for handling a high imbalance in the number of samples per class. Performance evaluations of the proposed classifiers on three benchmark multi-category problems and on image quality estimation are presented in Section 5. Section 6 summarizes the main conclusions of this study.


HVS-based Feature Extraction

Most of the distortion in compressed images/video is due to block DCT-based compression. The most popular and widely used image format on the Internet and in digital cameras happens to be JPEG [7]. Since JPEG uses the block-based DCT to achieve compression, the major artifact that JPEG-compressed images suffer from is blockiness. In JPEG coding, non-overlapping 8×8 pixel blocks are coded independently using the DCT. The compression ratio and the image

Extreme learning machine

In this section, we present a brief overview of the extreme learning machine (ELM) algorithm [22]. ELM is a single hidden layer feedforward network in which the input weights are chosen randomly and the output weights are calculated analytically. Many activation functions, such as the sigmoidal, sine, Gaussian and hard-limiting functions, can be used for the hidden neurons, and the output neurons have a linear activation function. ELM can use non-differentiable or even discontinuous functions as an
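For reference, the standard ELM formulation from the ELM literature can be written as follows. The symbols are our notation, since this excerpt is truncated before the paper's own equations: N training samples x_i whose targets are stacked as the rows of T, L hidden neurons with input weights w_j, biases b_j and activation g, output weights β, and H† the Moore-Penrose pseudo-inverse of H. The estimate β̂ is the minimum-norm least-squares solution of Hβ = T.

```latex
\mathbf{H} =
\begin{bmatrix}
g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_L \cdot \mathbf{x}_1 + b_L) \\
\vdots & \ddots & \vdots \\
g(\mathbf{w}_1 \cdot \mathbf{x}_N + b_1) & \cdots & g(\mathbf{w}_L \cdot \mathbf{x}_N + b_L)
\end{bmatrix}_{N \times L},
\qquad
\mathbf{H}\boldsymbol{\beta} = \mathbf{T},
\qquad
\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger}\mathbf{T}
```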

Real-coded genetic algorithm approach

Genetic algorithms are perhaps the best known of all evolution-based search techniques [28]; the real-coded genetic algorithm (RCGA) is a variant in which each candidate solution is encoded directly as a vector of real numbers. Genetic algorithms were developed in an attempt to explain the adaptive processes of natural systems and to design artificial systems based upon these natural systems. Genetic algorithms are widely used to solve complex optimization problems in which the number of parameters and constraints is large and analytical solutions are difficult to obtain. In recent years, many
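A generic real-coded GA can be sketched as follows. This is a textbook variant (binary tournament selection, arithmetic crossover, Gaussian mutation and elitism) on a fixed-length real vector, not the RCGA-ELM operators, which additionally vary the number of hidden neurons. In an RCGA-ELM-style use, the chromosome would hold the input weights and biases and `f` would return, for example, a validation error of the resulting ELM; here a simple sphere function stands in as an assumed objective, and all names and parameter values are illustrative.

```python
from typing import Callable

import numpy as np


def rcga_minimize(f: Callable[[np.ndarray], float], dim: int, lo: float, hi: float,
                  pop_size: int = 40, generations: int = 200,
                  p_mut: float = 0.1, sigma: float = 0.05, seed: int = 0):
    """Minimize f over the box [lo, hi]^dim with a simple real-coded GA."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(lo, hi, (pop_size, dim))          # real-valued chromosomes
    fit = np.array([f(x) for x in pop])

    def tournament() -> np.ndarray:
        # Binary tournament: the fitter of two random individuals wins.
        i, j = rng.integers(pop_size, size=2)
        return pop[i] if fit[i] < fit[j] else pop[j]

    for _ in range(generations):
        children = [pop[np.argmin(fit)].copy()]         # elitism: best survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            a = rng.random(dim)
            child = a * p1 + (1.0 - a) * p2             # arithmetic crossover, per gene
            mask = rng.random(dim) < p_mut
            # Gaussian mutation of the selected genes, scaled to the search range.
            child[mask] += rng.normal(0.0, sigma * (hi - lo), mask.sum())
            children.append(np.clip(child, lo, hi))
        pop = np.array(children)
        fit = np.array([f(x) for x in pop])
    best = int(np.argmin(fit))
    return pop[best], float(fit[best])


if __name__ == "__main__":
    best_x, best_f = rcga_minimize(lambda v: float(np.sum(v ** 2)), dim=5, lo=-5.0, hi=5.0)
    print(best_x.round(3), best_f)
```

Elitism makes the best fitness non-increasing from one generation to the next, while crossover and mutation keep exploring around the current population.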

Experiments and discussions

In this section, we present the performance comparison of proposed KS-ELM, RCGA-ELM and ELM classifiers on benchmark classification data sets first. Next, we present the results for the image quality estimation problem.

Conclusions

In this paper, we have presented a system for predicting image quality using the extreme learning machine algorithm, considering various human visual characteristics. The functional relationship between the extracted HVS features and the MOS is modeled by the ELM algorithm. The random selection of the input weights and bias values considerably affects the generalization performance of the ELM algorithm for classification problems with a high imbalance in the training data set. For this purpose, we

Acknowledgments

This work was supported in part by the ITRC, Korea University, Korea, under the auspices of the Ministry of Information and Communication. The authors would also like to thank Prof. Bovik and his lab members for providing the JPEG image quality assessment database used to test our metric.

References (33)

  • JPEG official site,...
  • Z. Wang et al., Blind measurement of blocking artifacts in images.
  • Z. Wang et al., Why is image quality assessment so difficult?
  • T. Vlachos, Detection of blocking artifacts in compressed video (2000).
  • M. Yuen et al., A survey of hybrid MC/DPCM/DCT video coding distortions, Signal Processing (1997).
  • M. Jung et al., Univariant assessment of the quality of images, Journal of Electronic Imaging (2002).