No-reference image quality assessment using modified extreme learning machine classifier
Introduction
The main objective of image/video quality assessment metrics is to evaluate the visual quality of a compressed image/video with or without reference to its original form. It is imperative that these measures correlate well with perception by the human visual system (HVS). The most widely used objective image quality metrics, namely the mean square error (MSE) and the peak signal-to-noise ratio (PSNR), do not correlate well with human perception [1], and they also require the original reference image to compute the distortion. Most images on the Internet and in multimedia databases are available only in compressed form, so the original reference image is often inaccessible, which makes it difficult to measure image quality. Therefore, there is a clear need for metrics that correlate closely with human perception without requiring the reference image.
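For context, both full-reference measures named above reduce to a few lines of arithmetic. The sketch below (Python with NumPy, assuming 8-bit images so that the peak value is 255) makes explicit that each one takes the undistorted reference as an input, which is exactly what an NR metric has to do without.

```python
import numpy as np


def mse(reference: np.ndarray, distorted: np.ndarray) -> float:
    """Mean square error; both images must have the same shape."""
    ref = reference.astype(np.float64)
    dis = distorted.astype(np.float64)
    return float(np.mean((ref - dis) ** 2))


def psnr(reference: np.ndarray, distorted: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB (infinite for identical images)."""
    err = mse(reference, distorted)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / err))


ref = np.zeros((8, 8), dtype=np.uint8)
dis = ref + 16  # uniform error of 16 grey levels, so MSE = 16 ** 2 = 256
print(mse(ref, dis), round(psnr(ref, dis), 2))
# prints: 256.0 24.05
```

Note that a uniform brightness shift and a visible blocking pattern with the same MSE receive the same PSNR, which is one concrete reason these measures track perceived quality poorly.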
A considerable volume of research has gone into developing objective image/video quality metrics that measure perceived quality with due consideration of HVS characteristics. However, most of the proposed HVS-based metrics require the original image as a reference [2], [3], [4], [5]. Although a human observer can easily judge image quality without any reference, developing a no-reference (NR) quality metric is a difficult task. To develop NR metrics, it is essential to have a priori knowledge about the nature of the artifacts. NR quality metrics are currently receiving considerable attention from the research community, as is evident from the emergence of the video quality experts group (VQEG) [6], which is in the process of standardizing NR and reduced-reference (RR) video quality assessment methods.
The most popular and widely used image format on the Internet and in digital cameras is JPEG [7]. Since JPEG achieves compression by coding with the block-based DCT, the major artifact that JPEG-compressed images suffer from is blockiness. The compression rate (bit-rate) and the image quality are mainly determined by the degree of quantization of the DCT coefficients. The undesirable consequences of quantization manifest as blockiness, ringing and blurring artifacts in the JPEG-coded image. It turns out that the subjective data for all these artifacts are highly correlated [8]. Hence, measuring blockiness with reference to HVS criteria also indicates the overall image quality. Since image quality is a subjective phenomenon, manual inspection plays an important role. A subjective test is concerned with how an image is perceived by a viewer, who records his/her opinion of a particular image (the opinion score). The mean opinion score (MOS) is the average opinion score over all subjects. Here, the objective is to find the functional relationship between the extracted HVS features and the MOS in order to quantify image quality.
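The link between DCT quantization and blockiness can be illustrated with a minimal sketch. It codes each non-overlapping 8x8 block independently (forward 2-D DCT, quantization, inverse DCT) as JPEG does, but it assumes a single uniform step size for every coefficient instead of JPEG's quantization tables and omits entropy coding entirely; the function names are hypothetical.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II matrix C, so that C @ C.T is the identity."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C


def blockwise_quantize(img: np.ndarray, step: float, n: int = 8) -> np.ndarray:
    """Code non-overlapping n x n blocks independently: 2-D DCT, uniform quantization, inverse DCT."""
    h, w = img.shape
    if h % n or w % n:
        raise ValueError("image size must be a multiple of the block size")
    C = dct_matrix(n)
    x = img.astype(np.float64) - 128.0  # level shift, as in JPEG
    out = np.empty_like(x)
    for r in range(0, h, n):
        for c in range(0, w, n):
            coeffs = C @ x[r:r + n, c:c + n] @ C.T  # forward 2-D DCT of one block
            q = np.round(coeffs / step) * step       # same step for every coefficient (simplification)
            out[r:r + n, c:c + n] = C.T @ q @ C      # inverse 2-D DCT
    return np.clip(out + 128.0, 0.0, 255.0)
```

Because each block is quantized on its own, a larger `step` discards AC energy independently per block, so neighbouring blocks no longer join smoothly; the resulting discontinuities along the 8-pixel grid are the blockiness discussed above.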
Existing algorithms measure blockiness in a variety of ways. Wang and Bovik proposed an algorithm that computes the FFT along the rows and columns to estimate the strength of the block edges of the image [9]. They further proposed a nonlinear model for NR quality assessment of JPEG images, whose parameters were determined from subjective test data [10]. Vlachos used the cross-correlation of subsampled images to compute a blockiness metric [11]. Wu and Yuen proposed a metric based on gradients computed along block boundaries, tempered with a weighting function based on the HVS [3]; here, the block edge strength was computed for each frame. Similar ideas about the HVS were utilized by Suthaharan [4] and Gao et al. [5]. The general idea behind these metrics is to temper the block edge gradient with the masking activity measured around it. These approaches exploit the fact that the gradient at a block edge can be masked by more spatially active areas around it, or by regions of extreme illumination (very dark or very bright regions) [12]. Jung et al. [13] proposed an NR metric that uses a neural network to emulate the full-reference metric of Karunasekera and Kingsbury [2]. They used general image features to train the neural network, and the results were not compared against subjective test scores. More recently, Gastaldo et al. [14], [15] proposed an image quality evaluation method based on circular back-propagation (CBP), using general pixel-based image features such as higher-order moments without considering the HVS. In all of the above approaches, extracting a large number of general image features is computationally too complex for real-time implementation. Moreover, the functional relationship between the HVS features and the MOS is nonlinear and difficult to model mathematically.
Under these circumstances, neural networks are well suited to solving such problems.
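The shared idea behind the gradient-based blockiness metrics discussed above, that block edges show larger intensity jumps than block interiors, can be illustrated with a deliberately simplified indicator. This sketch is not any of the cited methods: it assumes an 8-pixel block grid aligned with the image origin, compares mean absolute differences across block boundaries with those inside blocks, and applies no HVS masking weight. The function names are hypothetical.

```python
import numpy as np


def boundary_difference_ratio(img: np.ndarray, block: int = 8) -> float:
    """Mean |horizontal difference| across vertical block boundaries divided by
    the mean |horizontal difference| inside blocks. Values well above 1 suggest blocking."""
    x = img.astype(np.float64)
    if x.shape[1] <= block:
        raise ValueError("image must be wider than one block")
    d = np.abs(np.diff(x, axis=1))  # d[:, j] = |x[:, j + 1] - x[:, j]|
    cols = np.arange(d.shape[1])
    at_boundary = (cols % block) == block - 1  # column pairs (7, 8), (15, 16), ...
    boundary = d[:, at_boundary].mean()
    interior = d[:, ~at_boundary].mean()
    return float(boundary / (interior + 1e-6))  # small constant avoids division by zero


def blockiness(img: np.ndarray, block: int = 8) -> float:
    """Average of the ratio over vertical and horizontal block boundaries."""
    return 0.5 * (boundary_difference_ratio(img, block)
                  + boundary_difference_ratio(img.T, block))
```

A smooth ramp gives a ratio close to 1, while an image made of flat 8-pixel-wide strips gives a very large value. The cited metrics go further by discounting edges that fall in busy or very dark/bright neighbourhoods, where the HVS is less likely to notice them.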
In the last few decades, extensive research has been carried out on the theory and applications of artificial neural networks (ANNs). ANNs possess an inherent structure suited to mapping complex characteristics, learning and optimization, and they have emerged as a powerful mathematical tool for solving practical problems such as pattern classification and recognition, medical imaging, speech recognition and control [16], [17], [18], [19]. Furthermore, from a practical perspective, the massive parallelism and fast adaptability of neural network implementations provide further incentive to investigate problems involving complex mappings with uncertainties. Of the many neural network architectures proposed, the single hidden layer feedforward network (SLFN) with sigmoidal or radial basis function neurons has been found effective for solving a number of real-world problems. The free parameters of the network are learned from the training samples using gradient descent algorithms, which are relatively slow and have many issues with error convergence.
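For reference, the SLFN and its gradient-descent training can be written as follows. The notation is assumed here rather than taken from the excerpt: $\tilde N$ hidden neurons with activation $g$, input weights $\mathbf w_i$, hidden biases $b_i$, output weights $\boldsymbol\beta_i$, learning rate $\eta$, and $N$ training pairs $(\mathbf x_j, \mathbf t_j)$.

```latex
f(\mathbf{x}) = \sum_{i=1}^{\tilde N} \boldsymbol{\beta}_i \, g(\mathbf{w}_i \cdot \mathbf{x} + b_i)

E = \frac{1}{2} \sum_{j=1}^{N} \left\lVert f(\mathbf{x}_j) - \mathbf{t}_j \right\rVert^2

\theta \leftarrow \theta - \eta \, \frac{\partial E}{\partial \theta},
\qquad \theta \in \{\mathbf{w}_i,\ b_i,\ \boldsymbol{\beta}_i\}
```

Every parameter is updated iteratively, so the choice of $\eta$, the number of epochs and the stopping rule all matter, and because $E$ is non-convex in $\mathbf w_i$ and $b_i$ the iterations can stall in local minima.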
Recently, it has been shown that an SLFN with randomly chosen input weights and hidden bias values can approximate any continuous function to any desired accuracy [20]. In this case, the output weights are calculated analytically using the Moore-Penrose generalized pseudo-inverse [21]. Since the learning algorithm is fast and has good generalization ability, it is called the ‘extreme learning machine’ (ELM). The ELM algorithm avoids many issues of traditional gradient-based algorithms, such as the choice of stopping criterion, learning rate and number of epochs, and convergence to local minima. In fact, the performance of the ELM algorithm on many real-world problems has been compared with that of other neural network approaches [22], and it has been found to perform better.
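The ELM recipe above (random, fixed hidden-layer parameters; output weights from the Moore-Penrose pseudo-inverse, i.e. beta = pinv(H) @ T, where H holds the hidden-layer outputs for all training samples) is short enough to sketch directly. The class below is a minimal NumPy illustration, not the authors' code. It assumes inputs normalized to roughly [-1, 1], draws weights and biases uniformly from that range, and offers sigmoid, sine, Gaussian (radial basis) and hard-limit hidden activations; the class and parameter names are hypothetical.

```python
import numpy as np


class ELM:
    """Minimal extreme learning machine: one random hidden layer, linear output layer."""

    def __init__(self, n_hidden: int, activation: str = "sigmoid", seed: int = 0):
        self.n_hidden = n_hidden
        self.activation = activation
        self.seed = seed

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        """Hidden-layer output matrix H, shape (n_samples, n_hidden)."""
        if self.activation == "gaussian":
            # Radial basis node exp(-b_i * ||x - a_i||^2); centres a_i are the columns of W.
            d2 = ((X[:, None, :] - self.W.T[None, :, :]) ** 2).sum(axis=2)
            return np.exp(-self.b * d2)
        z = X @ self.W + self.b  # additive node: w_i . x + b_i
        if self.activation == "sigmoid":
            return 1.0 / (1.0 + np.exp(-z))
        if self.activation == "sine":
            return np.sin(z)
        if self.activation == "hardlim":
            return (z >= 0).astype(np.float64)  # discontinuous activation is allowed
        raise ValueError(f"unknown activation: {self.activation}")

    def fit(self, X: np.ndarray, T: np.ndarray) -> "ELM":
        rng = np.random.default_rng(self.seed)
        # Input weights and biases are drawn once at random and never trained.
        self.W = rng.uniform(-1.0, 1.0, (X.shape[1], self.n_hidden))
        if self.activation == "gaussian":
            self.b = rng.uniform(0.1, 1.0, self.n_hidden)  # positive widths
        else:
            self.b = rng.uniform(-1.0, 1.0, self.n_hidden)
        # Output weights: least-squares solution via the Moore-Penrose pseudo-inverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self._hidden(X) @ self.beta
```

For classification, `T` is the one-hot matrix of class labels and the predicted class is the arg-max of `predict`. There is no learning rate, epoch count or stopping criterion to tune; what remains is `n_hidden` and the particular random draw, which is why their selection matters for generalization.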
In this paper, we present image quality estimation using the ELM algorithm. In general, quality estimation is the process of finding the functional relationship between the MOS values and the input features. However, the MOS values depend on the number of opinion scores per image. If only a few opinions are available for a given image and they differ statistically, the performance of the image quality estimator will suffer. Hence, in this study, the problem is circumvented by converting the MOS values into quality classes. The functional relationship between the HVS features and the quality class label is approximated using the ELM classifier. The image quality metric is then calculated from the predicted class label and the posterior probability.
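The steps in this paragraph can be sketched in a few lines. The MOS is the arithmetic mean of the opinion scores; the accompanying 95% confidence half-width (1.96 s/sqrt(n), the normal approximation commonly used in subjective testing) shrinks only as the number of opinions n grows, which is why few opinions make a MOS value unreliable. The class edges, class centres and the posterior-weighted score below are hypothetical choices for illustration: the excerpt gives neither the paper's class boundaries nor its exact formula for combining the predicted label and the posterior probability.

```python
import math
from statistics import mean, stdev

import numpy as np


def mos_with_ci(scores, z=1.96):
    """Mean opinion score and the half-width of its normal-approximation confidence interval."""
    m = mean(scores)
    if len(scores) < 2:
        return m, math.inf  # a single opinion gives no estimate of spread
    return m, z * stdev(scores) / math.sqrt(len(scores))


def mos_to_class(mos, edges):
    """Map MOS values to class labels 0..len(edges) using ascending (hypothetical) bin edges."""
    return np.digitize(mos, edges)


def quality_from_posterior(posterior, centres):
    """Hypothetical score: predicted class and the posterior-weighted mean of class centres."""
    p = np.asarray(posterior, dtype=float)
    p = p / p.sum()
    return int(np.argmax(p)), float(p @ np.asarray(centres, dtype=float))
```

For example, `mos_to_class([2.0, 5.0, 8.0], [4.0, 7.0])` yields the labels 0, 1 and 2, and a two-class posterior of (0.2, 0.8) with class centres 3 and 7 gives the predicted class 1 and a score of 6.2, so the score moves smoothly between class centres instead of jumping.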
Here, the quality classification problem has few training samples per class and a high imbalance in the number of samples per class. In such cases, the generalization performance of the ELM algorithm depends on a proper selection of the input weights and hidden bias values (the fixed parameters). The number of hidden neurons also affects the generalization performance. Hence, in this paper, we present k-fold cross-validation (KS-ELM) and real-coded genetic algorithm (RCGA-ELM) approaches to select appropriate values for these parameters of the extreme learning machine classifier. In the RCGA-ELM approach, the minimal number of hidden neurons and the corresponding input weights and bias values are selected automatically, whereas the KS-ELM approach requires an exhaustive search to determine the number of hidden neurons. The proposed RCGA-ELM differs from the existing ‘evolutionary ELM’ (E-ELM) algorithm [23]. In E-ELM, the genetic algorithm searches only for the best input weights and bias values for a given number of hidden neurons, such that the network has better generalization performance; the optimal number of hidden neurons is obtained by exhaustive search. In RCGA-ELM, by contrast, new genetic operators are defined to find the minimal number of hidden neurons together with their input weights and bias values. First, we evaluate the performance of the KS-ELM, RCGA-ELM and ELM algorithms on classification problems from the UCI machine learning repository [24] to validate the proposed schemes. The results indicate that the proposed KS-ELM and RCGA-ELM provide better generalization performance than the conventional ELM algorithm on these classification problems.
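A minimal sketch of the k-fold idea, under stated assumptions: for each candidate number of hidden neurons, several random draws of input weights and biases are scored by their mean validation accuracy over k folds, and the best (count, draw) pair is kept. Reusing the same random seed in every fold means that one specific set of weights is being evaluated. The search grid, the trial count, the seeding scheme and all function names are hypothetical; the paper's KS-ELM procedure may differ in detail, and the genetic RCGA-ELM alternative is not shown.

```python
import numpy as np


def elm_fit(X, Y, n_hidden, rng):
    """Random sigmoid hidden layer; output weights by pseudo-inverse."""
    W = rng.uniform(-1.0, 1.0, (X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return W, b, np.linalg.pinv(H) @ Y


def elm_predict(model, X):
    W, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.argmax(H @ beta, axis=1)


def kfold_select(X, y, hidden_grid, n_trials=5, k=5, seed=0):
    """Return (mean validation accuracy, n_hidden, weight seed) of the best candidate."""
    Y = np.eye(int(y.max()) + 1)[y]  # one-hot targets
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    best = (-1.0, None, None)
    for n_hidden in hidden_grid:
        for trial in range(n_trials):
            weight_seed = 1000 * n_hidden + trial  # hypothetical seeding scheme
            accs = []
            for f in range(k):
                val = folds[f]
                train = np.concatenate([folds[g] for g in range(k) if g != f])
                # Same seed in every fold, so one fixed draw of (W, b) is scored.
                model = elm_fit(X[train], Y[train], n_hidden,
                                np.random.default_rng(weight_seed))
                accs.append(np.mean(elm_predict(model, X[val]) == y[val]))
            score = float(np.mean(accs))
            if score > best[0]:
                best = (score, n_hidden, weight_seed)
    return best
```

Plain accuracy is used as the fold score here for brevity; with strongly imbalanced classes, a class-averaged accuracy is usually the safer selection criterion.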
For image quality estimation, experiments are carried out using two disjoint sets of original images and their compressed versions from the JPEG LIVE image quality database [25]. Of the 29 original images, 20 original images and their compressed versions are used for developing the image quality model. The remaining nine original images and their compressed versions are used for evaluating performance. Finally, the performance of the proposed image quality estimators is compared with that of an existing NR image quality metric [10] and the full-reference (FR) structural similarity image quality metric (SSIM) [26].
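The content-disjoint protocol described here (no original image contributes versions to both sets) can be sketched by splitting on a reference-image identifier rather than on individual compressed images. The random choice of the 20 training references, the `ref_ids` array, the five versions per original and the function name are assumptions for illustration; the excerpt does not say how the 20 images were chosen.

```python
import numpy as np


def split_by_reference(ref_ids, n_train_refs, seed=0):
    """Return (train_idx, test_idx) so that no reference image appears in both sets."""
    ref_ids = np.asarray(ref_ids)
    refs = np.unique(ref_ids)
    rng = np.random.default_rng(seed)
    train_refs = rng.choice(refs, size=n_train_refs, replace=False)
    in_train = np.isin(ref_ids, train_refs)
    return np.flatnonzero(in_train), np.flatnonzero(~in_train)


# Hypothetical layout: 29 originals with 5 compressed versions each.
ref_ids = np.repeat(np.arange(29), 5)
train_idx, test_idx = split_by_reference(ref_ids, n_train_refs=20)
```

Splitting by image content keeps the test set free of scenes the model has already seen at other compression levels, so the reported performance reflects generalization to new content rather than memorization of a scene.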
The organization of this paper is as follows: Section 2 describes the HVS-based feature extraction technique. In Section 3, we briefly present the recently developed ELM algorithm and the issues it faces in classification problems with a high imbalance in the samples. Section 4 presents the k-fold ELM and RCGA-ELM classifiers that handle high imbalance in the number of samples per class. The performance evaluation of the proposed classifiers on three benchmark multi-category problems and on image quality estimation is presented in Section 5. Section 6 summarizes the main conclusions of this study.
HVS-based Feature Extraction
It is easily deducible that most of the distortion in images/video is due to block DCT-based compression. The most popular and widely used image format on the Internet and in digital cameras is JPEG [7]. Since JPEG uses the block-based DCT for coding to achieve compression, the major artifact that JPEG-compressed images suffer from is blockiness. In JPEG coding, non-overlapping pixel blocks are coded independently using the DCT. The compression ratio and the image
Extreme learning machine
In this section, we present a brief overview of the extreme learning machine (ELM) algorithm [22]. ELM is a single hidden layer feedforward network in which the input weights are chosen randomly and the output weights are calculated analytically. Many activation functions, such as sigmoidal, sine, Gaussian and hard-limiting functions, can be used for the hidden neurons, while the output neurons have a linear activation function. ELM uses the non-differentiable or even discontinuous functions as an
Real-coded genetic algorithm approach
The real-coded genetic algorithm (RCGA) is perhaps the most well-known of all evolution-based search techniques [28]. Genetic algorithms were developed in an attempt to explain the adaptive processes of natural systems and to design artificial systems based upon these natural systems. Genetic algorithms are widely used to solve complex optimization problems in which the number of parameters and constraints is large and analytical solutions are difficult to obtain. In recent years, many
Experiments and discussions
In this section, we present the performance comparison of proposed KS-ELM, RCGA-ELM and ELM classifiers on benchmark classification data sets first. Next, we present the results for the image quality estimation problem.
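The excerpt does not list the evaluation measures, so the sketch below assumes common ones: overall classification accuracy, class-averaged accuracy (the mean of per-class recall, which exposes poor performance on minority classes in imbalanced problems such as the quality classes here), and the Pearson linear correlation between predicted quality and MOS. The function names are hypothetical.

```python
import numpy as np


def overall_accuracy(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


def average_class_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally regardless of its size."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    per_class = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(per_class))


def pearson(a, b):
    """Pearson linear correlation coefficient between two score lists."""
    return float(np.corrcoef(a, b)[0, 1])


y_true = [0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0]  # a classifier that ignores the minority class
print(overall_accuracy(y_true, y_pred), average_class_accuracy(y_true, y_pred))
# prints: 0.8 0.5
```

The gap between the two accuracies in this toy case is the kind of failure that a plain accuracy figure hides when one class dominates the training data.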
Conclusions
In this paper, we have presented a system for predicting image quality using the extreme learning machine algorithm, taking various human visual characteristics into account. The functional relationship between the extracted HVS features and the MOS is modeled by the ELM algorithm. The random selection of input weights and bias values considerably affects the generalization performance of the ELM algorithm for classification problems with a high imbalance in the training data set. For this purpose, we
Acknowledgments
This work was in part supported by the ITRC, Korea University, Korea, under the auspices of the Ministry of Information and Communication. The authors would also like to thank Prof. Bovik and his lab members for providing the JPEG image quality assessment database to test our metric.
References
- et al. A single-ended blockiness measure for JPEG-coded images. Signal Processing (2002).
- et al. Objective quality assessment of displayed images by using neural networks. Signal Processing: Image Communication (2005).
- et al. Extreme learning machine: Theory and applications. Neurocomputing (2006).
- et al. Evolutionary extreme learning machine. Pattern Recognition (2005).
- et al. No-reference perceptual quality assessment of JPEG compressed images.
- et al. A distortion measure for blocking artifacts in images based on human visual sensitivity. IEEE Transactions on Image Processing (1995).
- et al. A generalized block-edge impairment metric for video coding. IEEE Signal Processing Letters (1998).
- A perceptually significant block-edge impairment metric for digital video coding.
- et al. A de-blocking algorithm and a blockiness metric for highly compressed images. IEEE Transactions on Circuits and Systems for Video Technology (2002).
- Video Quality Experts Group (VQEG), website:...
- Blind measurement of blocking artifacts in images.
- Why is image quality assessment so difficult?
- Detection of blocking artifacts in compressed video.
- A survey of hybrid MC/DPCM/DCT video coding distortions. Signal Processing.
- Univariant assessment of the quality of images. Journal of Electronic Imaging.