Blind Quality Assessment of Tone-Mapped Images Based on Visual-Processing Features

Although high dynamic range (HDR) images contain far richer information than low dynamic range (LDR) images, they need to be tone-mapped for visualization on traditional display devices. The community has developed diverse tone-mapping operators (TMOs), which produce images of varying quality. Therefore, developing effective evaluation methods consistent with human visual perception is an urgent necessity for determining the best TMO in specific scenarios, or for optimizing the parameters of a given TMO. Towards this end, we propose a new blind quality assessment model that simulates the mechanism of the human visual system (HVS), which can adaptively adjust its sensitivity according to the chromatic properties of a scene at the beginning of viewing and thus well represents the perception process. Specifically, we first obtain the retinal response maps at four visual sensitivities in three opponent color channels. After that, gradient similarities in each color channel are calculated as global features to represent the progression of visual experience, and local mean values and standard deviations are computed in the brightest and darkest regions of the maps to measure the distortions caused by over- and under-exposure. Meanwhile, the maps' natural scene statistics (NSS) are utilized as a third set of features. Finally, all these features are combined for quality assessment by support vector regression (SVR). Extensive experiments on two public benchmark databases show that our method correlates highly with subjective perception and outperforms other state-of-the-art quality assessment methods.


I. INTRODUCTION
High dynamic range (HDR) technologies provide advances in capturing and displaying real-world scene radiance ranging from direct sunlight to faint starlight [1]. Compared with low dynamic range (LDR) images, HDR images are able to preserve more visual information and details. Though HDR images and videos have gained increasing popularity and hardware support over the past decades, the majority of current display devices are still LDR, and even some existing consumer HDR displays do not support the full range of HDR content. Hence, HDR content needs to be tone-mapped to match the range of display devices for visualization. During tone mapping, the visual quality of HDR images is degraded and information is lost, owing to the compression of the dynamic range.

(The associate editor coordinating the review of this manuscript and approving it for publication was Yun Zhang.)
A well-designed tone-mapping operator (TMO) is supposed to preserve as much visual information from the HDR source as possible and produce LDR images with an excellent perceptual experience. A large number of TMOs with different emphases have been developed over the last decades [2]. Since the content and structure contained in HDR images are largely diversified, a TMO behaving well on one HDR image may not be good on another. Hence, it is critical to choose a proper TMO, or to optimize a current TMO, based on its tone-mapping performance, which is usually assessed through tone-mapped images (TMIs).

(VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/)
In the field of image quality assessment (IQA), subjective evaluation is the most accurate, but it is time-consuming and expensive, as it requires a standard viewing environment and recruited graders. The same dilemma exists in TMI quality assessment; therefore, developing an objective metric for TMIs is of significant importance.
Many objective IQA methods have been proposed [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], and they are generally classified into full-reference (FR), reduced-reference (RR), and no-reference (NR) methods. Typical FR methods assume that the reference and test images are in the same dynamic range, and extract features from both images to calculate their similarity for quality evaluation. Yet for our task, i.e., TMI quality assessment, where the to-be-evaluated tone-mapped LDR images have far lower dynamic ranges than their HDR references, it is difficult to make a fair direct comparison between LDR and HDR. To this end, the tone-mapped quality index (TMQI) [18] used a modified SSIM [3] in which the original luminance term (using the mean value µ) is removed to enable structure-similarity calculation between different dynamic ranges, and a natural scene statistics (NSS) term was added to refine the method. Then, combined with TMQI, the feature similarity index for tone-mapped images (FSITM) [19] compared the local phase similarity of the original HDR and tone-mapped LDR images. Instead of simple uniform averaging, different perceptual methods [20], [21] have been used to pool the local qualities and boosted the performance of TMQI. Lastly, TMQI-II [22] achieved a better correlation with subjective scores through new configurations of the structure-similarity and NSS calculations.
The above methods are all FR. However, since large-scale HDR reference image sets are still scarce, and FR methods require extra bandwidth to transfer bulky HDR reference counterparts (usually 32-bit quantized), it is necessary to design an NR method that requires only the tone-mapped images.
Traditional NR methods [8], mostly designed for evaluating synthetic distortions, adopt common features related to natural images and usually underperform on TMI quality assessment since they do not consider the specific distortion types. It is necessary to assess TMI quality by paying attention to its distinct distortions, including structure corruption, abnormal exposure, colorfulness fade, etc.
Some NR TMI metrics have begun to do so. For example, Gu et al. [23] recognized that a good TMI should preserve the original information, structure, and naturalness well; hence, they utilized the entropy values of nine intermediate images, generated by darkening/brightening the TMI, as features. They also measured how much structure was preserved in the TMI by calculating the mean value of the gradients above a small threshold, and adopted natural statistics similar to [18]. Apart from this, Kundu et al. [24] proposed an NR approach based on standard spatial-domain NSS features and HDR-specific gradient-based features. Yue et al. [28], [29] extracted multiple quality-sensitive features in terms of colorfulness, exposure, and naturalness to blindly predict TMI scores. Wang et al. [30] exploited local degradation characteristics and global statistical properties to extract features; the attributes of an image, i.e., texture, structure, colorfulness, and naturalness, were considered locally or globally. Yue et al. [31] simulated the responses of single-opponent (SO) and double-opponent (DO) cells, and obtained textural and structural information from the SO and DO responses.
Despite the success of the above blind (NR) TMI assessment metrics, their ideas and performance can still be improved. For example, human beings view and assess images through the human visual system (HVS); consequently, accurately simulating the HVS can be an effective way to improve TMI assessment methods. Currently, almost all of the metrics mentioned above extract quality-related features from images directly, neglecting the visual processing of images. Yue et al. [31] obtained features from SO and DO responses by simulating the HVS's chromatic behaviors and achieved improved performance. However, they employed only the ultimate responses and did not account for the fact that the HVS perceives an image through an adaptation procedure, which is specific to information acquisition [39]. To this end, our work aims to obtain multiple response images at different visual sensitivities and extract comprehensive features for TMI quality evaluation by simulating the chromatic procedure of the HVS.
Our main ideas are as follows: 1) Since the retina adaptively adjusts its sensitivity according to the image content in the first few seconds of viewing to best acquire useful information, our method obtains the retinal responses of three opponent color channels at four sensitivity scales from low to high. These retinal responses model the cognitive process of the HVS when viewing an image. 2) The responses of each opponent color channel change as the sensitivity increases. These changes should not be the same for images of different qualities and are therefore useful for quality evaluation; hence, we calculate gradient similarities to represent the changes in viewing responses. 3) We also find that local areas of TMIs may appear over- or under-exposed, since the dynamic range might be inappropriately compressed by a specific TMO. Therefore, we segment TMIs into three regions and extract features from the over- and under-exposed local regions, respectively. 4) The HVS is highly adapted to the natural environment after millions of years of evolution; hence, we also use the natural scene statistics (NSS) of the response maps to enhance the performance of our method.

The remainder of this paper is organized as follows. Section II describes in detail how the above ideas are implemented in the proposed method. Then, in Section III, the advantage of our method over its competitors is demonstrated by experiments on two publicly available databases (TMID [18] and ESPL-LIVE [24]), and multiple ablation studies are carried out. Finally, conclusions are drawn in Section IV.

II. PROPOSED METHOD
The flowchart of the proposed method is shown in Fig. 1. Similar to successfully adopted learning-based IQA methods [24], [25], [26], [27], our method first extracts features from only the tone-mapped images, i.e., in an NR manner, and then learns a support vector regression (SVR) model to predict quality scores. Different from common practice, our method does not extract features directly from the TMIs themselves; instead, we simulate the HVS retinal process of viewing an image and derive response maps at four different viewing sensitivities, from low to high, in each of three opponent color channels, i.e., R-G, G-R, and B-Y.
Specifically, global features (F G ), local features (F L ), and NSS features (F S ) are gathered from the twelve response maps in three opponent color channels.

A. RETINAL PROCESSING IN HVS
The human retina, the entry point of image information into the HVS, first transforms radiance (of a natural scene, or that produced by an HDR or LDR image when visualized) into electrical signals via photoreceptors. The L, M, and S cone photoreceptors correspond to the red, green, and blue components, respectively [32]. The retina also performs complicated processing to extract useful information [33]. An effective model, introduced in [34], is adopted to simulate the processing of the retina, yielding response maps of retinal ganglion cells in three opponent color channels at four sensitivities. The details of this model are described as follows.
Given an input image I, the color information is coded in a trichromatic way via the three types of cone photoreceptors.
The cone responses can be formulated as

P_c(x, y) = I_c(x, y) * g(x, y, σ), c ∈ {R, G, B},

where R, G, and B are the red, green, and blue channels, respectively, * is the convolution operator, and g(x, y, σ) is a 2-D Gaussian filter simulating the receptive field (RF) of the cones, in which the parameter σ controls the size of the RF.
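As an illustration, this cone-response stage can be sketched in a few lines of Python; the RF size sigma and the [0, 1] input range here are assumptions for the sketch, not the paper's TABLE 1 values.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def cone_responses(img, sigma=1.0):
    """Simulate L, M, S cone responses by Gaussian receptive-field
    filtering of the R, G, and B channels (sigma sets the RF size,
    an illustrative value)."""
    # img: H x W x 3 float array, values assumed in [0, 1]
    return np.stack(
        [gaussian_filter(img[..., c], sigma) for c in range(3)], axis=-1)

# Sanity check: a flat gray image is unchanged by the RF filtering.
flat = np.full((8, 8, 3), 0.5)
resp = cone_responses(flat)
```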
Then, horizontal cells laterally modulate the cone signals for reception by the ganglion layer [35]. The modulation can be regarded as a gain-control mechanism for the cone responses [36], in which a term F̄_c simulates the modulation of the horizontal cells and an exponent n is used to emphasize the bright pixels [37]. Retinal ganglion cells (RGCs) receive the modulated cone signals and process them with the color-opponent mechanism [38]. Fig. 2 shows the receptive field of RGCs, in which the subunits first inhibit each other and then inhibit the center. This is expressed by (7), (8), and (9) of the model, where x and y are the spatial indices, G_sub(x, y) denotes the subunit response, G_sur(x, y) is the surround response, and R_G(x, y) is the final response in the R-G opponent channel. The max operator makes the result non-negative, and A_UG and A_SG represent the sensitivities of the subunit and surround, respectively. The responses of the G-R and B-Y channels can be calculated similarly using (7), (8), and (9), where Y is the yellow signal with P_Y = (P_R + P_G)/2. The parameters of the Gaussian functions are listed in TABLE 1. According to [39], RGCs are capable of adaptively adjusting their sensitivity to the viewing environment within the first few seconds. During this adaptation procedure, the color variation in each channel may reflect the viewing experience and is critical to quality assessment. Strictly following [34], the sensitivity is updated iteratively, where k denotes the kth sensitivity adjustment.
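The exact subunit/surround interaction and parameter values follow [34] and TABLE 1; the sketch below substitutes a simple difference-of-Gaussians form with illustrative sigmas and sensitivities to convey the center-surround opponent idea.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def opponent_response(p_center, p_surround, sigma_sub=1.0, sigma_sur=3.0,
                      a_ug=1.0, a_sg=0.8):
    """Hedged sketch of an RGC color-opponent response: the subunit
    (center) cone signal is inhibited by a larger-scale surround of the
    opposing cone signal, and max(., 0) keeps the response non-negative.
    All parameter values here are illustrative, not the paper's."""
    g_sub = a_ug * gaussian_filter(p_center, sigma_sub)
    g_sur = a_sg * gaussian_filter(p_surround, sigma_sur)
    return np.maximum(g_sub - g_sur, 0.0)

# R-G channel: center driven by the red cone signal, surround by green.
p_r = np.ones((6, 6))
p_g = np.full((6, 6), 0.5)
r_g = opponent_response(p_r, p_g)
# The B-Y channel would use the yellow signal P_Y = (P_R + P_G) / 2.
```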
B. FEATURE EXTRACTION
1) GLOBAL FEATURES (F_G)
As mentioned above, the retina adjusts its sensitivity from low to high when viewing an image. As the sensitivity increases, the color maps of each channel vary and can reflect the viewing experience. In our method, K = 4, the maximum value of k, sets the total number of sensitivities. Our first consideration is to capture this variation by calculating the gradient similarity. The gradient map is regarded as a proper measurement of image structure [40], [41], and can be calculated as

∇φ_k(x, y) = sqrt((φ_k * s_x)²(x, y) + (φ_k * s_y)²(x, y)),

where s_x(x, y) and s_y(x, y) are the horizontal and vertical filter kernels of the Sobel filter, respectively. The Sobel operator is implementation-friendly and effective. Then, the gradient similarity is calculated between the first and the kth color maps (in terms of sensitivity) of the same opponent color channel:

GS_k^Ch(x, y) = (2∇φ_1(x, y)∇φ_k(x, y) + C) / (∇φ_1²(x, y) + ∇φ_k²(x, y) + C), (12)

where Ch is the opponent channel (R-G, G-R, or B-Y), and a small constant C = 0.0001 is added to avoid numerical instability. The mean value of each similarity map is used as one feature, so we obtain 9 features here, categorized as F_G. Fig. 3 shows three images tone-mapped from the same HDR image by three different TMOs in the TMID database. The subjective quality scores of (a), (b), and (c) are 5.65, 1.55, and 7.9 in the form of MOS, respectively, where a higher score represents a lower quality experience. Compared with (b), which has the best visual quality, (a) is under-exposed and (c) is over-exposed, and both lose many details. Specifically, the indoor scene in (a) is too dark to see clearly, and the pixels of the outdoor part of (c) appear almost entirely white. Fig. 4 shows how the gradient similarity of Fig. 3 varies with the sensitivity k. All curves fall as k increases, and the curves for Fig. 3(b) drop faster in all channels than those for Fig. 3(a) or Fig. 3(c).
This may be attributed to the following: 1) a higher k means a bigger difference between the kth and the first color maps, and hence a smaller gradient similarity; 2) Fig. 3(a) and Fig. 3(c) are much flatter than Fig. 3(b), so smaller changes occur as k increases. Between the two severely ill-exposed images, the dropping rates for Fig. 3(c) are all lower than those for Fig. 3(a), which is also consistent with the subjective perception that Fig. 3(a) provides a better quality experience. Hence, the features in F_G have the potential to represent TMI quality.
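The F_G computation for one channel and one sensitivity can be sketched as follows; the ratio form of the similarity is an assumption modeled on standard structure-similarity terms, with the paper's constant C = 0.0001.

```python
import numpy as np
from scipy.ndimage import sobel

C = 1e-4  # the stability constant from the paper

def gradient_map(phi):
    """Sobel gradient magnitude of one response map."""
    return np.hypot(sobel(phi, axis=1), sobel(phi, axis=0))

def f_g_feature(phi_1, phi_k):
    """Mean gradient similarity between the first and k-th sensitivity
    maps of one opponent channel (a sketch of Eq. (12))."""
    g1, gk = gradient_map(phi_1), gradient_map(phi_k)
    sim = (2.0 * g1 * gk + C) / (g1 ** 2 + gk ** 2 + C)
    return float(sim.mean())

# Identical maps give a similarity of exactly 1.
rng = np.random.default_rng(0)
m = rng.random((16, 16))
s_same = f_g_feature(m, m)
```

Collecting this feature for k = 2, 3, 4 in each of the three opponent channels yields the 9 features of F_G.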

2) LOCAL FEATURES (F_L)
After establishing the indispensability of the above features, we introduce more features based on the HVS. The consideration here is that, during tone mapping, the highest and lowest luminance values are significantly compressed, leading to detail loss and a bad viewing experience. That is, quality deterioration is most likely in these two regions of TMIs [42]. It can also be observed in Fig. 3 that few details are visible in the darkest area (Fig. 3(a)) and the brightest area (Fig. 3(c)).
Therefore, for these features, we segment the darkest and brightest regions using the lowest 10% and highest 10% of the input TMI's values, respectively; this threshold is computed in the LAB color space. Afterwards, the mean value µ and standard deviation σ of these two segmented regions of the color maps of each opponent channel are calculated to evaluate their quality loss: σ measures detail and µ measures brightness. Combining the two segmented regions with the color maps of each opponent channel, we obtain a total of 48 features, grouped as F_L.
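A minimal sketch of the F_L extraction for one response map; the percentile-based segmentation follows the paper's 10% thresholds, while the use of a generic lightness array (rather than the LAB L channel specifically) is a simplification of this sketch.

```python
import numpy as np

def f_l_features(lightness, response_map, pct=10):
    """Mean and standard deviation of a response map inside the darkest
    and brightest regions, segmented by the lowest / highest `pct`
    percent of the lightness values (LAB lightness in the paper)."""
    lo, hi = np.percentile(lightness, [pct, 100 - pct])
    dark, bright = lightness <= lo, lightness >= hi
    return [response_map[dark].mean(), response_map[dark].std(),
            response_map[bright].mean(), response_map[bright].std()]

# Toy example: a ramp image, using the lightness itself as the map.
ramp = np.arange(100.0).reshape(10, 10)
feats = f_l_features(ramp, ramp)
```

Applied to each of the 12 response maps (3 channels x 4 sensitivities), the four values per map give the 48 features of F_L.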

3) NSS FEATURES (F_S)
Natural images also have certain statistical properties [43], [44], which may be violated by distortions. Instead of directly modeling the statistics of the input images, we calculate them for every color map produced by the HVS simulation. As in [8], the statistical model first processes a color map φ with the mean-subtracted contrast-normalized (MSCN) operator:

φ̂(x, y) = (φ(x, y) − µ_φ(x, y)) / (σ_φ(x, y) + 1),

where µ_φ(x, y) is the local mean, σ_φ(x, y) is the local standard deviation, and each weight w_{m,n} is one element of a circularly symmetric 2-D (2M+1)×(2N+1) Gaussian kernel applied to the corresponding neighboring pixel. In our method, we set M = 7, N = 7 as in [24]. Fig. 5 shows the MSCN distributions of the images in Fig. 3. Panels (a1), (a2), and (a3) in the first row show the distributions of the three opponent color channels (i.e., R-G, G-R, and B-Y) of Fig. 3(a), where k is the sensitivity. Similarly, the second row represents the distributions of Fig. 3(b), and the last row corresponds to Fig. 3(c). For the normally exposed TMI of Fig. 3(b), the distributions in the three color channels are more Gaussian-like than for the other two tone-mapped images, which means it is more ''natural''. As can be seen, with the increase of sensitivity k, the shapes of the distributions of Fig. 3(b) vary obviously, whereas those of Fig. 3(a) and Fig. 3(c) remain almost unchanged. Therefore, we utilize a generalized Gaussian distribution (GGD) to obtain statistical features for each scale of the response maps for ''naturalness'' evaluation. The GGD modeling the MSCN coefficients is given by

f(x; α, β) = α / (2βΓ(1/α)) · exp(−(|x|/β)^α),

with β = σ · sqrt(Γ(1/α)/Γ(3/α)), where Γ(x) is the gamma function, and α and β are the shape and scale parameters, respectively. Since α and β are taken as quality-aware features, we obtain 24 features here as F_S.
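The MSCN normalization and a standard moment-matching GGD fit (the usual estimator in the NSS literature) can be sketched as follows; the Gaussian-filter window here is an assumption standing in for the paper's 15x15 weighting kernel.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.special import gamma

def mscn(phi, sigma=7 / 6, eps=1.0):
    """Mean-subtracted contrast-normalized coefficients; a Gaussian
    window approximates the paper's (2M+1)x(2N+1) weighting kernel."""
    mu = gaussian_filter(phi, sigma)
    var = np.maximum(gaussian_filter(phi ** 2, sigma) - mu ** 2, 0.0)
    return (phi - mu) / (np.sqrt(var) + eps)

def fit_ggd(x):
    """Moment-matching estimate of the GGD shape alpha and scale beta,
    the two F_S features per response map."""
    rho = x.var() / (np.mean(np.abs(x)) ** 2 + 1e-12)
    alphas = np.arange(0.2, 10.0, 0.001)
    r = gamma(1 / alphas) * gamma(3 / alphas) / gamma(2 / alphas) ** 2
    alpha = alphas[np.argmin((r - rho) ** 2)]
    beta = x.std() * np.sqrt(gamma(1 / alpha) / gamma(3 / alpha))
    return alpha, beta

# Sanity check: samples from a unit Gaussian should give alpha near 2.
x = np.random.default_rng(0).normal(0.0, 1.0, 200000)
a_hat, b_hat = fit_ggd(x)
```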

C. LEARNING A METRIC
Here, we describe how the final metric is derived from the above features. Specifically, we adopt the response maps of four scales, generated by simulating the HVS processing mechanism. The three kinds of features above, F_G, F_L, and F_S, are extracted from these four response maps in each of the three opponent color channels, giving a total of 81 features. Following the successful practice of most NR-IQA methods, we employ SVR to pool all the extracted features. The SVR function [45] is given by

f(x) = Σ_i (γ_i − γ_i*) K(x_i, x) + b,

where γ_i and γ_i* are slack variables and b is a bias parameter.
x_i and y_i are the input feature vector and quality score, respectively. K(x_i, x_j) = φ(x_i)^T φ(x_j) is the kernel function, and a radial basis function (RBF) kernel is adopted, defined as

K(x_i, x_j) = exp(−||x_i − x_j||² / (2σ²)).

Finally, the prediction model is built after setting all training parameters. We use 80% of the images from the two datasets (see the experiments below) for training.
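A minimal sketch of this regression stage using scikit-learn's RBF-kernel SVR; the feature matrix is synthetic stand-in data (in the real pipeline each row would be one TMI's 81-D feature vector), and the C/gamma values are illustrative, not tuned.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 81))                          # stand-in features
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=120)   # stand-in MOS

n_train = int(0.8 * len(X))        # 80% / 20% split as in the paper
model = SVR(kernel="rbf", C=10.0, gamma="scale")
model.fit(X[:n_train], y[:n_train])
pred = model.predict(X[n_train:])
```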

III. EXPERIMENT
A. EXPERIMENT CONFIGURATION
1) EVALUATION CRITERIA
An objective IQA method is recognized as precise only when its predictions correlate highly with subjective scores. We use four criteria recommended by the Video Quality Experts Group (VQEG) [46], i.e., root mean squared error (RMSE), Pearson linear correlation coefficient (PLCC), Spearman rank-order correlation coefficient (SRCC), and Kendall's rank-order correlation coefficient (KRCC), to measure this correlation. RMSE and PLCC are utilized to measure consistency and accuracy, respectively, while SRCC and KRCC evaluate monotonicity. Lower RMSE and higher PLCC, SRCC, and KRCC indicate better performance. Also, to remove the nonlinearity of the objective scores, a five-parameter logistic regression model is applied before calculating RMSE and PLCC:

S = a_1 (1/2 − 1/(1 + exp(a_2 (S_p − a_3)))) + a_4 S_p + a_5,

where S_p is the input objective prediction value, and a_1, a_2, a_3, a_4, and a_5 are parameters fitted by a nonlinear regression method.
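This evaluation step can be sketched with SciPy; the objective scores and MOS values below are synthetic stand-in data, and the initial-parameter choice for the fit is an assumption of the sketch.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr, spearmanr, kendalltau

def logistic5(sp, a1, a2, a3, a4, a5):
    """Five-parameter logistic mapping recommended by VQEG."""
    return a1 * (0.5 - 1.0 / (1.0 + np.exp(a2 * (sp - a3)))) + a4 * sp + a5

# Stand-in objective predictions vs. MOS, for illustration only.
rng = np.random.default_rng(1)
mos = rng.uniform(1.0, 8.0, 60)
obj = 0.8 * mos + 0.05 * rng.normal(size=60)

p0 = [np.ptp(mos), 0.1, float(np.mean(obj)), 1.0, float(np.mean(mos))]
params, _ = curve_fit(logistic5, obj, mos, p0=p0, maxfev=20000)
fitted = logistic5(obj, *params)

rmse = float(np.sqrt(np.mean((fitted - mos) ** 2)))  # after the mapping
plcc = pearsonr(fitted, mos)[0]                      # after the mapping
srcc = spearmanr(obj, mos)[0]                        # rank-based, no mapping
krcc = kendalltau(obj, mos)[0]                       # rank-based, no mapping
```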

2) DATASETS AND COMPETITORS
We evaluate the performance of the proposed method on two popular databases, TMID [18] and ESPL-LIVE [24].

a: TMID DATABASE
Developed at the University of Waterloo, Canada, TMID consists of 15 HDR images, each tone-mapped by 8 different TMOs (120 TMIs in total). It also provides subjective quality scores from twenty observers, in the form of mean opinion scores (MOS) ranging from 1 to 8, where a lower score means a higher perceptual quality.

b: ESPL-LIVE DATABASE
Created by the Laboratory for Image and Video Engineering at the University of Texas, USA, ESPL-LIVE contains 1811 images generated via three approaches: tone mapping, multi-exposure fusion, and post-processing. Its subjective evaluation was carried out on a crowdsourcing platform, yielding an average of 110 ratings per image. The collected subjective ratings were then used to calculate the MOS values.

c: COMPETITORS
We compare our method with state-of-the-art (SOTA) quality evaluation methods, including SSIM [3], FSIM [19], NIQE [8], BRISQUE [9], GM-LOG [15], TMQI [18], BTMQI [23], HIGRADE-2 [24], BLIQUE-TMI [10], Yue's method [29], and Wang's method [30]. Among them, SSIM, FSIM, and TMQI are FR and thus cannot be applied to the ESPL-LIVE database, which has no original HDR images; the rest are NR. Also, SSIM, FSIM, NIQE, BRISQUE, and GM-LOG are designed for evaluating synthetic distortions of standard dynamic range (SDR) images, while the rest are tailored for TMIs. Specifically, SSIM evaluates quality by calculating the similarities of brightness, contrast, and structure between reference and distorted images [3]; FSIM compares the similarities of phase congruency and gradient magnitude between image pairs [19]; BRISQUE uses the generalized Gaussian distribution (GGD) and asymmetric generalized Gaussian distribution (AGGD) to evaluate the ''naturalness'' of normalized luminance [9]; NIQE considers the sharpness of image regions and introduces a multivariate Gaussian (MVG) model to represent the statistics of natural images [8]; GM-LOG computes the joint statistics of the gradient magnitude (GM) map and the Laplacian of Gaussian (LOG) response [15]; TMQI removes the luminance comparison component of SSIM and combines it with a naturalness measure to assess TMI quality [18]; BTMQI evaluates TMIs by considering information, naturalness, and structure [23]; HIGRADE-2 emphasizes extracting NSS features from the spatial and gradient domains in the three LAB color channels [24]; BLIQUE-TMI computes moments and statistics in both brightness and color domains [10]; Yue's method focuses on the degradations of colorfulness, structure, and naturalness [29]; and Wang's method extracts comprehensive features by exploiting the local and global characteristics of TMIs [30].
B. PERFORMANCE COMPARISON
The test set consists of the remaining 20% of the images of each dataset (the other 80% form the training set). The training and testing process is repeated 1000 times, and the median indices are reported as the final results. TABLE 2 shows the comparison results on the TMID and ESPL-LIVE databases, with the best indices (i.e., RMSE, PLCC, SRCC, and KRCC) indicated in bold. From the results, we can see the following. First, methods designed for synthetic SDR distortions, such as SSIM, FSIM, NIQE, BRISQUE, and GM-LOG, cannot achieve satisfactory performance; specifically, the SRCC values of NIQE, BRISQUE, and GM-LOG are all below 0.5 on the ESPL-LIVE database. The main reason is that they were designed to predict common distortions, such as blurriness, blocking, and noise, which are very unlike the distortions caused by tone mapping. Hence, it makes sense that these metrics are not competent for TMI evaluation. Next, the FR method SSIM obtains the worst results, and GM-LOG outperforms it by approximately 93% in terms of SRCC. This phenomenon indicates that traditional FR metrics, which assume the same dynamic range for the compared image pair, are not proper for TMI pairs with large dynamic-range gaps. Last, Yue's and Wang's methods are the most competitive blind methods, with respective SRCC values of 0.73 and 0.75 on the ESPL-LIVE database, slightly underperforming ours. As demonstrated in TABLE 2, the proposed method performs better than the state of the art. To further verify the superiority of the proposed method, we conduct experiments on the individual distortion sets of ESPL-LIVE, i.e., tone mapping, multi-exposure fusion, and post-processing. It can be observed from TABLE 3 that our method obtains the best overall performance; only Wang's method delivers a slightly better SRCC than ours on multi-exposure fusion. Also, our method performs better on tone mapping than on the other two distortion types.
Specifically, the SRCC values for tone mapping, multi-exposure fusion, and post-processing are 0.7662, 0.7087, and 0.6975, respectively, indicating the effectiveness and robustness of our method.

C. ABLATION STUDIES
1) IMPACT OF TRAINING SET SIZE
As mentioned in [23] and [24], the performance of an SVR-based method is related to the size of the training set. Here, we investigate this relationship by varying the training-set percentage from 10% to 90%, with the remainder used as the testing set. For each percentage, the non-overlapping random split is again carried out 1000 times, and the median indices are shown in TABLE 4. As seen, the performance improves gradually as the training percentage increases. Accordingly, we set the training and testing portions to 80% and 20%, respectively, in the final model. Besides, the results show the high robustness of the proposed method, as the performance remains competent even when the training percentage is only 30%.
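The repeated-split protocol used throughout the experiments can be sketched as below; the `eval_split` callable (which would compute SRCC, PLCC, etc. for one split) is a placeholder of this sketch.

```python
import numpy as np

def median_index_over_splits(n_images, eval_split, n_splits=1000,
                             train_frac=0.8, seed=0):
    """Sketch of the evaluation protocol: repeat a non-overlapping
    random train/test split and report the median of a per-split
    index. `eval_split(train_idx, test_idx)` is a user-supplied
    callable, an assumption of this sketch."""
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(n_splits):
        perm = rng.permutation(n_images)
        cut = int(train_frac * n_images)
        vals.append(eval_split(perm[:cut], perm[cut:]))
    return float(np.median(vals))

# Toy check: with 10 images and an 80% split, the test set has 2 images.
res = median_index_over_splits(10, lambda tr, te: len(te), n_splits=20)
```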

2) ON NUMBER OF SENSITIVITY VALUES
As introduced in Section II-B, the number of sensitivity values K determines the amount of features extracted for evaluation. To explore its impact, we try K ∈ {2, 4, 6, 8} and test on the ESPL-LIVE database.
The results in Fig. 6 show that the performance rises with K; however, the increase brought by a bigger K is relatively slight and ''uneconomical'' when K > 4. Therefore, we make a trade-off between the complexity and performance of the algorithm and set K = 4. This setting is also consistent with the HVS property of assessing an image mainly within the first few seconds, as detailed above.

3) ON OVER-/UNDER-EXPOSURE THRESHOLD
As mentioned above, over- and under-exposure in TMIs may be brought about by some TMOs. In Section II-B, the mean values and standard deviations of the two segmented regions in every color map are calculated as features.

4) ON COMBINATION OF FEATURES
We adopt three groups of features: global gradient-similarity features (F_G), local mean and standard deviation features (F_L), and statistics features (F_S). Although our method achieves highly competitive performance, the contribution of each feature set is still unclear. For this purpose, we compare the distinct combinations of the three feature sets.
The results of the ablation studies on the TMID and ESPL-LIVE datasets are shown in TABLE 5. As seen, the contributions of the feature groups are as follows: 1) among the individual feature sets, F_L obtains the best performance on both datasets, validating that over- and under-exposure are critical for TMI quality assessment; 2) F_S produces relatively worse performance than the other two (i.e., F_G and F_L) on both datasets, because traditional NSS cannot fully express the specific distortions of TMIs.
3) The performance improves when features are combined, and the more feature sets involved, the better the performance. Thus, the three feature sets are complementary and should be fused for quality evaluation.

D. RUN TIME
To be practical for deployment, a good quality assessment method should have low complexity and consume little time. Therefore, we also investigate the run time of the proposed method and of the competing methods, implemented on the TMID database. All algorithms are run in MATLAB 2016b on a Windows 10 laptop with a 2.5 GHz CPU and 8 GB RAM. Our method runs at a moderate speed, about 1.73 seconds/image, consuming much less time than HIGRADE-2. For real-time applications our notebook computer is somewhat underpowered, and the proposed method's run time could feasibly be reduced with higher computational power.

E. FURTHER DISCUSSION
By mimicking the HVS procedure, we have built a blind quality assessment model for TMIs. Experiments show the advantage of our model over previous research. The superiority of our method can mainly be attributed to the following. First, we extract features targeting TMIs' specific distortions, such as local features measuring the distortions in over- and under-exposed regions. The ablation results in TABLE 5 show that desirable performance can be achieved with F_L alone. In contrast, SSIM, FSIM, NIQE, BRISQUE, and GM-LOG were designed for evaluating synthetic distortions, which differ fundamentally from the distortions in TMIs; hence, they are not competent at evaluating TMIs (TABLE 2). Notably, SSIM has the worst performance, suggesting that direct luminance comparison should not be used in TMI assessment. Second, all features are obtained from visual response maps. Since the HVS plays the first and decisive role in assessing images, simulating the HVS is regarded as one of the most effective approaches to IQA problems [29], [31]. All the competing methods in this section extract quality-related features directly from the to-be-evaluated TMIs, so our extracted features arguably express the visual experience better. From Fig. 4 and Fig. 5, it is clear that data collected along the visual procedure can represent quality well. Last, our method gathers features in three opponent color channels. Color degradation occurs with the dynamic-range compression of tone mapping [28], [29]; for example, obvious colorfulness losses can be seen in the ill-exposed Fig. 3(a) and Fig. 3(c). To address this issue, the most recently proposed NR methods, including HIGRADE-2, BLIQUE-TMI, and Yue's and Wang's methods, assess chromatic information and achieve promising results, all better than those of the benchmark FR method TMQI.
Therefore, our substantial attention to color properties is likely a positive factor in the proposed method's outperformance. Despite the advantages of the proposed method, several aspects remain to be improved: 1) more precise HVS models could be adopted to simulate the visual process more accurately; 2) at present, the over- and under-exposed regions are segmented by a simple threshold, and in future work we plan to use a more appropriate content-related partition method, as in [30]; 3) currently, we use ''hand-crafted'' features and an SVR learning scheme, which might be insufficient compared with learning-based deep neural networks (DNNs). Hence, we could utilize a DNN for automatic feature extraction in TMI assessment, on the premise of developing a new massive database for such data-driven approaches.

IV. CONCLUSION
In this article, we propose a new, effective blind (NR) quality model for TMIs, using features extracted by simulating the procedure of the HVS. Specifically, the HVS is adaptive to image content, and retinal responses at different sensitivities are used to represent the visual processing. To capture the variation of the visual responses, gradient similarities are calculated between the first and the kth sensitivity color maps in each opponent color channel. Also, due to the over- or under-exposure caused by dynamic-range compression, we segment the to-be-assessed TMIs into three parts and calculate local means and standard deviations as features in the darkest and brightest regions of all response maps. We also employ natural statistics of all the response maps to further improve the performance. Then, SVR is utilized to pool all these features and build a regression model for quality prediction. Experimental results on the TMID and ESPL-LIVE databases demonstrate that the proposed method is superior to the other state-of-the-art assessment metrics.
CHENG GUO received the B.E. degree in communication engineering from Shandong University, Weihai, China, in 2017, and the M.E. degree in communication and information system from the Communication University of China, Beijing, China, in 2020, where he is currently pursuing the Ph.D. degree in communication and information system. His research interests include high dynamic range content for television, learning-based low-level vision, image quality assessment, computational photography, and color science.
QING SHEN was born in Shanxi, China, in 1982. She received the B.S. degree in computer science and technology and the M.S. degree in computer applications technology from the North University of China, in 2004 and 2007, respectively. She is currently an Associate Professor with Huzhou College, Huzhou, China. She is the author or coauthor of more than 30 articles in refereed international journals. Her current research interests include image processing, intelligent information processing, and swarm intelligence.