Impact of Constant Rate Factor on Objective Video Quality Assessment

This paper deals with the impact of the constant rate factor value on objective video quality assessment using the PSNR and SSIM metrics. The compression efficiency of the H.264 and H.265 codecs, defined by different constant rate factor (CRF) values, was tested. The assessment was done for eight types of video sequences, depending on content, at High Definition (HD), Full HD (FHD) and Ultra HD (UHD) resolution. Finally, the performance of both codecs was compared with emphasis on compression ratio and coding efficiency.


Introduction
Modern telecommunication networks provide a wide range of multimedia services. Video distribution represents a significant portion of the multimedia segment. End users keep demanding more attractive video content with ever higher quality parameters. This quality growth is enabled by developments in all segments of the transmission chain, from video cameras through telecommunication networks up to televisions. Recent professional video cameras produce ultra-sharp videos in high resolution. The negative aspect of this process is the extremely high amount of data, which leads to the need for effective video compression. Video compression is a compromise between video quality and bitrate. The developers of video compression standards are still seeking algorithms with a high compression ratio that preserve the required video quality parameters. This search has become a hot topic and a great challenge for research teams.
The objective quality of the High Efficiency Video Coding (H.265/HEVC), VP9 and Advanced Video Coding (H.264/AVC, MPEG-4 Part 10) compression standards is analyzed in [1] using Peak Signal-to-Noise Ratio (PSNR) and Bjontegaard rate difference (BD-rate) saving. Paper [2] compares the Dirac and H.264 codecs for Common Intermediate Format (CIF) and Quarter CIF (QCIF) resolutions using the PSNR and Structural Similarity Index (SSIM) metrics. Paper [3] presents results for the HEVC, VP9 and second-generation Audio Video Coding Standard (AVS2) codecs for four different resolutions, UHD, FHD, Wide Video Graphic Array (WVGA) and Wide Quarter VGA (WQVGA), using PSNR and BD-rate saving. A comparison of objective (PSNR) and subjective (DSIS) methods for video quality evaluation of HEVC, AVC and VP9 for UHD and FHD is presented in [4]. Paper [5] presents objective (PSNR, BD-rate and ∆R) and subjective (Absolute Category Rating (ACR), also known as Single Stimulus (SS)) quality assessment for the AVC, VP9 and HEVC codecs and HD and FHD sequences. Paper [6] deals with the objective quality of the HEVC and VP9 codecs using the metrics PSNR, PSNR Human Visual System (PSNR-HVS), SSIM, Visual Information Fidelity in Pixel domain (VIFP) and Video Quality Metric (VQM) for HD and FHD resolution.
The constant rate factor governs a set of compression parameters that together determine the level of quality. The overview above shows that a publication examining the quality performance, as defined by the CRF value, of the most used compression standards depending on content is missing. Therefore, the aim of this paper is to explore the coding efficiency of the most widespread codecs (H.264 and H.265) and the influence of the CRF value on the objective quality assessment.

Video Compression and Video Compression Standards
Video compression standards have been developed gradually, and their evolution was conditioned by the computational performance of the devices that perform video coding and decoding. [...] standardization organizations. The project of this cooperation is known as the Joint Collaborative Team on Video Coding (JCT-VC). The basic coding structure of H.265 remained the same as that of its predecessor, but several improvements were made that significantly increased the coding efficiency.

Video Quality Assessment
Generally, video quality evaluation can be divided into two groups. The first one is subjective quality assessment and the second one is objective quality evaluation.
Subjective quality assessment quantifies perceived quality. The evaluation process, procedures and conditions are defined in the International Telecommunication Union recommendations ITU-R BT.500-13 (Radiocommunication Sector) [15] and ITU-T P.910 [16]. Evaluation is performed by observers who classify quality using the appropriate scale. The biggest advantage of subjective quality assessment is the accuracy of its results; the biggest drawback is the duration of the evaluation process.
Objective quality assessment, unlike the subjective methods, does not rely on human observers; the evaluation is performed by computer algorithms, which is a great advantage. The objective assessment is not limited by the number of test repetitions, the results are independent of the mood of respondents, and a single analysis is less time consuming. The objective evaluation process is therefore considerably faster and more suitable for research activities than the subjective assessment. The objective methods are known as metrics. The metrics can be divided into two basic groups.
The first group, the pixel-based metrics, comprises MSE (Mean Square Error), PSNR and their derivatives (DELTA, Mean Absolute Difference - MSAD, SNR, Aligned PSNR - APSNR, etc.). Even though they are the oldest ones, they are still very popular. This is due to their computation speed, their fairly low complexity and the simplicity of implementing them in devices. The quality defined by the PSNR value is expressed in decibels; the lower limit of this metric is zero and the upper limit is theoretically infinite. A higher PSNR value represents a higher level of quality [17]. The PSNR metric has a lower correlation with subjective test results than the SSIM metric.
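As a brief illustration of the pixel-based metrics, the following Python sketch computes MSE and PSNR for two equally sized lists of 8-bit samples, using the usual definition PSNR = 10 log10(peak^2 / MSE). The function names, the flat-list input and the `cap_db` parameter are assumptions made for this example only. The cap mirrors the 100 dB limit that this paper reports for the MSU tool; how that tool applies the limit internally is not described in the text.

```python
import math


def mse(ref, deg):
    """Mean squared error between two equally sized sequences of pixel values."""
    if len(ref) != len(deg) or len(ref) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - d) ** 2 for r, d in zip(ref, deg)) / len(ref)


def psnr(ref, deg, peak=255, cap_db=100.0):
    """PSNR in dB; identical inputs (MSE = 0) return cap_db instead of infinity.

    cap_db is an assumption modelled on the 100 dB limit mentioned for the MSU tool.
    """
    err = mse(ref, deg)
    if err == 0:
        return cap_db
    return min(10.0 * math.log10(peak ** 2 / err), cap_db)


if __name__ == "__main__":
    reference = [100, 100, 100, 100]
    degraded = [110, 90, 110, 90]  # every sample is off by 10, so MSE = 100
    print(round(psnr(reference, degraded), 2))
    print(psnr(reference, reference))  # identical frames hit the cap
```

With 8-bit samples the largest possible MSE is 255^2, which gives a PSNR of 0 dB, in line with the lower limit stated above.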
The second group contains metrics based on a Human Visual System (HVS) model, such as SSIM, VQM (Video Quality Metric) and many others. SSIM is the most popular of them and is therefore used in this paper.
SSIM quantifies quality through a similarity parameter, which depends on three measured values: luminance, contrast and structure. All values are eventually combined into a single value in the range from 0 to 1; the value 0 stands for the worst quality and the value 1 indicates the best quality. Results obtained from the SSIM metric correlate well with subjective quality assessment [18].
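The combination of luminance, contrast and structure can be sketched with the commonly used closed form SSIM = ((2 mu_x mu_y + C1)(2 sigma_xy + C2)) / ((mu_x^2 + mu_y^2 + C1)(sigma_x^2 + sigma_y^2 + C2)), with C1 = (0.01 L)^2 and C2 = (0.03 L)^2 for a peak value L. The Python sketch below is a deliberate simplification: it evaluates the formula once over the whole input (a single global window), whereas practical tools typically average the index over local windows. The function name and the flat-list input are assumptions for illustration and do not describe the MSU tool.

```python
def ssim_global(x, y, peak=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM over two equally sized sequences of pixel values.

    Simplified sketch: real implementations usually compute SSIM in local
    sliding windows and average the results.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Unbiased sample variances and covariance.
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    # Stabilising constants from the standard SSIM formulation.
    c1 = (k1 * peak) ** 2
    c2 = (k2 * peak) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


if __name__ == "__main__":
    reference = [10, 20, 30, 40]
    degraded = [12, 18, 33, 39]
    print(ssim_global(reference, reference))  # identical inputs score 1 (up to rounding)
    print(ssim_global(reference, degraded))   # a distorted copy scores below 1
```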

Test Sequences and Measurement
The following subsections briefly describe the test sequences used and the principles of the video quality evaluation process.

Test Sequences
The quality evaluation process was done for eight test video sequences [20] that differ in content (Fig. 1(a), Fig. 1(b), Fig. 1(c), Fig. 1(d), Fig. 1(e), Fig. 1(f), Fig. 1(g) and Fig. 1(h)). Parameters of the source video sequences are shown in Tab. 1. Information about content (Spatial and Temporal Information - SI and TI) [16] was computed using the Mitsu tool [19] and is shown in the SI and TI diagram (Fig. 2). Characteristics of the test sequences used:
• Bund Nightscape: a city night shot. The scene is time-lapsed; the dynamic segments of the scene are moving cars and pedestrians on the curb, the static segments are urban buildings. The camera captures the scene from a static position (Fig. 1(a)).
• Campfire Party: a night scene close to a fire. In the front of the image is a flaming bonfire (fast change of temporal and luminance information). In the background of the image, there is a group of nearly static people. At the end of the sequence, the camera zooms in on the group of people (Fig. 1(b)).
• Construction Field: a shot of a construction site, where the static background is represented by buildings under construction and the dynamic objects are construction vehicles (an excavator) and walking workers. It is a slow-motion scene captured statically (Fig. 1(c)).
• Fountains: a daytime shot of a city fountain. The foreground consists of squirting water (a lot of edges in the picture); the static background is formed by trees and buildings. The capture is a static scene with low motion dynamics (Fig. 1(d)).
• Marathon: a marathon competition. The runners are multiple moving objects with moderate dynamics; the background is a static road. The camera is static and captures the scene from a high point of view (Fig. 1(e)).
• Runners: a running race, but in contrast to the Marathon scene there are fewer runners. The camera is static, located in front of the runners and slightly angled to the side (higher spatial information). The scene is relatively dynamic (Fig. 1(f)).
• Tall Buildings: a shot of a modern city. The static objects are skyscrapers, a river and the urban infrastructure; the slow-motion objects are represented by city traffic. The camera moves slowly from the left to the right side. The scene is characterized by changing spatial and temporal information (Fig. 1(g)).
• Wood: forest scenery. A shot of trees in a forest (the captured objects are static); the camera moves from the left to the right side and the motion accelerates during the sequence. The spatial and temporal information values are relatively high (Fig. 1(h)).

Fig. 3: Scheme of analysis process (includes the blocks "Compression ratio calculation" and "Final value of compression ratio").

Process of Measurement
The complete process of quality and compression ratio analysis is shown in Fig. 3.
First, we downloaded the test sequences in uncompressed format (*.yuv) from [20]; their parameters are shown in Tab. 1. Subsequently, we encoded each sequence with the ffmpeg tool (ffmpeg ver. 3.2.4) [21] into the desired compression standard (H.264 or H.265) at the given resolution (HD, FHD or UHD) with a target value of CRF = 0, 10, 12, 14, 16, 18, 20, 23, 25, 28, 40, 45 and 51. Afterwards, we decoded the sequences back to uncompressed format with the same ffmpeg tool. Finally, we analyzed quality by comparing the reference and degraded sequences in uncompressed format using the MSU tool [22]. From the outputs, we created plots representing the quality of each sequence in the PSNR (Fig. 4, Fig. 5 and Fig. 6; Fig. 8, Fig. 9 and Fig. 10; Fig. 12, Fig. 13 and Fig. 14) and SSIM metrics (Fig. 15, Fig. 16 and Fig. 17 [...]).
Concurrently, we calculated the compression ratio (CR), which is defined as:
$$CR = \frac{FS_{\mathrm{uncomp}}}{FS_{\mathrm{comp}}}, \qquad (1)$$
where $FS_{\mathrm{uncomp}}$ is the file size in bits of the uncompressed video sequence and $FS_{\mathrm{comp}}$ is the file size of the compressed test sequence in the appropriate compression standard. We also created plots of the compression ratio (Fig. 26, Fig. 27 and Fig. 28).
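The measurement chain described above can be sketched in Python as follows. The sketch only builds ffmpeg argument lists and does not execute them. The exact commands used by the authors are not given in the text, so the pixel format (`yuv420p`), the frame rate, the file names and all helper names are assumptions. The options themselves (`-f rawvideo`, `-pix_fmt`, `-s`, `-r`, `-c:v libx264` / `libx265`, `-crf`) are standard ffmpeg options. The compression ratio is taken here as uncompressed size divided by compressed size, which is an assumption about the direction of the ratio.

```python
# CRF values listed in the measurement description.
CRF_VALUES = [0, 10, 12, 14, 16, 18, 20, 23, 25, 28, 40, 45, 51]

# Mapping from standard name to the ffmpeg software encoder (assumed choice).
ENCODERS = {"H.264": "libx264", "H.265": "libx265"}


def encode_cmd(src, dst, codec, crf, width, height, fps, pix_fmt="yuv420p"):
    """Build an ffmpeg command that encodes a raw .yuv file at a given CRF."""
    return [
        "ffmpeg",
        "-f", "rawvideo",            # input has no container or header
        "-pix_fmt", pix_fmt,         # assumed chroma format of the source
        "-s", f"{width}x{height}",   # frame size must be given for raw input
        "-r", str(fps),              # assumed frame rate
        "-i", src,
        "-c:v", ENCODERS[codec],
        "-crf", str(crf),
        dst,
    ]


def decode_cmd(src, dst, pix_fmt="yuv420p"):
    """Build an ffmpeg command that decodes back to raw .yuv for comparison."""
    return ["ffmpeg", "-i", src, "-f", "rawvideo", "-pix_fmt", pix_fmt, dst]


def compression_ratio(fs_uncomp, fs_comp):
    """Ratio of uncompressed to compressed file size (same unit for both)."""
    if fs_comp <= 0:
        raise ValueError("compressed size must be positive")
    return fs_uncomp / fs_comp


if __name__ == "__main__":
    # Hypothetical file names and frame rate, for illustration only.
    for crf in CRF_VALUES:
        print(" ".join(encode_cmd("seq.yuv", f"seq_crf{crf}.mp4", "H.265",
                                  crf, 1920, 1080, 30)))
```

Building the commands as lists keeps the sketch free of shell quoting issues; such lists could be passed to `subprocess.run` on a machine where ffmpeg is installed.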

Results
This section presents the results of the analyses performed using the PSNR and SSIM metrics.

PSNR Metric
Results for the PSNR metric are shown in Fig. 4, Fig. 5 and Fig. 6 for the H.264 compression standard and in Fig. 8, Fig. 9 and Fig. 10 for H.265. The average PSNR values for the H.264 and H.265 compression standards, for each resolution, are shown in Fig. 7 and Fig. 11. The PSNR differences between the codecs are shown in Fig. 12, Fig. 13 and Fig. 14.
Overall, the PSNR trend line is different for CRF values in the range from 0 to 10. For CRF values equal to or higher than 10, the PSNR differences between the codecs are less significant. For this reason, the plots in Fig. 12, Fig. 13 and Fig. 14 do not show results for CRF values lower than 10. The difference between H.264 and H.265 for the commonly used CRF values (18-28) is more significant for low-resolution videos. From the PSNR point of view, the higher the video resolution, the lower the difference between the compression standards (see Fig. 12, Fig. 13 and Fig. 14). Likewise, the difference in PSNR between different scenes is less significant for higher resolutions.
The results indicate that the PSNR differences are lower for static scenes (lower TI), which means that the choice of codec is less important for static scenes. Fig. 4, Fig. 5, Fig. 6 and Fig. 7 show that the PSNR value for CRF = 0 is equal to 100 dB. This is caused by the low compression ratio of the H.264 codec at this setting, which leaves the decoded frames nearly identical to the reference, together with the limitation of the maximum PSNR to 100 dB by the MSU tool.

SSIM Metric
Results for the SSIM metric are shown in Fig. 15, Fig. 16 and Fig. 17 [...]. The plots in Fig. 23, Fig. 24 and Fig. 25 show the difference of the SSIM metric between the compression standards in the CRF range from 0 to 51. Unlike PSNR, where the difference of metric values is significant, the difference of SSIM values is marginal for CRF lower than 10.
From Fig. 23, Fig. 24 and Fig. 25, we can state that the difference of SSIM values between different scenes depends on the video resolution. For HD resolution, the difference is more significant at CRF 42. For UHD, the highest difference occurs at a CRF of approximately 15. SSIM decreases with increasing CRF, and the breakpoint depends on the video resolution. SSIM drops to 0.95 at a CRF of approximately 25 for HD and FHD, but at approximately 17 for UHD (Fig. 18 and Fig. 22). With higher resolution, the breakpoint in the SSIM metric occurs at lower CRF values.
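One way to read such a breakpoint off a measured SSIM curve is linear interpolation between the two neighbouring CRF points that bracket the threshold. The Python sketch below assumes this approach; the text does not state how the approximate CRF values were obtained. The function name is hypothetical, and the sample points are illustrative placeholders, not measurements from this paper.

```python
def crf_at_threshold(points, threshold=0.95):
    """Return the interpolated CRF at which SSIM first falls below threshold.

    points: list of (crf, ssim) pairs sorted by increasing CRF.
    Returns None if the curve never crosses the threshold from above.
    """
    for (c0, s0), (c1, s1) in zip(points, points[1:]):
        if s0 >= threshold > s1:
            # Linear interpolation inside the bracketing interval.
            return c0 + (s0 - threshold) * (c1 - c0) / (s0 - s1)
    return None


if __name__ == "__main__":
    # Illustrative placeholder curve, NOT data from the paper.
    curve = [(10, 0.99), (20, 0.97), (30, 0.93), (40, 0.85)]
    print(crf_at_threshold(curve))
```

Because the tested CRF values are unevenly spaced, the interpolated breakpoint is only as precise as the spacing of the neighbouring measurements allows.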

Compression Ratio
Although at first glance the performance of H.264 and H.265 seems similar for the commonly used CRF values (CRF = 18-28), we cannot omit one very important parameter: the compression ratio. The compression ratio was computed using the equation from Subsec. 5.2, and the compressed and uncompressed test sequences were compared at the target CRF values. From these results, graphs for each resolution were created (Fig. 26, Fig. 27 and Fig. 28). The normalization was performed with respect to the compression ratio of H.265, and the final normalized values were computed using Eq. (2).
From Fig. 26, Fig. 27 and Fig. 28, we can state that the coding efficiency of the newer compression standard is unequivocally higher. The efficiency of the compression rises with growing resolution. The maxima of compression efficiency lie in the CRF range from 10 to 20 and close to a CRF value of 45. We can also state that compression efficiency depends on the content type: slow-motion scenes give better results.

Conclusion
This paper dealt with the influence of the CRF value on objective video quality assessment. The PSNR and SSIM metrics were used to quantify quality. The compression efficiency of the H.264 and H.265 codecs with CRF values in the range from 0 to 51 was tested. The assessment was done for eight different video sequences at HD, Full HD and Ultra HD resolution. Finally, the quality of both codecs was compared with emphasis on compression ratio and coding efficiency.

Tab. 1: Parameters of source video sequences.
[...] (a), Fig. 1(b), Fig. 1(c), Fig. 1(d), Fig. 1(e), Fig. 1(f), Fig. 1(g) and Fig. 1(h)). Parameters of the source video sequences are shown in Tab. 1. Information about content (Spatial and Temporal Information - SI and TI) [16] was computed using the Mitsu tool [19] and is shown in the SI and TI diagram (Fig. 2). Characteristics of the test sequences used:
• Bund Nightscape: a city night shot. The scene is time-lapsed; the dynamic segments of the scene are moving cars and pedestrians on the curb, the static segments are urban buildings. The camera captures the scene from a static position (Fig. 1(a)).
• Campfire Party: a night scene close to a fire. In the front of the image is a flaming bonfire (fast change of temporal and luminance information).

Fig. 2: Information about content of used sequences - SI and TI diagram.
where $CR_{\mathrm{norm}}$ is the normalized compression ratio in percent, and $CR_{\mathrm{H.264}}$ and $CR_{\mathrm{H.265}}$ are the compression ratios of the H.264 and H.265 codecs.
Comparison of compression efficiency of H.264 and H.265 for HD resolution.
Fig. 27: Comparison of compression efficiency of H.264 and H.265 for FHD resolution.
Comparison of compression efficiency of H.264 and H.265 for UHD resolution.
© 2017 ADVANCES IN ELECTRICAL AND ELECTRONIC ENGINEERING