Perceived Quality of Full HD Video - Subjective Quality Assessment

In recent years, interest in multimedia services has become a global trend, and this trend is still rising. Video quality is a very significant part of the bundle of multimedia services, which leads to a requirement for quality assessment in the video domain. The quality of video streamed across IP networks is generally influenced by two factors: transmission link imperfection and the efficiency of compression standards. This paper deals with subjective video quality assessment and the impact of the compression standards H.264, H.265 and VP9 on perceived video quality. The evaluation is done for four Full HD sequences that differ in content; the distinction is based on the Spatial (SI) and Temporal (TI) Index of the test sequences. Finally, the experimental results show a bitrate reduction of up to 30 % for H.265 and VP9 compared with the reference H.264.


Introduction
In recent years the demand for multimedia services has kept rising, and the amount of video streaming in particular has grown steadily. Due to the quantity of video streams and their bandwidth requirements, the need to develop effective compression has arisen.
The paper is organised as follows. The first part of the article provides basic information about the mentioned compression standards. In the second part, subjective quality metrics and the process of quality assessment used in the experimental measurements are described. The last part of the article deals with the experimental results and the conclusions that stem from the measurements of video quality as perceived by the observers.
Nowadays many compression standards are being introduced, e.g. H.265/HEVC, VP9 and DAALA, and their video quality was tested in [1], [2], [3] and [4]. Each of these standards offers a high level of compression. Their comparison in terms of subjective quality is the aim of this paper. Quality comparison of compression standards is very important to providers of video services as well as to end users.

Compression Standards
Advanced Video Coding, known as H.264/AVC (MPEG-4 Part 10), is the oldest of the mentioned compression standards (approved in 2003), but it is still the most widely used globally. The versatility of this standard allows a wide range of applications, from video in smartphones to TV broadcasting and multimedia content on Blu-ray discs.
High Efficiency Video Coding, known as H.265/HEVC (approved in January 2013), is the most recent joint video project of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) standardization organizations. The collaboration of these groups on this project is known as the Joint Collaborative Team on Video Coding (JCT-VC). H.265 is the successor to the very popular H.264 standard. The basic features and structure of H.265 remain the same as those of its predecessor, but it is considered to contain many significant improvements that make video compression more effective [5], [6] and [7].
VP9 is the WebM Project's next-generation open video codec and the direct successor of VP8, which was the biggest competitor to H.264. WebM is an open, royalty-free media file format. VP9 was approved in June 2013. The most prominent features of WebM are its openness, innovation and optimisation for the web. The main aim of the WebM Project is to speed up the pace of video compression innovation (i.e. to get better and faster). VP9 was enabled by default in the Google Chrome Dev channel [8].

Video Quality Assessment
[...] [9], but it is still very popular and often used because it can be computed very easily and quickly. The SSIM metric measures three components (the similarity of luminance, contrast and structure) and combines them into a value in the range from 0 to 1, where 0 is the worst and 1 is the best quality [10]. The VQM metric computes the visibility of artefacts expressed in the DCT domain. The output value represents the amount of distortion, with the best quality indicated by a value close to zero [11].
Subjective quality assessment is based on the votes of human observers, who quantify perceived video quality using discrete values from a certain range (the scale depends on the chosen method). The biggest benefit of subjective quality assessment is the credibility of the results: objective methods do not achieve such accuracy (they are based only on a model of perceived quality), and the values from metrics are only approximations of real video quality. The drawback of subjective methods is that they are time-consuming and require human resources.
The most used subjective methods are:
• DSIS - Double Stimulus Impairment Scale, also known as Degradation Category Rating (DCR).
• ACR - Absolute Category Rating, also known as Single Stimulus (SS).
Procedures and conditions for subjective quality assessment are defined in ITU-R BT.500-13 [12]. This recommendation states that at least 15 observers should be used to achieve reliable results. They should be non-experts, i.e. video quality assessment should not be part of their normal work, and they should not be experienced assessors. The number of assessors depends on the sensitivity and reliability of the test procedure. Before the start of a testing session, assessors should be familiarised with many factors, for example the method of assessment, the grading scale, the type of impairments, the timing (duration of training, test and reference sequences, time for voting) and so on [10].
The whole session should not take longer than 30 minutes. Before the first test session, 3 to 5 sequences should be shown to stabilize the observer's opinion. The presentation order should be random, but the order of test conditions should be set so that any effects of fatigue or adaptation on the grading are balanced out uniformly across all sessions. To check coherence, some presentations should be repeated from session to session [9].
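After the test session, the Mean Opinion Score (MOS) is calculated:

$$\bar{u}_{jkr} = \frac{1}{N} \sum_{i=1}^{N} u_{ijkr}$$

where $u_{ijkr}$ is the score of assessor $i$ for test condition $j$, sequence $k$, repetition $r$, and $N$ stands for the number of assessors.

Finally, the 95 % confidence interval, which is derived from the standard deviation and the size of each sample, is computed. It is given by:

$$\left[ \bar{u}_{jkr} - \delta_{jkr},\ \bar{u}_{jkr} + \delta_{jkr} \right]$$

where:

$$\delta_{jkr} = 1.96 \, \frac{S_{jkr}}{\sqrt{N}}, \qquad S_{jkr} = \sqrt{\sum_{i=1}^{N} \frac{\left( \bar{u}_{jkr} - u_{ijkr} \right)^{2}}{N - 1}}$$

In our experiments, the DSIS and ACR methods were used.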
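The MOS and its 95 % confidence interval can be illustrated with a minimal Python sketch. It assumes the ITU-R BT.500 form of the interval (sample standard deviation with N - 1 in the denominator and the factor 1.96); the function name `mos_with_ci` and the scores are hypothetical and are not data from our experiments.

```python
import math
import statistics


def mos_with_ci(scores, z=1.96):
    """Return (MOS, delta) for the scores of one test condition.

    The 95 % confidence interval is [MOS - delta, MOS + delta].
    """
    n = len(scores)
    if n < 2:
        raise ValueError("at least two scores are needed")
    mos = statistics.fmean(scores)       # mean over all assessors
    s = statistics.stdev(scores)         # sample standard deviation (N - 1)
    delta = z * s / math.sqrt(n)         # half-width of the interval
    return mos, delta


# Hypothetical votes of 15 assessors on the five-grade scale.
votes = [5, 4, 4, 3, 5, 4, 4, 5, 3, 4, 4, 5, 4, 3, 4]
mos, delta = mos_with_ci(votes)
print(f"MOS = {mos:.2f}, 95 % CI = [{mos - delta:.2f}, {mos + delta:.2f}]")
```

The interval narrows as the number of assessors grows, which is one reason why a minimum group size is recommended.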

The Double-Stimulus Impairment Scale Method - DSIS
This method consists of pairs of sequences. The first sequence is unimpaired (the reference) and the second sequence is impaired due to compression (the test). The order of the sequences is always the same (Fig. 1) and the assessor is acquainted with this order. Assessors see the reference sequence first; keeping it in mind, they then watch the test sequence and rate the grade of impairment (difference) between the reference and the test sequence with a value from the five-grade scale, where:
• 5 = imperceptible,
• 4 = perceptible, but not annoying,
• 3 = slightly annoying,
• 2 = annoying,
• 1 = very annoying [9], [10] and [13].

The Absolute Category Rating Method - ACR
Unlike the previous method, ACR (also known as the Single Stimulus method - SS) consists only of degraded sequences, without a reference sequence (Fig. 2). The assessor evaluates the level of quality with a value from the five-level grading scale, where:
• 5 = excellent,
• 4 = good,
• 3 = fair,
• 2 = poor,
• 1 = bad [9], [10] and [13].

Measurements and Experimental Results
In our experiments, four types of test sequences with different content were used:
• "Beauty" (Fig. 3(a)) - a close-up of a woman's face; her hair is slowly blowing in the wind against a static black background.
• "Bosphorus" (Fig. 3(b)) - a boat sailing in the Bosphorus strait with a huge bridge in the background, the camera panning from left to right - one object with slow motion.
• "Jockey" (Fig. 3(c)) - a running horse with a rider, the camera panning from left to right - one object with quick motion.
• "ReadySteadyGo" (Fig. 3(d)) - a horse race in which horses with jockeys are competing, the camera panning from left to right - several objects with quick motion.
All sequences were in Full HD resolution (1920×1080 pixels) with an aspect ratio of 16:9 and a frame rate of 60 fps (frames per second). The length of each sequence was 10 seconds.
Since the compression difficulty is directly related to the spatial and temporal information of a sequence, the Spatial Information (SI) and Temporal Information (TI) of all sequences were calculated according to [13] using the Mitsu tool [14]. The results are shown in Tab. 3. Based on these results, the spatial-temporal information plane was drawn (Fig. 4).
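As a rough illustration of what SI and TI measure, the following minimal sketch follows the definitions commonly given in ITU-T P.910: SI is the maximum over time of the spatial standard deviation of the Sobel-filtered luminance frame, and TI is the maximum over time of the spatial standard deviation of the difference between successive frames. It is an assumption that these definitions match [13] and the Mitsu tool [14]; the function names and the tiny frames are hypothetical, and pure Python is used only for readability (it would be far too slow for real Full HD frames).

```python
import math
import statistics


def sobel_magnitude(frame):
    """Sobel gradient magnitudes of a luminance frame (list of rows).

    Border pixels are skipped because the 3x3 kernel does not fit there.
    """
    h, w = len(frame), len(frame[0])
    out = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (frame[y - 1][x + 1] + 2 * frame[y][x + 1] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y][x - 1] - frame[y + 1][x - 1])
            gy = (frame[y + 1][x - 1] + 2 * frame[y + 1][x] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y - 1][x] - frame[y - 1][x + 1])
            out.append(math.hypot(gx, gy))
    return out


def spatial_information(frames):
    """SI: maximum over frames of the std of the Sobel-filtered frame."""
    return max(statistics.pstdev(sobel_magnitude(f)) for f in frames)


def temporal_information(frames):
    """TI: maximum over frame pairs of the std of the pixel-wise difference."""
    if len(frames) < 2:
        raise ValueError("at least two frames are needed")
    values = []
    for prev, cur in zip(frames, frames[1:]):
        diff = [c - p for row_p, row_c in zip(prev, cur)
                for p, c in zip(row_p, row_c)]
        values.append(statistics.pstdev(diff))
    return max(values)


# Hypothetical 4x4 luminance frames: a flat frame and one with a vertical edge.
flat = [[0, 0, 0, 0] for _ in range(4)]
edge = [[0, 0, 2, 2] for _ in range(4)]
print(spatial_information([flat, edge]), temporal_information([flat, edge]))
```

A sequence with much fine detail yields a high SI, while a sequence with fast motion or camera movement yields a high TI, which is why the two values are plotted together in a spatial-temporal plane.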
The measurement process consists of the following steps:
• First, all sequences in uncompressed format (*.yuv) were downloaded from [15].
• Then, the compressed sequences in the *.mp4 and *.mkv containers were converted to raw format in the *.avi container.
• Playlists were compiled from all sequences - the order of sequences with target bitrates for DSIS and ACR can be seen in Tab. 1 and Tab. 2.
• Each compression standard was assessed by a group of 30 observers (3 groups with 90 observers in total). The observers watched a playlist on a 42" television containing all test sequences at 7 target bitrates and assessed them with the DSIS method first and afterwards with the ACR method (the observers did not know the bitrate order of the sequences). Finally, the observers rated the video sequences with a value from the grading scale.
Detailed information about the observers in evaluation groups 1, 2 and 3 is given in Tab. 4.
From the assessment tables we computed the averages of the MOS values for each compression standard at the target bitrates 1, 2, 3, 5, 7, 10 and 15 Mbps. The results of the subjective assessment are presented in graphs showing the average MOS value of each compression standard for the DSIS and ACR methods (Fig. 5 [...]).

In the graphs, the 95 % confidence intervals are depicted for all measured values in order to determine the quality saturation (quality threshold). To find the quality saturation value, the overlap of the 95 % confidence intervals was used. This value corresponds to a tradeoff between perceived quality and bitrate: beyond it there is no need to increase the bitrate, because the gain in quality is minimal. The comparison of coding efficiency of the compression standards for the same scene with the DSIS and ACR methods is shown in the graphs (Fig. 8 [...]).

Generally, the observers rated the scene "Bosphorus" the best, because there is not much motion in this scene and it contains a large number of structural changes, which are harder for observers to perceive. Conversely, the quick-motion scenes received the worst quality ratings with both methods.
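The confidence-interval overlay used to locate the quality saturation can be sketched as follows. This is only one possible reading of the procedure, stated as an assumption: the saturation bitrate is taken as the lowest bitrate whose 95 % confidence interval overlaps the intervals of all higher bitrates. The function names and the MOS values are hypothetical and do not reproduce our measurements.

```python
def intervals_overlap(a, b):
    """True if the closed intervals a = (low, high) and b = (low, high) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def saturation_bitrate(points):
    """Return the lowest bitrate whose CI overlaps the CIs of all higher bitrates.

    points: list of (bitrate_mbps, mos, delta) sorted by ascending bitrate,
    where the confidence interval is [mos - delta, mos + delta].
    """
    cis = [(bitrate, (mos - delta, mos + delta)) for bitrate, mos, delta in points]
    for i, (bitrate, ci) in enumerate(cis):
        if all(intervals_overlap(ci, other) for _, other in cis[i + 1:]):
            return bitrate
    return None  # only reached for an empty input


# Hypothetical average MOS values at the seven target bitrates.
example = [
    (1, 1.8, 0.30), (2, 2.6, 0.30), (3, 3.4, 0.30), (5, 4.2, 0.25),
    (7, 4.4, 0.25), (10, 4.5, 0.20), (15, 4.5, 0.20),
]
print(saturation_bitrate(example))
```

With this criterion, any bitrate above the returned value gives a MOS that cannot be distinguished from the saturated quality at the 95 % level.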

Conclusion
This paper dealt with evaluating the impact of the H.264/AVC, H.265/HEVC and VP9 compression standards on perceived video quality using selected subjective metrics. The aim was to investigate how non-expert observers perceive and evaluate video quality as affected by the bitrate. The evaluation was done for four types of Full HD sequences with different content. From the graphs we can state that the threshold of perceived quality for the H.265 and VP9 compression standards is close to 5 Mbps, while the quality saturation of H.264 is at approximately 7 Mbps. This leads to the conclusion that providers have no need to stream at bitrates above this threshold, so they can save capacity in the transmission chain and use it for other channels or services. It follows that both newer compression standards (VP9 and H.265) outperformed H.264 and exhibit a higher level of compression, mainly at lower bitrates up to 7 Mbps. Above the quality threshold, H.264 exhibits higher performance than the newer compression standards. The reason may be that H.264 was developed specifically for Full HD resolution, whereas H.265 and VP9 were designed mainly for 4K video. In the near future we plan to extend the analysis of the impact of the H.265/HEVC and VP9 compression standards on video quality to Ultra HD resolution using subjective metrics.

Average MOS of H.264 with DSIS method.
Average MOS of H.264 with ACR method.
Average MOS of H.265 with DSIS method.
Average MOS of H.265 with ACR method.
Average MOS of VP9 with DSIS method.
Average MOS of VP9 with ACR method.
Comparison of coding efficiency for scene Beauty with DSIS.
Fig. 8: Comparison of coding efficiency for scene Beauty with the DSIS and ACR methods.
Comparison of coding efficiency for scene Bosphorus with DSIS.
Comparison of coding efficiency for scene Bosphorus with ACR.
Fig. 9: Comparison of coding efficiency for scene Bosphorus with the DSIS and ACR methods.
Comparison of coding efficiency for scene Jockey with DSIS.
Fig. 10: Comparison of coding efficiency for scene Jockey with the DSIS and ACR methods.
Comparison of coding efficiency for scene ReadySteadyGo with DSIS.
Comparison of coding efficiency for scene ReadySteadyGo with ACR.
Sequence order with target bitrate for ACR method.
Tab. 1: Sequence order with target bitrate for DSIS method.
Tab. 3: Spatial Index (SI) and Temporal Index (TI) of test sequences.

[...] Engineering, at the University of Zilina in 2008 and 2012, respectively. Nowadays he is an assistant professor at the same department. His research interests include audio and video compression, video quality assessment, TV broadcasting and IP networks.

Martin VACULIK was born in 1951. He received his M.Sc. and Ph.D. in Telecommunications at the University of Zilina, Slovakia in 1976 and 1987, respectively. In 2001 he was habilitated as associate professor of the Faculty of Electrical Engineering at the University of Zilina in the field of Telecommunications. Currently he works as the head of the Department of Telecommunications and Multimedia. His interests cover switching and access networks, communication network architecture, and audio and video applications.

Tomas MIZDOS was born in 1993 in Poprad, Slovakia. He received his B.Sc. degree in Multimedia Technologies at the Department of Telecommunications and Multimedia, Faculty of Electrical Engineering, at the University of Zilina in 2015. Currently he is an M.Sc. student at the same department. His main area of interest is the functionality and quality of multimedia services.