Impact of GoP on the Video Quality of the VP9 Compression Standard for Full HD Resolution

In recent years, interest in multimedia services has increased significantly. This creates a need for quality assessment, especially in the video domain. Compression and transmission link imperfections are the two main factors that influence quality. This paper assesses the impact of the Group of Pictures (GoP) setting on the video quality of the VP9 compression standard. The evaluation was carried out using selected objective and subjective methods for two Full HD sequences with different types of content. The results are part of a new model, still under development, that will be used to predict video quality in IP-based networks.


Introduction
Interest in new multimedia services has risen significantly in recent years. This increase goes hand in hand with the demand for higher TV resolutions and bandwidth, which leads to the need to develop new compression standards. Many new codecs, such as VP9 and H.265/HEVC, have become available, and others, such as DAALA and VP10, are being developed. It is well known that compression and transmission link imperfections are the two major factors that influence video quality. For this reason, video quality assessment still plays an important role in research. This paper assesses the impact of the Group of Pictures (GoP) setting on the video quality of the VP9 compression standard at Full HD resolution. The rest of the paper is organized as follows. The next part reviews the state of the art. The third part briefly describes the VP9 compression standard. The fourth and fifth parts describe the objective and subjective methods. The sixth part deals with the measurements and the seventh part with the results obtained from these measurements.

State of the Art
Many recent studies and publications explore the video quality achieved by the VP9 codec. Some of them explore the quality of multimedia services [1], [2], [3]; others focus on objective testing [4], [5], [6], [7], [8], [9] or on subjective tests [10], [11]. Few, however, compare the quality of sequences with and without a GoP setting. This paper focuses on the video quality evaluation of one of the newest compression standards, VP9, with respect to the use of GoP. The testing is done for two Full HD sequences with different types of content.

VP9 Compression Standard
VP9 is one of the newest video compression standards. It was developed by Google and became available in June 2013. VP9 is the successor to VP8. Its aim is to reduce the bit rate by 50 % compared to VP8 while maintaining the same video quality. VP9 has many design improvements over VP8. It supports superblocks of 64 × 64 pixels, which can be used with a quadtree coding structure. Several web browsers, such as Chromium, Chrome, Firefox and Opera, support playback of VP9 video in the HTML5 video tag. Its own successor, VP10, is being developed [12].

Objective Video Quality Assessment
Objective video quality assessment is a type of measurement in which computational methods called "metrics" produce values that score the video quality. They mostly measure the physical characteristics of a video signal. They are used very often because of their repeatability and the simplicity of their calculation. Many objective metrics exist, but the best-known and most widely used are Peak Signal-to-Noise Ratio (PSNR), Video Quality Metric (VQM) and Structural Similarity Index (SSIM). PSNR is the oldest metric but is still widely used. It is very fast and easy to compute [13]. The SSIM metric measures three parameters (the luminance similarity, the contrast similarity and the structural similarity) and merges them into one value, which determines the quality. This final value lies in the range from 0 to 1, where 0 stands for the worst and 1 for the best quality [14]. The VQM metric computes the visibility of artifacts expressed in the DCT domain. The final value of the VQM metric indicates the amount of video distortion: for no impairment the value equals zero, and the output value rises with an increasing amount of impairment [15], [16]. All the mentioned metrics are so-called Full Reference (FR) metrics, which means that the reference sequence must be known in order to compute the video quality.
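To illustrate why PSNR is considered fast and easy to compute, the following is a minimal sketch of the usual definition, PSNR = 10 log10(MAX² / MSE), for 8-bit samples. The function name and the sample values are illustrative assumptions; this is not the implementation used by the measurement tool in this paper.

```python
import math

def psnr(reference, distorted, max_value=255):
    """PSNR in dB between two equally sized sequences of samples.

    max_value is the peak sample value (255 assumes 8-bit video).
    """
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: no distortion
    return 10 * math.log10(max_value ** 2 / mse)

# Hypothetical luma samples of a reference and a decoded frame.
ref = [52, 55, 61, 66, 70, 61, 64, 73]
dist = [54, 55, 60, 66, 69, 63, 64, 72]
print(round(psnr(ref, dist), 2))
```

Because the metric is Full Reference, both the reference and the decoded samples must be available, which matches the requirement stated above.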

Subjective Video Quality Assessment
Subjective assessment is a type of measurement in which people score the video quality. It is the most reliable and fundamental way to determine the perceived video quality, called Quality of Experience. It involves visual psychological tests in which human evaluators are exposed to a video stimulus and evaluate its quality based on their own subjective judgment. This type of assessment has one drawback: it is very time-consuming, and many people are needed for a proper assessment. The well-known and most widely used subjective methods are described in [17] and [18]. They differ, among other things, in:
• whether the observers assess the quality during or after the presentation,
• whether the reference sequence is hidden (No Reference methods) or not (Full Reference methods).
According to [17], at least 15 observers should be used in an assessment to achieve valid results. Of course, the number of observers needed depends on the sensitivity and reliability of the test procedure adopted and on the anticipated size of the effect sought. The presentation structure of each test, which should not exceed 30 minutes, is shown in Fig. 1. Before the test session, assessors should be introduced to many factors, for instance the method of assessment, the types of impairments, the grading scale, the sequence and the timing (the reference and test sequence durations, the time for voting), and so on. After the test session, the mean score (MOS) is calculated as:

$$\bar{u}_{jkr} = \frac{1}{N} \sum_{i=1}^{N} u_{ijkr},$$

where $u_{ijkr}$ is the score of observer $i$ for test condition $j$, sequence $k$ and repetition $r$, and $N$ is the number of observers. The 95 % confidence interval, which is derived from the standard deviation and the size of each sample, is also calculated. It is given by [17] and [18]:

$$\left[ \bar{u}_{jkr} - \delta_{jkr},\ \bar{u}_{jkr} + \delta_{jkr} \right],$$

where:

$$\delta_{jkr} = 1.96 \frac{S_{jkr}}{\sqrt{N}}, \qquad S_{jkr} = \sqrt{\sum_{i=1}^{N} \frac{\left( \bar{u}_{jkr} - u_{ijkr} \right)^2}{N - 1}}.$$

Because short sequences (10 s) were assessed after the presentation in our testing, the DSIS, DSCQS and ACR methods were used.
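The mean score and its 95 % confidence interval can be sketched as follows. The sketch assumes the usual form in which the half-width of the interval is 1.96 times the sample standard deviation divided by the square root of the number of observers; the function name and the scores are hypothetical and are not data from this study.

```python
import math

def mos_with_ci(scores):
    """Return (mean score, (lower, upper)) for one test condition.

    scores holds one rating per observer; at least two are required
    so that the sample standard deviation is defined.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("at least two observers are required")
    mean = sum(scores) / n
    # Sample standard deviation (divisor N - 1).
    std = math.sqrt(sum((mean - u) ** 2 for u in scores) / (n - 1))
    # Half-width of the 95 % confidence interval.
    delta = 1.96 * std / math.sqrt(n)
    return mean, (mean - delta, mean + delta)

# Hypothetical ratings from 15 observers on a five-grade scale.
scores = [4, 5, 3, 4, 4, 5, 3, 4, 4, 4, 5, 3, 4, 4, 5]
mos, (low, high) = mos_with_ci(scores)
print(f"MOS = {mos:.2f}, 95 % CI = [{low:.2f}, {high:.2f}]")
```

The interval narrows as the number of observers grows, which is one reason a minimum panel size is recommended.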

Measurements
In our measurements, two types of assessment were carried out:
• objective assessment using the PSNR, SSIM and VQM metrics,
• subjective assessment using the DSIS, DSCQS and ACR methods.

Source Signal
In our testing, two types of test Source Sequences (SRCs), differing in content, were used:
• one with a dynamic scene, called "Basketball" (Fig. 2(a)),
• one with slow motion, called "Cactus" (Fig. 2(b)).
Since the compression difficulty is directly related to the spatial and temporal information of a sequence, according to [18], the Spatial Information (SI) and the Temporal Information (TI) of both sequences using the Mitsu tool [20]

Coding
Both test sequences were encoded with the VP9 compression standard using two different GoP settings:
• without a GoP setting (N = 250), which means that the distance between two successive I frames was 250 frames (249 P frames between two successive I frames),
• with the GoP set to 12 (N = 12), which means that the distance between two successive I frames was 12 frames (11 P frames between two successive I frames).
Since the VP9 compression standard does not use B frames, only P frames were used between two I frames.
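The two GoP settings can be illustrated with a small sketch that lays out the frame types for a given I-frame distance N. It assumes a strictly periodic structure with an I frame every N frames and only P frames in between, as described above; a real encoder may place key frames differently. The total of 500 frames is an assumption (a 10 s sequence at an assumed 50 frames per second), not a value taken from this paper.

```python
def frame_types(total_frames, gop_length):
    """Return 'I' or 'P' for each frame, with an I frame every gop_length frames."""
    if gop_length < 1:
        raise ValueError("gop_length must be at least 1")
    return ["I" if i % gop_length == 0 else "P" for i in range(total_frames)]

# Compare the two settings for an assumed 500-frame sequence.
for n in (12, 250):
    types = frame_types(500, n)
    print(f"N = {n}: {types.count('I')} I frames, {types.count('P')} P frames")
```

With the shorter GoP, far more frames are coded as I frames, which generally cost more bits than P frames; this helps explain why a fixed bitrate budget leaves less room for quality when N = 12.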
The coding process was done using the FFmpeg tool [21]. The command line settings of this tool for the VP9 compression standard are shown in Tab. 4. The bitrates ranged from 1 to 10 Mbps with a step of 1 Mbps, which means that 20 Hypothetical Reference Circuits (HRCs) were used: ten HRCs, restricted by maximum bitrate, for each SRC. It is important to mention that only five HRCs for each SRC were used in the subjective assessment: 1, 3, 5, 7 and 9 Mbps. If all HRCs had been used, it would have been too difficult for the observers to distinguish the video quality of two successive sequences. It was also necessary to take into account the maximum duration of a test session, which should not exceed 30 minutes. The selected sequences were shown to the observers in random order. Afterwards, both sequences were decoded back to the *.yuv format using the same FFmpeg tool.
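As an illustration of how such an encoding grid could be scripted, the sketch below builds FFmpeg argument lists for ten bitrates and two GoP lengths. It is not the command line from Tab. 4: the file names, the 50 fps frame rate, the yuv420p pixel format and the choice of options (`-b:v`, `-maxrate`, `-g` with the `libvpx-vp9` encoder) are assumptions made for this example.

```python
def build_vp9_commands(source_yuv, width=1920, height=1080, fps=50,
                       bitrates_mbps=range(1, 11), gop_lengths=(12, 250)):
    """Build one FFmpeg argument list per (GoP length, bitrate) pair."""
    stem = source_yuv.rsplit(".", 1)[0]
    commands = []
    for gop in gop_lengths:
        for rate in bitrates_mbps:
            output = f"{stem}_gop{gop}_{rate}M.webm"
            commands.append([
                "ffmpeg",
                # Raw YUV input carries no header, so its format is given explicitly.
                "-f", "rawvideo",
                "-pixel_format", "yuv420p",        # assumed chroma format
                "-video_size", f"{width}x{height}",
                "-framerate", str(fps),            # assumed frame rate
                "-i", source_yuv,
                "-c:v", "libvpx-vp9",
                "-b:v", f"{rate}M",                # target bitrate
                "-maxrate", f"{rate}M",            # cap on the bitrate
                "-g", str(gop),                    # maximum I-frame distance
                output,
            ])
    return commands

# Hypothetical source file name.
commands = build_vp9_commands("basketball.yuv")
print(len(commands), "commands, first one:")
print(" ".join(commands[0]))
```

Each list could be passed to `subprocess.run` to perform the actual encoding; the sketch only prints the commands so that it runs without FFmpeg installed.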

Evaluation
Finally, the video quality was evaluated.
• For the objective assessment, the MSU Measuring Tool Pro version 3.0 was used [22]. The PSNR, SSIM and VQM objective metrics were used for the measurements.
• For the subjective assessment, observers watched the sequences and assessed the video quality. The DSIS, DSCQS and ACR methods were used.
In our experiments, 30 assessors (19 men and 11 women) aged from 20 to 26 years took part. The average age was 22 years. Most of them were students of our department.
The whole process of the measurement and evaluation is shown in Fig. 3.

Fig. 3: The process of measuring and evaluating the impact of GoP of the VP9 compression standard on the video quality.

Experimental Results
According to the graphs in Fig. 4, the quality rises logarithmically with increasing bitrate. What is important for us is the difference in quality between the sequences with and without a GoP setting. As seen from the plots, the sequences without a GoP setting reach better quality than the sequences with a typical GoP setting. For better representation, the difference between the sequences with and without a GoP setting is expressed in percentage (Fig. 5).

According to the graphs in Fig. 6, the same can be said as for the objective assessment: the quality increases logarithmically with increasing bitrate. At low bitrates the quality grows faster than at high bitrates, which means that the degradation caused by compression is more visible at low bitrates than at high ones. The observers recognized this fact as well. Regarding the quality of the sequences with and without a GoP setting, the observers saw the difference between these two types of sequences. They rated the sequences without a GoP setting with higher marks than the sequences with a GoP setting.

Afterwards, the Pearson correlation coefficients of the differences between all objective and subjective methods were calculated for both test sequences, using the formula:

$$r_{xy} = \frac{k_{xy}}{d_x d_y},$$

where $k_{xy}$ is the covariance and $d_x$ and $d_y$ are the standard deviations of the two variables. The correlation coefficients for both test sequences are reported in Tab. 7.
Tab. 7: The correlation coefficients of the differences between all objective and subjective methods for both test sequences.
According to Tab. 7, there is a very high correlation between the ACR method and the PSNR and SSIM metrics for the Basketball sequence, and between the ACR method and all objective metrics for the Cactus sequence. It follows that the results of the ACR subjective assessment matched the results of the objective evaluation well and that the ACR subjective method should be used in future research.
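The Pearson correlation coefficient, the covariance of two variables divided by the product of their standard deviations, can be sketched as follows. The function name and the two series of difference values are hypothetical and serve only to show the calculation; they are not the values behind Tab. 7.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sample covariance k_xy and sample standard deviations d_x, d_y.
    k_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    d_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    d_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    if d_x == 0 or d_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return k_xy / (d_x * d_y)

# Hypothetical quality differences (in %) at five bitrates for one
# objective metric and one subjective method.
objective_diff = [9.0, 5.5, 3.8, 2.9, 2.1]
subjective_diff = [30.0, 14.0, 10.5, 8.0, 5.0]
print(round(pearson(objective_diff, subjective_diff), 3))
```

A value close to 1 means that the two methods rank the quality differences in nearly the same way across bitrates.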

Conclusion
This paper dealt with the assessment of the impact of the Group of Pictures (GoP) setting on the video quality of the VP9 compression standard. The aim was to examine the difference in video quality between sequences with and without a GoP setting and to determine the correlation of the differences between all the methods used. The assessment was done using selected objective and subjective methods for two Full HD sequences with different types of content. The results showed that the sequences without a GoP setting reach better quality than the sequences with a typical GoP setting, especially at low bitrates. Afterwards, the correlation of the differences of all objective and subjective methods was calculated for both test sequences. According to the results, there is a very high correlation between the ACR method and the PSNR and SSIM metrics for the Basketball sequence, and between the ACR method and all objective metrics for the Cactus sequence. All results are part of a new model, still under development, that will be used to predict video quality in IP-based networks.

Fig. 1: The presentation structure of the subjective test session.

Fig. 2: The test sequences.
Both sequences were downloaded from [19] in the uncompressed format (*.yuv) and used as the reference sequences. The basic parameters of these sequences are shown in Tab. 2.

Figure 4 shows the relationship between the video quality assessed by the objective metrics and the bitrate. The curves represent the test sequences with and without GoP. The figure contains three graphs, one for each objective metric: Fig. 4(a), Fig. 4(b) and Fig. 4(c).

Fig. 4: The relationship between the video quality measured by the objective metrics and the bitrate. The curves represent the test sequences with and without GoP.

Fig. 5: The quality difference (in percentage) between the sequences with and without GoP setting measured by the objective metrics.

Figure 6 shows the relationship between the video quality assessed by the subjective methods and the bitrate. The curves represent the test sequences with and without GoP. This figure also contains three graphs, one for each subjective method: Fig. 6(a), Fig. 6(b) and Fig. 6(c). Tab. 6 shows the quality difference, expressed in percentage, between the sequences with and without a GoP setting measured by the subjective methods. The same values are shown in Fig. 7.

Fig. 6: The relationship between the video quality measured by the subjective methods and the bitrate. The curves represent the test sequences with and without GoP.

Fig. 7: The quality difference (in percentage) between the sequences with and without GoP setting measured by the subjective methods.
Tab. 2: Basic parameters of the test sequences.
were calculated. The results are shown in Tab. 3.
Tab. 4: Command line settings of the FFmpeg tool for the VP9 compression standard.
The table, as well as the figure, shows that the difference in quality between the two sequences is largest at low bitrates and decreases with increasing bitrate. Subsequently, the same measurements were carried out using the subjective methods.
The table, as well as the figure, shows that for all methods the difference in quality between the two sequences is largest at the lowest bitrate, 1 Mbps. At the other bitrates, the difference values for the DSIS and ACR methods are quite similar: they range between 4.63 % and 14.71 %. Only for the DSCQS method do the difference values vary considerably. This may be caused by the nature of this method, in which the observers do not know which sequence is the reference and which is the test sequence, so they may rate test sequences with higher marks than the reference ones.
The quality difference (in percentage) between the sequences with and without GoP setting by the subjective methods.
Faculty of Electrical Engineering, at the University of Zilina in 2008 and 2012, respectively. He is currently an assistant professor at the same department. His research interests include audio and video compression, video quality assessment, TV broadcasting and IP networks.
Juraj BIENIK was born in 1987 in Zilina, Slovakia. He received his M.Sc. in Telecommunications at the Department of Telecommunications and Multimedia, Faculty of Electrical Engineering, at the University of Zilina in 2012. He is currently a Ph.D. student at the same department. His research interests include audio and video signal processing, functionality and optimization of networks and video quality assessment.
Martin VACULIK was born in 1951. He received his M.Sc. and Ph.D. in Telecommunications at the University of Zilina, Slovakia in 1976 and 1987, respectively. In 2001 he was habilitated as associate professor of the Faculty of Electrical Engineering at the University of Zilina in the field of Telecommunications. Currently he works as the head of the Department of Telecommunications and Multimedia. His interests cover switching and access networks, communication network architecture, and audio and video applications.