Impact of Video Content and Technical Specifications on Subjective Quality Assessment

Replacing traditional analog television broadcasts with broadcasts based on new, advanced digital technologies does not always ensure high video quality. This article presents an analysis of the impact of technical specifications on video content. Experimental measurements were carried out with 25 observers and 11 video scenes composed according to various technical specifications and using various analog and digital interfaces. The results show that high-definition analog and digital video interfaces are rated by observers as being of nearly the same quality, but in dynamically changing video scenes the expert quality assessment results diverged. In highly dynamic video scenes, the quality assessment results obtained with standard-definition or high-definition, analog or digital interfaces differ insignificantly. Ill. 3, bibl. 20, tabl. 5 (in English; abstracts in English and Lithuanian). DOI: http://dx.doi.org/10.5755/j01.eee.122.6.1828


Introduction
Since digital video technology began replacing analog technology, the variety of technical standards (visual display formats, video compression ratio levels and transmission technologies) has increased. A great number of choices of technical specifications with different video content quality are possible. Technical improvements and the development of new technology do not always coincide with a high level of subjective video quality assessment. Most studies [1,2] of subjective video quality measurement use standard video scenes, such as the girl with a handset (Susie), the woman's face (Barbara), and nature views and urban scenes with moving and non-moving elements (Trees, Mountains, River, Waterfall, Tower, City, etc.). These video scenes represent the most common television broadcasts, such as television serials, news, etc. In most cases these are static video scenes, although the use of very dynamic, mostly synthetic video scenes and objects is growing rapidly. Examples are computer-made movies saturated with special graphical effects, such as "Avatar" and "The Lord of the Rings 3", and real-time streaming video games.
Unfortunately, most video quality testing studies used video scenes without evaluating their content-related technical specifications, for example spatial activity (SA) and temporal activity (TA), and without considering that measurements should preferably use video fragments with a wide range of SA and TA values [3]. These features may have a very significant impact on the observer's evaluation and may also depend on the specific technical parameters of the device under test, for example video decoding, transmission through the video interfaces and display on the screen, as well as many other technical specifications. SA and TA are very important and closely related to the level of video compression and the level of impairment that the video exhibits when it is transmitted over various digital transmission service channels: i-TV, IPTV (Internet Protocol oriented TV), DVB-T/S/C/H and T2/S2/C2/SH (Terrestrial, Satellite, Cable and Handheld) [3][4][5].
When choosing a test video scene and a video decompression ratio for quality testing, most studies [3,4,6,7] selected standard video scenes without taking into account the specifics of the selected video scene content. By choosing appropriate content it is possible to highlight the degradation of the video signal by specific artifacts, for example blocking, blurring, color bleeding, ringing, staircase effect, mosaic effect, mosquito noise, false contouring, false edge, flickering, fluctuation, ghosting, jerkiness, motion compensation mismatch, stationary area fluctuation, smearing and video scaling, field rate conversion and de-interlacing [1][2][3][4][6][7][8][9][10].
The main aim of this study is to evaluate the impact of video content and technical specifications on the observer's subjective assessment of video quality through measurements in an experimental setup and statistical analysis of the results. Video quality measurements were performed using a variety of video interfaces and specifically selected video content with a wide range of technical specifications.
The rest of the paper is organized as follows. The second section gives a brief overview of subjective video quality testing methods and techniques. The third section describes the video quality measurement setup and procedure, which consists of four parts: laboratory setup, video scenes and their technical specifications, experiment procedure, and observers and procedure. The last part of the paper consists of three parts: processing and analysis of test results, statistical data processing of measurement results, analysis of results and finally  [3,4,14]. During the measurements, the test sequence was carefully chosen to minimize the impact of video content on video quality assessment [14].
Experiment procedure. Video quality assessment tests were performed in two test groups: DS (Double Stimulus, 2 tests) and SS (Single Stimulus, 3 tests). The experiment thus contained five tests in total, with 58 video picture quality votes per observer (SS tests 3x12 + DS tests 2x11). Technical specifications of the test procedures are presented in Table 3.
Sequence lengths were 10.22 minutes for each of DS tests 1 and 2 and 11.43 minutes for each of SS tests 1, 2 and 3. Each video test clip was between 37 and 73 seconds long (see Table 2). The total testing time, taking into account the pauses between test groups, was 60 minutes.
Each video clip and each test group video sequence was followed by a pause of 5-7 s, during which the system was switched to play the next clip or test group.
Each test group consisted of 11-12 video clip fragments in random sequence with different technical parameters and video interfaces between the STB and the TV.
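The vote count and playback time stated above can be cross-checked with a short sketch. It assumes that the sequence lengths are written as minutes.seconds (so 10.22 means 10 min 22 s); the text does not confirm this reading, and the names `TESTS` and `to_seconds` are illustrative only.

```python
# Test plan as described in the text: two DS tests with 11 votes each,
# three SS tests with 12 votes each.
TESTS = [
    # (name, votes per observer, sequence length as "minutes.seconds")
    ("DS-1", 11, "10.22"),
    ("DS-2", 11, "10.22"),
    ("SS-1", 12, "11.43"),
    ("SS-2", 12, "11.43"),
    ("SS-3", 12, "11.43"),
]


def to_seconds(length: str) -> int:
    """Convert an 'MM.SS' string to seconds (assumed reading of the notation)."""
    minutes, seconds = length.split(".")
    return int(minutes) * 60 + int(seconds)


votes_per_observer = sum(votes for _, votes, _ in TESTS)
playback_seconds = sum(to_seconds(length) for _, _, length in TESTS)

print(votes_per_observer)            # 58
print(divmod(playback_seconds, 60))  # (55, 53)
```

Under this assumption the pure playback time is a little under 56 minutes, which is consistent with a 60-minute session once the pauses between test groups are included.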
The subjective video quality measurement results for the tests listed in Table 3 are presented below [20].
Observers and procedure. A total of 25 observers (non-experts) of different gender, age and education level, with or without glasses, participated in the experiment [20]. The observers represent the general public, that is, the average user who will consume the product, and are therefore treated as non-experts. Non-experts generally provide excellent data when the tests are structured properly [3,4,8].
Of the 25 observers, 28% were women and 72% men; 28% wore glasses or lenses daily. All observers had higher education: 48% in natural sciences and engineering, 28% in social sciences and 24% in humanities. By age, 90% of the observers were between 20 and 30 years old and 10% between 30 and 50.
For DS tests the TV screen was divided into two parts (left part, L, and right part, R), each displaying one of the two video sequences. The observer compared the two sequences and marked (with an "X") whether the visual quality of one video image was better than, worse than or the same as that of the other, using the rating scale described below.
If the video picture quality of both parts of the screen (left and right) was the same, the observer chose the value "0". If the quality of the left part was slightly better, the observer chose "1" on the left. If the quality of the right part was much better than that of the left, the observer chose "3" on the right.
Double Stimulus scores: 0 - both videos of the same quality, 1 - one video is slightly better than the other, 2 - one video is better than the other, 3 - one video is much better than the other.
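For later processing, a DS vote (a side plus a mark from 0 to 3) could be stored as a single signed number. The sketch below is only one possible encoding: the sign convention (negative when the left part is better, positive when the right part is better) and the name `encode_ds_vote` are assumptions and are not taken from the paper.

```python
# Double Stimulus scale as given in the text.
DS_LABELS = {
    0: "both of the same video quality",
    1: "slightly better",
    2: "better",
    3: "much better",
}


def encode_ds_vote(side: str, mark: int) -> int:
    """Map a DS vote to a signed score.

    side: "L" or "R", the part of the screen judged better (ignored for mark 0).
    mark: 0..3 on the Double Stimulus scale.
    Returns a negative value when the left part is better and a positive
    value when the right part is better (hypothetical convention).
    """
    if mark not in DS_LABELS:
        raise ValueError(f"mark must be 0..3, got {mark}")
    if mark == 0:
        return 0
    if side not in ("L", "R"):
        raise ValueError(f"side must be 'L' or 'R', got {side!r}")
    return -mark if side == "L" else mark


print(encode_ds_vote("L", 1))  # -1: left part slightly better
print(encode_ds_vote("R", 3))  # 3: right part much better
```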
In the SS tests the video was displayed at full size on the TV screen. For each video, the observer had to assess the image quality (by choosing the appropriate mark) using the rating scale described below.
If the video picture quality was good, the mark was "4". If the video picture quality was poor, the mark was "2".
Examples of DS and SS test voting are available on a public website [20].

Processing and analysis of test results
As described in the previous section, both DS and SS measurements were performed, but due to the large amount of data this work focuses only on the analysis of the SS measurements. The DS measurement results are out of the scope of this paper and are left to further research.
The selection of video scenes is very important and was described in the previous sections. Spatial activity (SA) and temporal activity (TA) are critical parameters [3,6,15,16], as is the content of the video [14]. Different methods exist for measuring the spatial and temporal activity of a video. This paper uses the methods recommended in ITU-T Rec. P.910.
Spatial activity is a measure of the presence of fine structures in the picture. A picture rich in detail, i.e. one with many fine structures, exhibits high spatial activity. It measures the spatial detail in a video frame by representing the distribution of texture in the frame, assuming a Gaussian spread, and is thus similar to the entropy measurement of pixel intensity, taking higher values for more informative scenes [20].
Measurement of spatial activity can also be based on Sobel filtering. Each video frame (Y luminance plane) at time n, $F_n$, is filtered with the Sobel filter, $\mathrm{Sobel}(F_n)$, for edge detection. The standard deviation over pixels ($\mathrm{std}_{space}$) of each filtered frame is then computed. This operation is repeated for each frame in the video sequence, giving a time series of spatial activity values for the video scene. The maximum value in the time series is chosen to represent the spatial information content of the scene. Spatial activity (SA) for each video sequence is calculated as follows:

$$SA = \max_{time}\left\{\mathrm{std}_{space}\left[\mathrm{Sobel}(F_n)\right]\right\}. \qquad (1)$$

The temporal activity (TA) is based on the motion difference feature. More motion between adjacent frames results in higher values of TA. If the frames are identical, TA is zero.
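The SA measure described above, together with a frame-difference TA measure, can be sketched in a few lines of NumPy. This is an illustration only, not the authors' implementation: the function names are hypothetical, the Sobel gradient is evaluated on interior pixels only, and TA is taken as the maximum over time of the spatial standard deviation of successive frame differences, which is an assumption based on the approach of ITU-T Rec. P.910.

```python
import numpy as np


def sobel_magnitude(frame: np.ndarray) -> np.ndarray:
    """Sobel gradient magnitude of a 2-D luminance frame (interior pixels only)."""
    f = frame.astype(np.float64)
    # Horizontal gradient kernel [[-1,0,1],[-2,0,2],[-1,0,1]] via shifted slices.
    gx = (f[:-2, 2:] + 2 * f[1:-1, 2:] + f[2:, 2:]
          - f[:-2, :-2] - 2 * f[1:-1, :-2] - f[2:, :-2])
    # Vertical gradient kernel [[-1,-2,-1],[0,0,0],[1,2,1]] via shifted slices.
    gy = (f[2:, :-2] + 2 * f[2:, 1:-1] + f[2:, 2:]
          - f[:-2, :-2] - 2 * f[:-2, 1:-1] - f[:-2, 2:])
    return np.hypot(gx, gy)


def spatial_activity(frames: np.ndarray) -> float:
    """SA: maximum over time of the spatial std of the Sobel-filtered frame."""
    return max(float(np.std(sobel_magnitude(f))) for f in frames)


def temporal_activity(frames: np.ndarray) -> float:
    """TA: maximum over time of the spatial std of successive frame differences."""
    f = frames.astype(np.float64)
    diffs = f[1:] - f[:-1]
    # A single-frame clip has no differences; report zero in that case.
    return max((float(np.std(d)) for d in diffs), default=0.0)


# Synthetic clips of shape (frames, height, width), used only as a demonstration.
rng = np.random.default_rng(0)
static = np.full((5, 32, 32), 128, dtype=np.uint8)              # flat, unchanging
noisy = rng.integers(0, 256, size=(5, 32, 32), dtype=np.uint8)  # busy, changing

print(spatial_activity(static), temporal_activity(static))  # 0.0 0.0
print(spatial_activity(noisy) > 0, temporal_activity(noisy) > 0)  # True True
```

Frames are converted to floating point before differencing so that unsigned 8-bit luminance values do not wrap around.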
TA can be measured by determining the difference parameter $M_n(i, j)$ between successive frames. $M_n(i, j)$ as a function of time (n) is defined as

$$M_n(i, j) = F_n(i, j) - F_{n-1}(i, j), \qquad (2)$$

where $F_n(i, j)$ is the pixel at row i and column j of the n-th frame in time. TA is calculated as

$$TA = \max_{time}\left\{\mathrm{std}_{space}\left[M_n(i, j)\right]\right\}. \qquad (3)$$

Fig. 2 and Table 5 present the mean values and standard deviations of the scores for the HD analog SS-1 (YPbPr), HD digital SS-2 (YCbCr) and SD analog SS-3 (YIQ) video scenes. The measurement data show that with the high-definition analog YPbPr and digital YCbCr interfaces there are no significant differences in subjective video quality assessment for almost all scenes, which is also confirmed by the calculated correlation coefficient of 0.91. Both interfaces provide high quality, but the dispersion of scores across video scenes is large, which indicates that video content and technical specifications play a crucial role in video quality assessment.
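The per-scene mean values and standard deviations mentioned above can be obtained from the raw SS votes as in the following sketch. The votes shown are invented for illustration and are not data from the experiment; the use of the sample standard deviation and the name `mos_summary` are likewise assumptions.

```python
from statistics import mean, stdev

# Hypothetical SS votes from five observers (not data from the paper).
votes = {
    "scene_A": [4, 5, 4, 3, 4],
    "scene_B": [2, 3, 2, 2, 1],
}


def mos_summary(scores):
    """Return (mean opinion score, sample standard deviation) for one scene."""
    return mean(scores), stdev(scores)


for scene, scores in votes.items():
    mos, sd = mos_summary(scores)
    print(f"{scene}: MOS={mos:.2f}, SD={sd:.2f}")
```

A large standard deviation for a scene indicates that observers disagreed about its quality, while a large spread of mean scores between scenes points to an effect of the content itself.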
The worst scores correspond to the LCG, SP and ST video scenes. Most of the video scenes with the worst scores, especially ST and SP, have the highest temporal activity values. By contrast, SC has one of the highest quality scores and the smallest temporal and spatial activity values of all tested video scenes. This inverse relationship is also confirmed by the correlation coefficients between the TA values and the MOS values for YPbPr (-0.39) and YCbCr (-0.48).
From this analysis we can conclude that, for both analog and digital interfaces, the quality score of a video scene becomes worse on average with increasing TA.
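A correlation coefficient between per-scene TA and MOS values can be computed as sketched below. The text does not state which coefficient was used; the sketch assumes the Pearson coefficient, and the TA and MOS values are invented for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-scene values (not data from the paper).
ta = [5.0, 12.0, 20.0, 35.0]
mos = [4.5, 4.0, 3.6, 2.9]

r = pearson_r(ta, mos)
print(r < 0)  # True: MOS falls as TA rises in this made-up example
```

A negative coefficient, as reported above for both YPbPr and YCbCr, means that scenes with higher TA tend to receive lower scores.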
For the SC, CTP, ColP and ST video scenes the quality ratings are virtually the same for the standard-definition analog YIQ interface, the high-resolution analog YPbPr interface and the digital YCbCr interface. At the same time, for the FC, ConP, SP and LCG video scenes the quality score for the YIQ interface is significantly higher. For most of the video scenes (especially ST and SP) that show blurring and flickering effects when displayed over the high-resolution analog or digital YUV interface, the standard-definition YIQ interface does not single out the defective image regions; instead, the defects blend into the background of the overall picture.
By contrast, two video scenes (MIS and FO) received a remarkably higher quality score with the analog YPbPr interface and the digital YCbCr interface than with the analog YIQ interface. Both of these video scenes are slowly changing images of high-resolution real-life elements (a colorful autumn leaf on the ground, slowly flying and rotating fruits and vegetables).
Based on the above, one can conclude that the MIS and FO video scenes are most suitable for the high-definition YUV interface, that the SC, CTP, ColP and ST video scenes are equally well suited to both high-definition and standard-definition interfaces, and that the FC, ConP, SP and LCG video scenes are more suitable for the standard-definition YIQ interface.
All other DS and SS measurement tables and graph data, as well as screenshots of the video scenes used for testing, are publicly available [20].

Conclusions
This paper presents the results and analysis of a series of specially designed experiments performed to study the impact of video content and technical specifications on subjective quality assessment. The focus was on revealing how the technical specifications of video scenes and interfaces relate to the subjective quality assessment made by observers, who are influenced by the appearance of different types of common artifacts in digital video (blocking, blurring, flickering, etc.).
The results showed that the high-definition analog and digital YUV video interfaces are evaluated by observers as being of nearly the same quality. The subjective video quality scores for the two interfaces are correlated (correlation coefficient 0.91).
The analysis also revealed that the temporal activity of video scenes had a significant effect on the subjective quality score. Videos with high temporal activity were, on average, given lower quality scores by observers. At the same time, the spatial activity of the video scene did not show any correlation with the subjective quality score.
For very vibrant synthetic video scenes with high values of TA and SA (especially high TA), it was experimentally observed that there is no significant difference in subjective quality score between the standard-definition analog YIQ interface and the high-definition analog or digital YUV interfaces, as clearly observable impairments occur in both cases.
Further studies are planned to analyze the results obtained in the DS tests. The DS testing results are available on a public website [20].