Objective Models for Performance Comparison of Compression Algorithms for 3DTV

Efficient video compression algorithms in advanced multimedia broadcasting systems are in high demand. In the last decades, different video compression tools have been developed which can influence the final Quality of Experience in different ways. This paper has two goals. The first goal is to present a study of different compression algorithms available for stereoscopic 3D videos. The second goal is to present the possibilities in the creation of new stereoscopic models. The well-established video codecs (AVC, MVC, HEVC and MV-HEVC) are considered as encoders. Generic objective video quality metrics are used to analyze the compression efficiencies of the considered codecs, extended with results from subjective tests. The correlations between the objective and subjective scores are analyzed statistically. Due to unsatisfactory results of generic 2D metrics for the stereoscopic sequences used in the test, new objective models are presented. Such models show improved correlation with subjective stereoscopic video quality. The validation, verification and a description of models are presented in detail.


Introduction
Nowadays, interest in excellent video quality is rapidly increasing. Such interest is closely related to the video services provided in Standard and High Definition (SD and HD) formats and, in the future, in Ultra HD (UHD) or three dimensions (3D). UHD and 3D are emerging video formats with specific features. It is evident that flexible and highly efficient video coding algorithms are very important for distributing video content in such formats and at the required quality [1]. As an example, consider a scenario in which we are strongly limited by the transmission data rate, which is often the case in wireless networks. Furthermore, we may be limited in the maximum amount of transferred data, the so-called Fair User Policy (FUP), in mobile networks [2].
Today's display units already reach technical properties that are close to the resolution limits of the human eye. Enhanced display qualities, such as Ultra HD, High Dynamic Range (HDR), High Frame Rate (HFR), and wide color gamut, are already approaching the limit of the Human Visual System (HVS) in terms of user experience for 2D video. An appropriate approach to representing a real 3D view, for instance with holographic or volumetric displays, can be one of the next research directions.
Video compression tools play a key role in fulfilling both the multimedia content provider's requirements (e.g. the bandwidth needed for transmission) and the users' requirements (transparent video quality). When using any compression tool, it is important to find a balance between the compression ratio and the user's Quality of Experience (QoE). A high compression ratio can significantly reduce the amount of data in the processed video but results in a high degradation of video quality. The assessment of such degradation is especially important for 3D visual content, which has been receiving attention in many fields of interest (e.g. TV broadcasting, security, medicine). Consequently, accurate evaluation of stereoscopic 3D video quality by objective and subjective metrics is highly required [3]. Although interest in stereoscopic television may seem to be out of date today, new publications that deal with this issue keep appearing; as an example, we can refer to publication [4].
The aim of this paper is to explore the performance of recent and emerging compression tools for 3D stereoscopic video, namely H.264 Advanced Video Coding (AVC) [5], H.264 Annex H, Multiview Video Coding (MVC), H.265 High Efficiency Video Coding (HEVC) [6] and H.265 Annex G, Multiview High Efficiency Video Coding (MV-HEVC) [7]. For this purpose, appropriate subjective test sessions have been carried out. Well-established 2D objective video quality metrics are then compared with scores from the subjective test. Moreover, the gathered results are statistically analyzed. Based on the subjective test results and commonly used 2D metrics, models of 3D stereoscopic metrics are developed to best describe the quality of stereoscopic videos. Our general development of objective models can also be applied to non-3D video types, such as UHD. The results presented in this paper are a continuation of our earlier work published in [8].
The rest of this paper is organized as follows. The related state-of-the-art and the main contributions of this research paper are described in Sec. 2. The test setup is described in Sec. 3, including the subjective video quality method used, its setup and its realization. Section 4 contains the results of the objective metrics and the subjective test, together with their further evaluation and discussion. Section 5 describes the proposal and verification of our models for stereoscopic video. Finally, concluding remarks are outlined in Sec. 6.

Related Work
There are several ways to encode stereoscopic video content. Each view of the stereo pair can be encoded separately as an independent video sequence using common video coding algorithms for 2D video sequences. Another possibility is to use video coding algorithms specifically designed to support multiple views. These algorithms usually exploit the similarity of both views, which can lead to significant bitrate savings. Specialized video coding algorithms for 3D also exist, which can take advantage of depth maps if present. The following paragraphs relate to previously published works regarding video coding of 3D content for multimedia purposes and the related Quality of Experience.
In recent years, numerous studies have focused on exploring the possibilities of encoding stereoscopic 3D video content. Hannuksela et al. [9] offer an extensive overview of the MultiView extension of the High Efficiency Video Coding standard. MV-HEVC is capable of encoding multiple views together without using a depth map and is also able to encode stereoscopic 3D video. An overview of the 3D extension of HEVC (3D-HEVC) is presented in [10]. As 3D-HEVC is designed for encoding 3D content, it utilizes both the stereo pair and the information from the depth map and camera configuration. Results of software evaluations suggest that it is possible to achieve about 52 % coding efficiency gain on average when using 3D-HEVC compared to standard MVC. A special case is described in [11], where an extension of 3D-HEVC considering a circular camera arrangement is proposed.

Objective Metrics and Models in Stereoscopic 3DTV
Possibilities of using common 2D objective metrics for stereoscopic video were examined in [12] and [13]. In the first paper, the impact of encoding artefacts on stereoscopic video quality has been evaluated with three 2D objective metrics. The evaluation was done using Peak Signal-to-Noise Ratio (PSNR), Structural Similarity (SSIM) and Visual Information Fidelity-pixel domain (VIFp). The results show that, among the selected objective metrics, only the VIFp results were highly correlated with subjective data. The paper [14] investigates the reliability of objective quality metrics commonly used for the quality assessment of 2D media in the context of stereoscopic 3D video. The consistency between objective and subjective measures is evaluated by the Pearson linear correlation coefficients (PLCC). In [15], the use of 2D objective metrics for 3D quality assessment has been explored. Two objective metrics, Video Quality Metric (VQM) and Perceptual Quality Metric (PQM), have been investigated and their alignment to the Mean Opinion Score (MOS) has been analyzed. In that paper, unlike ours, the video sequences were encoded only by using the AVC encoder. Based on the statistical Pearson correlation (PCC) analysis, PQM correlates better with MOS than VQM, 0.78 versus 0.97. Results also indicated that the correlation is strongly content dependent. In another work, Han et al. [16] proposed an extended no-reference objective stereoscopic 3D Video Quality Metric (eNVQM) for 3D video quality assessment. Performance of eNVQM was studied in comparison with two 2D objective video quality metrics, SSIM and VQM. The PCC analysis showed that eNVQM has better accuracy, with PCC equal to 0.944, in terms of human perception for stereoscopic video, compared to two current common assessment methods. The Pearson correlation was 0.911 for SSIM and 0.932 for VQM.

Subjective Assessment in Stereoscopic 3DTV
The authors of [17] analyzed the possible use of Absolute Category Rating (ACR) for 3D stereoscopic content. A study of the subjective quality of monoscopic and stereoscopic video in adaptive streaming [18] presents a comparative analysis of different bitrate adaptation strategies in 2D and 3D scenarios. For monoscopic video content, no statistical differences were found when the bitrate was changed in an abrupt or a gradual way. Also, high quality oscillations were hardly perceptible when the difference in coding bitrate was small. Tests on stereoscopic video confirm that switching from 3D to 2D could be the best option to reduce the bitrate, while the inverse behavior does not provide a significant improvement in QoE. Paper [19] studied the response of the HVS to compressed stereoscopic sequences and compared the visibility of artifacts in 3D and 2D views (individually left and right eye views) over a range of bitrates. The 2D and 3D MOS from the test showed that there is a bitrate threshold above which compression artifacts tend to be suppressed in the 3D view when compared to the classic 2D view. The correlation between objective metrics and subjective tests depends strongly on the features of the video sequences used. It is therefore appropriate to perform extensive tests with different (subjective) methods, codecs and videos.
Based on the brief state-of-the-art presented above, our paper tries to answer the following questions:
1) Which 2D objective metrics correlate best with the users' Mean Opinion Score (MOS) for stereoscopic 3D videos, based on additional statistical analyses?
2) Can any 2D objective metrics be optimized for better 3D accuracy? Alternatively, can a model be created that has a better correlation with stereoscopic 3D video sequences?

Experimental Setup
This section briefly describes the setup of our experiment. The video sequences used for the test, the parameters of the encoders and the objective metrics used for rating the quality of the video sequences are outlined.

Video Sequences
As the source of stereoscopic 3D video sequences, we have used four samples which are available in databases [20] and [21], so that our research has a wider range of uses. These videos are used as the original dataset for our test. An additional four video sequences were used in another subjective test for the verification of the proposed objective models [21]. All these sequences were in Full HD resolution (1080p) for each view and had a frame rate of 25 frames per second (fps). The length of each video sequence before encoding was adjusted to 10 seconds, which is a typical length used in subjective video quality studies. The selected video sequences cover a wide variety of contents, as can be seen from the Spatial Information (SI) and Temporal Information (TI) in Fig. 1. The figure also contains one frame of each corresponding sequence. Both parameters SI and TI were calculated according to ITU-T P.910 [22]. The average value of depth for 5, 50 and 95 % of pixels was calculated for each video sequence, as can be seen in Tab. 1. According to the obtained results, for instance, in the case of the video Train, 95 % of pixels will have a depth (shift of pixels) of 15.27. In another case, the video Basketball will have 5 % of the pixels with a depth of −10.59. For the calculation of the average value of depth, the software StereoPhoto Maker was used. The average value of the depth of the videos varies considerably. The content of the video sequences can be described as follows:

Tab. 1: Image depth [%] of the video sequences Basketball, PoznanHall, Train and WishingWell.
• Basketball: Basketball players playing in the street. A moving camera with a wide shot. Fast and unpredictable movements. Different physics including jumping players and ball in the air.
• PoznanHall: A view into a school corridor with a slightly moving camera and a walking man in the foreground. Slow and predictable movements.
• Train: Static view of a train station with an approaching train with a detailed view of the overhead wire. Slow motion video and predictable movement of the train.
• WishingWell: Fountain with coins on the bottom and moving leaves on the surface. A lot of small waves on the water and reflections from the water.

Encoding Parameters
As input to the encoders, only the bitrates and the motion search range were defined, without any other system parameter modification and without any tuning of the encoders. The quality profile was set to the highest quality because we were focusing only on the quality of encoding, not the encoding speed. The encoders selected all other parameters automatically. A summary of the video encoder settings used in encoding is provided in Tab. 2. Target bitrates were adjusted between 0.5 Mbps and 4 Mbps. The target bitrate applies to one view only. Let us give an example for the 1 Mbps bitrate: for the 2D encoder, the total bitrate was set to 1 + 1 Mbps. For multiview encoders, the stream of both views was 2 Mbps. This means that the total data rate is the same. The search range for HEVC-based encoders was set to 64 pixels to take full advantage of these modern encoders.

Objective Video Quality Metrics
A specific feature of stereoscopic 3D videos is the broad variety of imaging technologies available. They have a different structure of image data and different types of compression standards. Currently, there is no widespread general objective 3D metric. Therefore, widely established general metrics like PSNR, SSIM, and VQM are commonly used. More advanced metrics also exist for a particular type of content or compression, and these achieve better results in their specific area of use. However, the intention of this contribution is to create a general method for stereoscopic content. The PSNR is a very simple metric based on differences of the corresponding pixel values [23]. The value of PSNR was computed for the luminance component only. The SSIM computes the structural differences in the pictures, reflecting basic properties of the HVS [23]. Finally, VQM compares the original characteristics with the processed characteristics of the video sequences and then produces VQM scores. The range can be from 0 (no perceived deterioration) to approximately 1 (maximum perceived deterioration). The general model was used in our case [24].
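To make the PSNR definition concrete, the following minimal Python sketch computes the PSNR of the luminance component for one pair of frames. The function name psnr_luma, the 8-bit peak value of 255 and the tiny 2x2 frames are illustrative assumptions; this is not the tool that was used in the test.

```python
import math

def psnr_luma(reference, distorted, peak=255.0):
    """PSNR in dB between two equally sized luminance frames (lists of rows).

    Assumption: 8-bit samples, so the peak value defaults to 255.
    """
    squared_error = 0.0
    count = 0
    for ref_row, dis_row in zip(reference, distorted):
        for r, d in zip(ref_row, dis_row):
            squared_error += (r - d) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames, PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)

# Hypothetical 2x2 luminance frames, for illustration only.
ref = [[100, 110], [120, 130]]
dist = [[101, 109], [122, 130]]
value = psnr_luma(ref, dist)
print(f"PSNR = {value:.2f} dB")
```

For a whole sequence, the per-frame values would typically be averaged over all frames of each view.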

Subjective Test Setup
All subjective video quality assessment was conducted in a special test room. Laboratory conditions were set up according to ITU-R BT.500-13 [25], including a room with controlled lighting. For the subjective experiment, a plasma stereoscopic television (Panasonic TX-P42GT20E) was used to display stereoscopic 3D video content. The television's active shutter 3D system with a Full HD double frame rate was used. In contrast to a polarization 3D system, this method of 3D viewing does not reduce the resolution, which is its biggest advantage. The Frame Packing 3D video format structure was used. It conveys two "full resolution" 1080p video signals, one for each eye, to the TV. This method is marked as Full High Definition 3D (FHD3D). The interface used between the TV and the PC was HDMI 1.4, which is capable of transferring FHD3D. The peak luminance of the display was adjusted to 200 cd/m². The viewing distance of the participants from the display, according to ITU-R BT.2022 [26], is the height of the picture multiplied by 3.2. In our case, the optimal viewing distance is 1.7 meters (see Fig. 2). In the subjective test, only one participant at a time was in front of the television to eliminate the effect of different observation positions.
As a pretest, 3D sequences in three quality levels were played. These sequences were different from the sequences used during the test. In this way, observers gained an overview of how a 3D movie could look. Sequences were played in random order for each participant, who did not know the details (the encoder or bit rate used). The observers were asked to evaluate the quality of the played video using a simple five-point discrete scale in the range from 1 (Bad) to 5 (Excellent). All tested sequences were evaluated by all participants for the best consistency of results. Participants rated the quality on sliders which were connected to the master computer (see Fig. 2). This computer also controlled the media computer from which the video sequences were played.
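The stated viewing distance can be checked with a short calculation. The sketch below assumes a 42-inch screen diagonal (suggested by the model name but not stated in the text) and a 16:9 aspect ratio; the helper name picture_height_m is hypothetical.

```python
import math

def picture_height_m(diagonal_inch, aspect_w=16, aspect_h=9):
    """Picture height in meters from the screen diagonal and aspect ratio."""
    diagonal_m = diagonal_inch * 0.0254  # inches to meters
    return diagonal_m * aspect_h / math.hypot(aspect_w, aspect_h)

# Assumption: 42-inch diagonal, 16:9 picture.
height = picture_height_m(42)
distance = 3.2 * height  # viewing distance of 3.2 picture heights

print(f"picture height   = {height:.3f} m")
print(f"viewing distance = {distance:.2f} m")
```

Under these assumptions, the result rounds to the 1.7 m quoted above.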

Participants and Color Vision Test
A total of 37 observers participated in the stereoscopic 3D subjective test. Two of them were female. University students and employees were recruited, with an average age of 24. The youngest participant was 20 years old and the oldest was 42 years old. Overall, eight of them had some experience with video quality assessment, and five observers had previously participated in stereoscopic 3D video quality assessment tests. All participants were tested for color blindness (Ishihara test) as well as for their ability of stereoscopic vision (Randot stereotest) [27]. Three people who did not pass the tests were not included in the final evaluation.

Experimental Results
The results obtained from the objective metrics, subjective test and additional correlation tests are evaluated, compared and discussed in this section.

Coding Efficiency According to the Objective Metrics
The results show that the performance of the standard codecs and their mutual comparison are highly content dependent. For several content types, the multiview coding gain is significant, while for other contents the multiview coding only brings undesired overhead with no performance improvement. Overall, the codecs belonging to the same standard exhibit very similar performance, with differences in PSNR in the order of 1 to 3 dB. Results obtained from the objective metrics and the subjective test are shown in Fig. 3. The MOS has been evaluated together with the 95 % confidence interval. A detailed analysis of the results is published in [8].

Coding Efficiency Evaluation Based on the Subjective Test
Advanced results and analysis of the subjective tests for all sequences and codecs are presented in Fig. 4. The legend in the figure is organized as follows: first comes the numbering, followed by the name of the video sequence, the encoder used and the target bitrate; the last column presents the MOS. For example, in the first row, the second line is the sequence "Basketball" encoded by AVC with a bitrate of 1 Mbps. The central red mark in the MOS is the median, and the edges of the blue box are the 25th and 75th percentiles. The black whiskers extend to the most extreme data points that are not outliers. Outliers (red crosses) are plotted individually. The following lines describe the subjective test results in Fig. 4:
Basketball: The performance of AVC and MVC is very similar, except at the highest bit rate, where MVC is better. For HEVC and MV-HEVC, the quality is the same for all bitrates. There is no increase in quality between the bit rates of 2 and 4 Mbps. The results of the subjective tests correspond approximately to the objective metrics.
PoznanHall: In the case of the HEVC codec, there is a gradual increase in quality with higher bit rates. On the other hand, with MV-HEVC, the quality was similar for all bit rates.
Train: The coding efficiency of HEVC and MV-HEVC is similar. A bit rate higher than 1 Mbps does not bring the predicted improvement in QoE. When the bit rate is higher than 2 Mbps, the coding efficiency is similar for all codecs. There is no coding gain for the multiview codecs.
WishingWell: The performance of MVC is significantly better than that of AVC. This is a situation in which the codec is able to exploit the potential of multiview coding. Similar results were obtained for the HEVC and MV-HEVC codecs.
The results of the Wilcoxon signed-rank test [28] are presented in Tab. 3. This test did not reject the hypothesis that AVC and MVC have similar coding efficiency. The same result is also obtained for HEVC and MV-HEVC. The H.265 standard was designed to achieve a 50 % lower bitrate than the H.264 standard for the same image quality [6]. The hypothesis that the H.265 generation needs half the bitrate of H.264 for the same quality, also in stereoscopic video, was not rejected (p equal to 0.31). If the value of p were very small (for example, 0.02), the hypothesis would have been rejected. The results show that the scattering of the test subjects' evaluations in the subjective test is large [29], [30]. For this reason, it was necessary to identify the participants who acted as outliers. We have used the whisker method for outlier detection. The whisker length (w) extends the interval of quartiles (Q25, Q75) by w times the interquartile range on both sides. The whiskers are lines extending above and below each box (Q25-Q75). They are drawn from the ends of the interquartile range to the furthest observations within the whisker length. Observations beyond the whisker length are marked as outliers. In our case, w is equal to 1.5, which for normally distributed data corresponds to approximately 99.3 % coverage of the values. An outlier is thus a value that is more than 1.5 times the interquartile range away from the top or bottom of the box [31].
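The whisker rule described above can be sketched in a few lines of Python. The ratings list is hypothetical, and the quartile convention (the "inclusive" method of statistics.quantiles) is an assumption, since different tools interpolate quartiles slightly differently.

```python
import statistics

def whisker_outliers(values, w=1.5):
    """Return the values lying more than w * IQR outside the box (Q25, Q75)."""
    # Assumption: quartiles use the 'inclusive' interpolation method.
    q25, _median, q75 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q75 - q25
    lower = q25 - w * iqr
    upper = q75 + w * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical ratings of one test condition by ten participants (scale 1 to 5).
ratings = [4, 4, 3, 4, 5, 4, 3, 4, 1, 4]
outliers = whisker_outliers(ratings)
print(outliers)
```

Here the box spans 3.25 to 4, so the whiskers reach from 2.125 to 5.125 and only the rating of 1 is flagged.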

Hypotheses about Outlier Rate
There are two hypotheses about the outlier rate. First, there is a difference in the variance of QoE evaluation among the sequences. Second, the variance is larger at the beginning and at the end of the testing session, due to disorientation and fatigue. A hypothesis of uniform distribution of outliers across sequences and time was tested.
The chi-square goodness-of-fit tests against a discrete uniform distribution, in the case of sequences and time, have rejected this proposition at a significance level of 0.05 [32]. We can show that in our subjective test, after an initial part with a significantly lower outlier rate (the first 8 video sequences), the rest of the evaluation time has a uniform outlier rate. This can be seen in Figs. 5 and 6. In these figures, the blue color indicates the results which were below the permissible deviation. The results marked in yellow are those that were above the error of the mean. The data from participants who had more than 10 % outlier evaluations were excluded. The number of participants not included in the final evaluation is four, which amounts to 11.8 % of the test base.
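As an illustration of the goodness-of-fit test against a discrete uniform distribution, the sketch below computes the chi-square statistic for outlier counts in four consecutive blocks of a session. The counts are invented for the example and are not the data of this study; the critical value 7.815 is the tabulated 0.95 quantile of the chi-square distribution with 3 degrees of freedom.

```python
def chi_square_uniform(observed):
    """Chi-square statistic of observed counts against a uniform expectation."""
    total = sum(observed)
    expected = total / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

# Hypothetical outlier counts in four consecutive time blocks of a session.
outlier_counts = [1, 11, 10, 10]

statistic = chi_square_uniform(outlier_counts)
critical_value = 7.815  # chi-square, 3 degrees of freedom, alpha = 0.05
rejected = statistic > critical_value

print(f"chi-square = {statistic:.2f}, uniformity rejected: {rejected}")
```

With these invented counts the first block has far fewer outliers than the others, so uniformity is rejected, which mirrors the pattern reported above.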

Correlation of Subjective and Objective Metrics for Stereoscopic 3D Video
After evaluating the coding efficiency, it is necessary to determine which 2D metric has the greatest correlation with the subjective 3D test. For these purposes, Spearman's Rank Order Correlation Coefficient (SROCC) and PCC were applied to the results [33]. Other methods for evaluating the performance of objective quality assessment models with respect to subjective tests are described in [29]. These analyses are used to determine the correlation between objective and subjective metrics.
The correlation scores are between +1 and −1, where +1 and −1 mean total positive and total negative linear correlation, respectively, and 0 denotes no linear correlation. Another possible method for the evaluation of results is based on ROC curves [34]. The VQM objective metric has negative correlation values because a lower VQM score indicates higher video quality. The correlation analysis was first applied to each sequence separately (see Tab. 4). Since we need a universal metric, the correlation value was then calculated across all sequences. The results show that the correlation depends on the video content. The video "PoznanHall", which is from another video database, has a different correlation than the other videos. This may also be due to the fact that the video has a large stereoscopic parallax (see Tab. 1). For some viewers, it could be distracting, and therefore the video has a non-standard rating. For this reason, in the last row of Tab. 4, the "PoznanHall" sequence is omitted and the resulting score, in this row only, is calculated without it.
The correlation between objective and subjective methods is plotted in Fig. 7. The black markers represent the video "Basketball", whereas red, green and blue colors indicate the videos "PoznanHall", "Train" and "WishingWell", respectively. After a thorough comparison of all objective and subjective scores, it can be concluded that in our case the VQM objective metric best reflects the user's QoE for compressed stereoscopic 3D videos.
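The two correlation coefficients can be illustrated with a small self-contained Python sketch. The VQM and MOS values below are hypothetical and only demonstrate why VQM yields negative correlations; the function names are our own and tied ranks are averaged.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient (PCC)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank order correlation (SROCC): PCC of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical scores for four bitrates of one sequence.
vqm = [0.60, 0.45, 0.30, 0.20]  # lower VQM means better quality
mos = [1.8, 2.9, 3.7, 4.4]      # higher MOS means better quality

pcc = pearson(vqm, mos)
srocc = spearman(vqm, mos)
print(f"PCC = {pcc:.3f}, SROCC = {srocc:.3f}")
```

Because MOS rises as VQM falls, both coefficients are close to −1 in this example, which corresponds to a strong agreement between the metric and the subjective scores.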
Correlation between the SSIM metric and the subjective test.

Innovative Models for Stereoscopic 3D Video Content
Although the VQM metric has the highest correlation, it is still not ideal for evaluating stereoscopic 3D videos. We thus propose our own model, which better fits our subjective test results. Such a model should provide sufficiently general predictions, at least for content with parameters similar to those of the video sequences used. This section describes the model proposal and the validation and verification of the models; a description of each proposed model is provided at the end.
We have several objective parameters specifying the Source Referent Contents (SRCs), such as SI, TI and disparity. Other parameters describe our interventions, the Hypothetical Reference Conditions (HRCs), such as the PSNR, SSIM or VQM coefficients. All the available sequence parameters (potential regressors) are summed up in Tab. 5. The column titled "Depth description" contains four parameters related to content depth. The first three are the quantiles (d05, d50, d95) of the disparity distribution. The fourth parameter is the disparity dynamic range, defined as d95 − d05. The disparity is calculated for a sufficient number of significant corresponding pixels by the Speeded-Up Robust Features (SURF) algorithm [35]. The last column contains seven coefficients whose linear combination forms the VQM value. We have only 64 samples of MOS, which is the response variable. To avoid over-parametrization, it is necessary to reduce the number of regressors. A good model needs about a hundred observations per regressor. According to [36], to detect reasonable size effects with reasonable power, 10-20 samples per parameter are needed. The disproportion between the number of potential model parameters and the "training" data is also the main reason why we focused on linear modeling.

Model Estimation Methods
The simplest and a very common model estimation method for the General Linear Model (GLM) is Ordinary Least Squares (OLS). The OLS method minimizes the sum of squared residuals, which are the differences between the observed values and the estimated values of the quantity of interest. In our case, these values are the median of the subjectively estimated quality (MOS) and the modeled MOS value. As we cannot exclude correlation of the regressors, Generalized Least Squares (GLS) has been utilized as the model estimation method [36], [37]. Regressors have been selected into our models according to two criteria: the Akaike information criterion and the coefficient of determination [38].
The Akaike Information Criterion (AIC) is a measure of the relative quality of statistical models for a given set of data. AIC is based on minimizing the relative information lost when a given model is used to represent the process that generated the data. It sets the trade-off between the goodness of fit of the model and the complexity of the model. This level of parsimony is a function of the relevance of the input data sample in a population. The AIC coefficient does not carry any absolute information about model quality, but the model with the lowest AIC is relatively the best of the tested set. The coefficient of determination R² is the ratio of the variance explained by the model to the variance of the explained (modeled) variable. In the case of linear regression with statistically independent regressors, R² is the square of the coefficient of multiple correlation between the model output and the independent (explanatory) variables. The coefficient of determination increases with the number of regressors, even if they do not bring any new information. To choose a model with the optimal number of parameters, the adjusted R² is used. The adjusted R² is the best estimate of the degree of relationship in the basic population. This coefficient determines how our linear model would describe the population if we had ideal data samples.
The flowchart in Fig. 8 shows the process of setting the models (left column), their verification (middle column) and validation (right column). First, the regressors are chosen from Tab. 5 on the basis of the criteria mentioned in the previous paragraphs. Second, the model is fitted by the GLS model estimation method. The standard deviation per sample (σ), sometimes called the Root Mean Square Error (RMSE) in the literature, is calculated. Here, RMSE is the ideal point estimate of σ. In the case of model verification, the dataset is randomly divided into 8 parts (the literature recommends 5-10; a divisor of 64 was chosen). The model is fitted to the training data (7/8 of the original data) and σ1 is calculated from the verification data (the remaining 1/8). After 8 repetitions, the arithmetic mean of σ1 to σ8 is calculated, called the true error estimation (E). Figure 9 shows a residual plot, i.e., the scatter plot of the deviations of the verification samples. It demonstrates how the observed values differ from the point of best fit. From this plot, we can obtain a good overview of model bias and homoscedasticity.
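The two selection criteria can be illustrated numerically. The sketch below uses the textbook formula for the adjusted R² and the common least-squares form of AIC (valid up to an additive constant under the assumption of Gaussian residuals); whether the authors used exactly these forms is an assumption. The sample count of 64 matches the MOS samples of this study, while the R² values and the total sum of squares are invented.

```python
import math

def adjusted_r2(r2, n_samples, n_regressors):
    """Adjusted coefficient of determination (penalizes extra regressors)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_regressors - 1)

def aic_least_squares(rss, n_samples, n_params):
    """AIC of a least-squares fit with Gaussian residuals, up to a constant."""
    return n_samples * math.log(rss / n_samples) + 2 * n_params

n = 64      # number of MOS samples
tss = 64.0  # hypothetical total sum of squares of the MOS values

# Hypothetical fits: 3 regressors reach R^2 = 0.900, 7 regressors reach 0.905.
adj_small = adjusted_r2(0.900, n, 3)
adj_large = adjusted_r2(0.905, n, 7)

# Residual sum of squares follows from R^2; parameters include the intercept.
aic_small = aic_least_squares((1 - 0.900) * tss, n, 3 + 1)
aic_large = aic_least_squares((1 - 0.905) * tss, n, 7 + 1)

print(f"adjusted R^2: {adj_small:.4f} (3 regressors), {adj_large:.4f} (7 regressors)")
print(f"AIC:          {aic_small:.2f} (3 regressors), {aic_large:.2f} (7 regressors)")
```

In this invented case the four additional regressors raise R² slightly, yet both the adjusted R² and the AIC prefer the smaller model.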
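The verification step can be sketched as an 8-fold cross-validation. For brevity, the sketch swaps in an ordinary least-squares line with a single regressor instead of the GLS estimation with several regressors described above, and it runs on synthetic data; all names and values are illustrative assumptions.

```python
import math
import random

def fit_line(xs, ys):
    """OLS fit of y = slope * x + intercept (stand-in for the GLS model)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x

def rmse(xs, ys, slope, intercept):
    """Root mean square error of the line on the given samples."""
    squared = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(squared / len(xs))

def k_fold_error(xs, ys, k=8, seed=0):
    """Mean of the k verification RMSE values (E) and the individual values."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint parts
    sigmas = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        sigmas.append(rmse([xs[i] for i in held_out],
                           [ys[i] for i in held_out], slope, intercept))
    return sum(sigmas) / k, sigmas

# Synthetic data: 64 samples of a VQM-like regressor and a noisy MOS.
rng = random.Random(1)
vqm = [rng.random() for _ in range(64)]
mos = [5.0 - 4.0 * v + rng.gauss(0.0, 0.3) for v in vqm]

error_estimate, sigmas = k_fold_error(vqm, mos, k=8)
print(f"E = {error_estimate:.3f} from {len(sigmas)} folds")
```

With 64 samples and k = 8, each model is fitted on 56 samples and verified on the remaining 8, which matches the 7/8 to 1/8 split described above.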

Validation
The right column of the flowchart in Fig. 8 describes the process of model validation. For this purpose, a validation dataset has been added, consisting of video sequences other than those used in the subjective test. The validation videos (SRC 1-8) come from RMIT3DV, an uncompressed stereoscopic 3D HD video library. This database has been provided by RMIT University in Melbourne [21]. From its sequences, those whose parameter values (potential regressors) are within the range of the original data values have been chosen. As validation data (SRC 1-8), the sequences 3D_01, 3D_03, 3D_05, 3D_16, 3D_17, 3D_29, 3D_42, 3D_48 were used. The HRC applied to the selected sequences was HEVC with four levels of compression ratio (2x [250, 500, 750, 1000] kbps). The validation data is fully independent. The subjective tests have been done with other respondents. Once again, the ACR subjective method was used. Furthermore, the same display technology and test environment have been used.
The dynamic range adjustment is the second step done with the set of models. Full-reference objective video quality metrics such as SSIM, VQM, Moving Pictures Quality Metric (MPQM) [39] and Noise Quality Measure (NQM) [39] tend to be global QoE models. The generality of the metrics goes against accuracy, even in very complex models. Our goal was to make the most accurate model with a limited amount of data. Although the respondents are trained to set their quality dynamic range, they tend to utilize the full range of the QoE scale. This is why we decided to adjust the dynamic range of our model to fit the validation data optimally [24], [40], [41].
Validation is done by calculating the standard deviation of the model results and MOS values. The bar graph in Fig. 10 shows the MOS values and a gray box containing 50 % of the voted quality values. Three different point estimates of MOS, the results of the three corresponding linear models, are plotted as colored cross marks. The colored background refers to the validation content SRCs 1-8. Each colored surface contains the MOS values of four HRCs applied to one sequence.

Created Models
The details of the models are described in this subsection. Each block of text describes an individual model, including its properties and differences from the others. Table 6 sums up the accuracy and verification of the models. The correlation coefficients were calculated from the original and validation data; therefore, they do not correspond to the results in Tab. 4. The σ denotes the standard deviation per sample. It is calculated for the original (training) data, the verification data and across both datasets (designated ). The standard deviation has not been calculated for the PSNR metric, because this metric does not have a defined range, unlike, for example, the SSIM metric, whose value ranges from 0 to 1. The standard deviation for Model VI cannot be calculated separately for the original and validation data because both datasets are used as the input data of the model.

Conclusion
Based on the above-mentioned facts and the obtained objective and subjective scores, the answers to the questions from Sec. 2 are as follows: 1) After a thorough comparison of all objective and subjective scores, it can be concluded that the VQM objective metric best reflects the user's QoE for stereoscopic 3D video content. However, even this metric does not reach a very high correlation with our subjective test. For more details, see Tab. 4, which shows a statistical comparison of the objective metrics and the subjective tests. Regarding the outlier rate, our assumption that the variance of the results is larger at the beginning and at the end of the testing session, due to disorientation and fatigue, proved to be wrong. In the subjective test, after a significantly better beginning part (the first 8 video sequences), the rest of the evaluation has a uniform outlier rate. The dependence of the number of outliers on the sequence was not significant. The results of the objective metrics and subjective tests are available at https://www.vutbr.cz/www_base/vutdisk.php?i=145778aa4d.
2) For better modeling of our results, seven new models of objective metrics were created. These models have been validated and verified on other stereoscopic 3D video sequences (see Sec. 5.2) and compared to the general VQM model. In general, the models that had the smallest error for our sequences were less accurate on other databases. On the other hand, the models that were less accurate on our sequences had a wider usable scope on other databases. Table 6 lists the most important data of our models, such as model descriptions, regressors and standard deviations. "Model III" has the highest correlation with MOS for our dataset. This model had a standard deviation of only 8 %, compared to 32 % for the general VQM metric. When both our source data and the validation data are considered, "Model VI" has the smallest standard deviation. However, this model includes the most regressors and is trained on the original input video sequences as well as on the validation sequences. The model that is the most balanced in all areas is "Model V". Its standard deviation for the original sequences is one third lower than for the general VQM. Its deviation for the validation data is also slightly lower than for the classic VQM. Another benefit is that only three regressors enter the model calculation. For these reasons, "Model V" can be regarded as the most appropriate model, thanks to its great versatility and sufficient accuracy.

Tab. 1. Average depth of video sequences.
Results of subjective test with 95 % confidence interval.

Fig. 3. Video quality measured by objective metrics and subjective test.

Fig. 5. Dependence of the number of outliers on the sequence.
Correlation between the PSNR metric and the subjective test.
(c) Correlation between the VQM metric and the subjective test. Legend: AVC, MVC, HEVC, MV-HEVC.

Fig. 7. Dependence of objective metrics on the subjective test.

Tab. 5. Available objective parameters of the sequences.
About the Authors . . .
Jan KUFA (1990) received his M.Sc. and Ph.D. degrees in Electrical Engineering from Brno University of Technology (BUT) in 2014 and 2018, respectively. His research interests include digital television systems, video image quality and satellite television. Currently he is with the Department of Radio Electronics, BUT.
Ondrej KALLER (1986) received his M.Sc. and Ph.D. degrees in Electrical Engineering from Brno University of Technology in 2010 and 2018, respectively. His field of interest includes digital television broadcasting systems. He is focused on 3D video capturing, transmission and interpretation.
Ondrej ZACH (1988) received his M.Sc. degree in Electrical Engineering from Brno University of Technology in 2013. At present he is a Ph.D. student at the Department of Radio Electronics, Brno University of Technology. His field of interest is video technology and video coding.
Ladislav POLAK (1984) received his M.Sc. degree in 2009, Ph.D. degree in 2013 and Assoc. Prof. in 2018, all in Electronics and Communication from Brno University of Technology, Czech Republic. Currently he is with the Department of Radio Electronics, BUT. His research interests are Digital Video Broadcasting standards, wireless communication systems, signal processing, video image quality evaluation and design of subjective video quality methodologies. He has been an IEEE member since 2010.
Tomas KRATOCHVIL (1976) received his M.Sc. degree in 1999, Ph.D. degree in 2006, Assoc. Prof. in 2009 and Full Prof. in 2016, all in the Electronics and Communications program from Brno University of Technology. He is currently the Head of the Department of Radio Electronics, Brno University of Technology. His research interests include digital television and audio broadcasting, its standardization, and video and multimedia transmission including video image quality evaluation. He has been an IEEE member since 2001 and an IEEE senior member since 2016.