Video Quality Assessment

One of the main aspects affecting video compression, and one that needs careful analysis, is quality assessment. The chain of transmission of video over a given distribution channel, such as broadcast or digital storage, has limited capacity and requires a compression process, with consequent degradation and the appearance of artifacts that must be evaluated in order to offer suitable quality to the end user.


Introduction
Technology has evolved quickly, especially in television and multimedia services, which have moved from analog to digital. The constant increase in resolution, from standard definition to high definition and ultra-high definition, and the creation of advanced content-production systems such as three-dimensional video, make new quality studies necessary to evaluate video characteristics and provide the observer with the best viewing experience possible.
Once the change from analog to digital television was complete, the next step was to encode the video so as to obtain high compression without damaging the quality perceived by the observer. In analog television the quality-control systems were well established, but digital television requires new metrics and procedures for measuring video quality.
Quality assessment must be adapted to the human visual system, which is why researchers have performed subjective viewing experiments to determine the encoding conditions of video systems that provide the best quality to the user.
Video encoding has undergone a standardization process: the MPEG group of experts developed techniques that assure a level of quality, improved with each evolution of the standards. MPEG-2 offered reasonably good quality, but subsequent work produced a standard twice as efficient, AVC/H.264; that is, to obtain quality similar to the earlier standard, the newer one needs only half the bitrate.
Quality assessment has also been forced to evolve in parallel with these technologies. The concept is no longer limited to the perceived quality of the video: other factors now contribute to it, giving rise to the term Quality of Experience (QoE), which is becoming more popular because it is a more complete definition; the user is not only observing the video, but living a real experience that depends on the content and the expectations placed on it.
In this chapter, the systems particularly used for this important purpose are reviewed, extending to the experience lived by the observer in terms of QoE.
The purpose of this chapter is to provide a state of the art of visual quality assessment, analyzing models and metrics used in a variety of applications.

Human visual system
Visual perception is very important to humans. We are constantly receiving and processing information in order to interact with the environment that surrounds us. That explains the great interest in video and in the measurement of its quality: we need the information reaching our visual system to be as faithful as it appears in nature.
The evolution of coding technologies has made video compression more efficient, reducing the introduced artifacts. The accuracy of the vision models and quality metrics developed has also increased as video content has moved from analog to digital. These vision models are based on human perception, moving closer to the final consumer. The human visual system is extremely complex, but its behavior can be analyzed by characterizing the operation of the eye, an organ sensitive to a wide range of wavelengths of the electromagnetic spectrum, from approximately 400 to 780 nm. A large part of our neurological resources is devoted to visual perception.
For all these reasons, optimizing the performance of digital imaging systems with respect to the capture, display, storage and transmission of visual information is one of the most important challenges in this domain.
Video compression schemes should reduce the visibility of the introduced artifacts, creating more effective systems to reproduce video and images; likewise, printers should use the best half-toning patterns, and so on. In all these applications, the limitations of the human visual system (HVS) can be exploited to maximize the visual quality of the output. To do this, it is necessary to build computational models of the HVS and integrate them into tools for perceptual quality assessment.

Quality assessment

Types
There are different ways of developing quality assessment. Its main objective is to evaluate encoding or transmission equipment under its working conditions, to ensure that the expectations of the users viewing the contents are met.
The distribution chain joins different processes, each of which degrades the image. In broadcasting, the phases that most affect video quality are usually contribution and distribution, in which the video is degraded by the encoding of the signal. This degradation will be analyzed.

Fig. 1. Phases of video broadcast
The simplest way to do this is to select a large number of observers, varied in sex, age and condition, and ask them to watch a series of contents, selected beforehand to cover the range of material that could appear on a normal TV channel. The observers are given a questionnaire to record their opinions about the observed quality. Once the questionnaires are complete, statistical analysis reveals the conclusions.
The studies must follow a protocol, basically described in ITU-R Recommendation BT.500 on subjective assessment, with variations to adapt the study to the real situation without departing far from the standard. The selection of video contents and the duration of the sequences are important decisions for doing a proper job and for comparability with similar studies.
Although subjective studies offer real results, since the responses of the observers are collected directly, they are expensive in time and money and not always efficient: the outcome can depend on the venue and its conditions of lighting or viewer comfort, so an external condition can change an otherwise valid result. That is the reason for the proliferation of objective measurements, based on mathematical algorithms using properties of the image, which aim to keep high fidelity to the subjectively obtained results.
One of the main points described in the next sections is the measurement of artifacts and impairments.
Every video encoding process generates degradation of the image, with the consequent appearance of concrete defects called artifacts, which affect the quality perceived by the observer. Researchers have studied this phenomenon: artifacts such as blockiness, blurring, ringing, color bleeding or motion-compensation mismatches have been widely analyzed, and a collection of metrics, with or without reference, has been developed in this field, using test signals and measurement procedures to determine the level of distortion.
Finally, the evolution of technologies and its influence on quality assessment will be approached through a description of the state of the art of quality measurement in stereoscopic systems. The classic methods are used for this purpose, but the detection of new artifacts and types of impairment makes it necessary to develop new metrics. Additionally, the concept is no longer only physical: this evolution has led to the term Quality of Experience (QoE).

Artifacts
Artifacts are defined as visible differences in an image due to some technical limitation. These effects occur during the production of the video signal: in the phases of capture, compression, transmission, reception and delivery to the final recipient, the displayed picture may come to differ from the original.
They appear in both analog and digital systems, but we will focus on the artifacts of digital systems, especially those caused by video compression in the encoding and decoding process.
The most common artifacts are caused by three factors:

 Artifacts due to analog and digital formats, their relationship and the conversions between them (noise and blurring).
 Artifacts due to coding and compression (block distortion, blurring and ringing).
 Artifacts due to transmission channel errors (errored blocks caused by lost packets).


Blocking effect
The blocking effect, also known as tiling or blockiness, refers to a block pattern in the compressed sequence. It is due to the independent quantization of individual blocks, and appears especially with compression standards that use macroblocks of a certain size when performing a transform. A heavily impaired image presents clearly visible artificial horizontal and vertical edges, parallel to the picture frame.
Under high compression, which reduces the bitrate, in DCT-based encoding such as MPEG-2, H.263 or similar, macroblocks with less information end up with homogeneous pixel values inside each block and visible boundaries at their edges, as a result of the truncation of coefficients. Advanced codecs such as H.264 use deblocking filters to reduce the visibility of this artifact. The effect is easily distinguished in the example of the "Nasa" sequence and its tiling diagram.
For a given quantization level, block distortion is usually more visible in smoother areas of the picture.

Blur
Blur is defined as a loss of energy and spatial detail, seen as a reduction of edge sharpness, caused by the suppression of high-frequency coefficients through coarse quantization. Blurring is also generated when the encoding bitrate is reduced. As this effect occurs at high frequencies, it is more visible in sequences with higher spatial complexity.
The example of the "Nasa" test sequence shows the consequences of reducing the bitrate, from the original on the left to a heavy reduction in the image on the right.

Noise
Noise is defined as an uncontrolled or unpredicted pattern of intensity fluctuations that affects the perceived quality of an image.
There are multiple types of noise impairment commonly produced by compression algorithms. Two common ones are mosquito noise and quantization noise.


 Mosquito noise. A temporal artifact seen mainly in smoothly textured regions as luminance/chrominance fluctuations around high-contrast edges or moving objects. It is a consequence of the coding differences for the same area of a scene in consecutive frames of a sequence.
 Quantization noise. A type of noise produced by severe truncation of the coefficient values while performing a transform such as the DCT or the Hadamard transform.
Other related artifacts are flickering, in images with high texture content, and aliasing, when the content of the scene is above the Nyquist rate, either spatially or temporally.
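As an illustration of how blocking and quantization noise arise, the following sketch applies a coarse uniform quantizer to the 8×8 DCT coefficients of each block, the core operation behind MPEG-style intra coding. The block size, quantization step and test image are illustrative choices, not values from any particular standard.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix (row k, column j)."""
    k = np.arange(n)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row has its own normalization
    return m

def quantize_blocks(img, step):
    """Transform each 8x8 block, coarsely quantize the coefficients, reconstruct.

    A large `step` zeroes most AC coefficients, leaving near-constant blocks
    with visible boundaries (blocking) and quantization noise.
    """
    d = dct_matrix()
    out = np.empty(img.shape, dtype=float)
    for i in range(0, img.shape[0], 8):
        for j in range(0, img.shape[1], 8):
            block = img[i:i + 8, j:j + 8].astype(float)
            coeff = d @ block @ d.T                 # forward 2-D DCT
            coeff = np.round(coeff / step) * step   # uniform quantization
            out[i:i + 8, j:j + 8] = d.T @ coeff @ d  # inverse 2-D DCT
    return out
```

A smooth luminance ramp reconstructed with a large step turns into flat tiles, which is exactly the tiling artifact described above.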

Subjective quality assessment

Introduction
The user's opinion of video technologies is important. For that reason, the evaluation of a video compression codec must be carried out with the users' help, in order to know its level of acceptance.
Several studies have been conducted to measure the quality of new technology systems in each period. At first they were used to analyze the impact of digital television in the transition from analog to digital, and later in the transition from standard-definition to high-definition TV, including studies of the main encoders (MPEG-2, H.264/AVC) and their adaptation to different formats and storage; more recently, assessment has been carried out for Blu-ray codecs. They are also used for the definition of settings or quality parameters: bitrate, resolution (e.g., in HD, 720p or 1080i) or the features of a TV channel, for example.
The next generation of codecs, HEVC (High Efficiency Video Coding), which represents the evolution of H.264, is being analyzed by subjective assessment to measure its impact on users and the need for its establishment.
So it is clear that subjective quality evaluation is still important, and it will always accompany objective evaluation. Methods and technologies change, but the purpose of video quality assessment remains the same.
In this section, the most frequent techniques for developing a subjective quality study, in order to evaluate a given system, are presented. There is a great variety of subjective testing methods, depending on how the video sequences are presented to the observers.
There are single- and double-stimulus experiments, depending on whether the original image is presented or only the degraded one.
The problem with this kind of study is the requirement for a large number of observers viewing a limited amount of video content: if the viewing time is extended for too long, the tiredness and fatigue of the observers will affect the intended measurement. Even so, it is still widely used nowadays, and it is necessary background for the objective assessment explained in the next section.

Description of requirements
Once the decision to develop a subjective quality assessment is made, the requirements for a suitable subjective study must be considered. Before conducting the study, the conditions must be established so that it can be repeated or compared with similar studies. The requirements are listed in this section.


 Viewing conditions.
Viewing conditions in a laboratory environment differ from those used in a home environment. Assuring the best conditions is a basic aim, in order to allow a meaningful evaluation to be derived from the assessment. The observers' distance to the screen depends on the size of the screen, and this is very important for obtaining suitable results.
The screen must be adjusted in contrast and brightness, and the viewing angle must not exceed 30º.


 Conditioning of the room.
The room must have controlled lighting, comfortable seating and no reflections on the screen, because the viewers will spend a considerable time there and this must not affect their scores.

 Observers
The minimum number of participants for the sample to be sufficient is 15, but it is recommended to find as many observers as possible, 60 or more. The viewers should preferably be non-professionals, without trained eyes: professional viewers tend to search for the impairment, and their opinion is not always impartial.
There must be a representative variety of ages and sexes among the viewers. The observers must receive prior training to understand the objective of the test.

 Materials / Test sequences.
There must be a selection of sufficient test material, including the different types of content broadcast by a conventional TV channel (sports, movies, news, documentaries, etc.), with different degrees of spatial complexity (more or less detail and high frequencies) and temporal complexity (faster or slower content).
Each sequence must last between 10 and 20 seconds, owing to human memory and perception, to ensure a correct viewing: neither so short that the viewers have no time to observe the image in detail, nor so long as to cause fatigue in the observer.
A number of organizations have developed test still pictures and sequences, whose use is recommended for the assessment.

 Presentation of results
The results must be presented in detail; all this information is necessary to validate the study and verify that it was performed properly. The data given must include: details of the test configuration and materials, type of source and displays, number of participating subjects or observers, the reference system used and its specific variations, and the scores and mean scores adjusted to a 95% confidence interval.
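The mean scores and confidence intervals mentioned above can be computed as in this small sketch. It uses the normal-approximation factor 1.96 for the 95% interval; BT.500 tabulates the exact Student-t factor, which matters for small panels.

```python
import math

def mos_with_ci(scores):
    """Mean opinion score and an approximate 95% confidence interval.

    Assumes `scores` is a list of individual ratings for one test condition.
    The 1.96 factor is the normal approximation; for small samples the
    Student-t value should be used instead.
    """
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)  # sample variance
    delta = 1.96 * math.sqrt(var / n)                     # half-width of the CI
    return mean, (mean - delta, mean + delta)
```

For example, ratings of 3, 4, 4, 5, 4 on a five-grade scale yield a MOS of 4.0 with its surrounding interval.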
Logistic curve fitting and a logarithmic axis allow a straight-line representation, which is the preferred, most legible, form of presentation.
Audio quality assessment is preferably carried out independently of the video assessment. In fact, it is recommended not to use sound in video studies, to avoid distracting the observers and skewing their opinion.
Furthermore, if audio is used, the selection of the accompanying audio material should be given the same level of importance as the selection of the video material.
Additionally, it is important that the sound be synchronized with the video.This is most noticeable for speech and lip synchronization, for which time lags of more than approximately 100 ms are considered very annoying.

Definition of settings of methods
A collection of settings can be varied, depending on the type of results desired at the end of the quality study. In this section, some of the most important are described.

 Single or Double Stimulus methods
In double-stimulus methods, viewers are shown each pair of video sequences: the reference and the impaired one. In single-stimulus methods, viewers are shown only the impaired sequence.
The number of stimuli determines whether comparison to a reference is possible, which allows the observer to detect the artifacts and impairments in the image more easily than without any original signal.
In real conditions the user does not have a reference to compare with, so a single-stimulus method is considered more realistic. However, a double-stimulus method avoids more effectively the errors caused by context effects, which occur when subjective ratings are influenced by the severity and ordering of the impairments within the test session.
In double-stimulus tests there are two ways of presenting each pair of sequences, depending on the number of screens used in the study. With two screens, each pair can be presented simultaneously, allowing the user to compare both at the same time and detect the variation in quality.
The comparison scale is only available in double stimulus methods.

 With or without repetition methods
One of the main problems, and a context effect influencing the results of subjective assessment, is observer fatigue. An observer has a limited time during which his or her scores are effective.
Long sessions produce fatigue and exhaustion, which distort the results and invalidate the assessment. For that reason, each session must be kept under half an hour, with extended breaks.
Depending on the accuracy required, each pair can be presented twice, i.e., with one or more repetitions. If the variety of parameters to be measured is wide, the session time can be reduced by presenting each pair of sequences only once, i.e., without repetition. The objective is to save time and expand the quality parameters (QP) under evaluation while avoiding observer fatigue.

 Absolute or Comparison methods
Depending on the objective of the study, the expected results can be defined. Absolute results are related to single-stimulus methods, whereas comparison methods are more related to double-stimulus methods, although it is possible to obtain absolute measurements with full reference.
The first type of method uses either the quality or the impairment scale, while the second uses a scale called the "comparison scale", which rates the relation between the members of each pair of sequences. A numerical scale uses numbers to obtain the opinion of the observers, and depends on the number of grades it contains. The most frequent numerical scale is the Mean Opinion Score (MOS), normalized as the five-grade scale from 1 to 5. Other scales are the 10-grade scale from 1 to 10 or the 8-grade scale from 1 to 8, but it is sometimes difficult to find equivalent adjectives for each grade. A different numerical scale is, for example, the comparison scale, which uses 7 grades including zero to indicate no perceptible variation.
Zero is rarely used because of its negative connotations.

Most frequent methods
By combining the different settings, it is possible to define the most common subjective evaluation methods. Even so, there are other acceptable combinations that do not appear in the following list of the most representative ones.
This is the method used by the European Broadcasting Union (EBU) to measure the robustness of systems (i.e., failure characteristics).
The reference and the test sequence are shown only once, and subjects rate the amount of impairment in the test sequence by comparing one to the other. The purpose of this method is to quantify the quality of systems (when no reference is available).
The Absolute Category Rating (ACR) method uses a single stimulus: viewers see only the video under test, without the reference, and give one rating for its overall quality using a discrete five-level scale from 'bad' to 'excellent'. The fact that the reference is not shown with every test clip makes ACR a very efficient method compared to DSIS or DSCQS, which take almost two or four times as long, respectively. In the pair-comparison variant, test clips from the same scene but different conditions (the quality parameter under evaluation) are paired in all possible combinations, and viewers make a preference judgment for each pair; this allows very fine quality discrimination between clips.
This method uses a comparison scale. The purpose of this type of study is to assess not only the basic quality of the images but also the fidelity of the information transmitted. Two screens, placed side by side in front of the user, are necessary for this method of evaluation: the left screen plays the reference sequence, while the right one plays the impaired sequence that viewers must score.
The main purpose of this method is to measure the fidelity between two video sequences.It is also used to compare different error resilience tools.
Each video pair is shown once or twice. The test session is shorter, and allows a larger number of quality parameters to be evaluated.

Objective quality assessment
In this section, the most important metrics are described, to offer an overview of the techniques used in this type of quality assessment.
There are three types of objective quality assessment, depending on the presence and availability of a reference image, or any of its features, for the study: Full-Reference (FR), Reduced-Reference (RR) and No-Reference (NR).
Old metrics designed for digital imaging systems, such as MSE (Mean Squared Error) and PSNR (Peak Signal-to-Noise Ratio), which are still widely used in quality assessment, are defined in this section; they remain adequate for evaluating error measures. The analysis of the measurement of artifacts such as tiling or blurring, introduced especially by video compression algorithms, offers the reader a perspective on the evolution of metrics.

Objective quality metrics
Depending on the presence of a video reference, three kinds of analysis are defined:
 Full-Reference (FR) metrics, where the original image is available and can be compared with the degraded image to quantify the loss of quality caused by the encoding and decoding process.
 Reduced-Reference (RR) metrics, where only some features extracted from the original, not the full image, are available for the comparison.
 No-Reference (NR) metrics, where there is no original image, nor properties of it; the degraded image and its effect on the human visual system are the only tools for judging quality. These metrics are more complicated, but they are of vital importance in environments where it is difficult to provide any reference, such as mobile or Internet multimedia services.
The simple metrics below offer a good estimation of the global quality measured objectively, but they are widely criticized for not correlating well with the perceived quality obtained by subjective methods that incorporate human vision models.
The most important, among others, are MSE, SNR and PSNR.

MSE (Mean Squared Error)
The mean squared error (MSE) is one of the most popular difference metrics in image and video processing. It is the mean of the squared differences between the luminance values of the pixels of two images, X and Y: normally the original frame, used as reference, and an impaired image obtained by processing the first. M and N are the horizontal and vertical dimensions of each frame of the sequence.

 
MSE = (1 / (M·N)) · Σ_{i=1..M} Σ_{j=1..N} [X(i, j) − Y(i, j)]²

SNR was widely superseded by its evolution, PSNR, which is more efficient and whose results are easier to compare across studies with different signals.
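A direct implementation of the MSE definition might look as follows (NumPy is used only for brevity):

```python
import numpy as np

def mse(x, y):
    """Mean squared error between two equally sized frames X and Y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((x - y) ** 2))
```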

PSNR (Peak Signal-to-Noise Ratio)
As SNR uses the signal itself in the comparison, it is difficult to export conclusions from one study to another; that is why the original-signal term was replaced by the peak value (255 for an 8-bit RGB channel, or 235 for studio-range luminance, for example), obtaining more general results with a more efficient method.
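The definition reduces to PSNR = 10 · log10(peak² / MSE) dB. A minimal computation, assuming an 8-bit peak of 255:

```python
import math

def psnr(mse_value, peak=255.0):
    """Peak signal-to-noise ratio in dB; `peak` is 255 for 8-bit channels."""
    if mse_value == 0:
        return float("inf")  # identical images: unbounded PSNR
    return 10.0 * math.log10(peak ** 2 / mse_value)
```

Typical broadcast-quality compression lands in the region of 30 to 50 dB with this definition.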

Blockiness or tiling
The metric defined by the MSU Graphics & Media Lab measures the subjective blocking effect in a video sequence, based on energy calculations and gradients. In high-contrast areas of the frame blocking is not noticeable, but in smooth areas these edges are conspicuous.
Other metrics are based on the structure of the pixelized image. The model in the research of Lee et al. first extracts edge pixels and computes the horizontal, H(t, i, j), and vertical, V(t, i, j), gradient components of the edge pixels. The gradient is calculated using the Sobel operators. From the horizontal and vertical gradient images, the magnitude R and angle θ are extracted:

R(t, i, j) = √( H(t, i, j)² + V(t, i, j)² )

θ(t, i, j) = tan⁻¹( V(t, i, j) / H(t, i, j) )

Analyzing the angles of the gradient, pixels whose gradient is parallel to the picture frame are considered as belonging to a blocking region if they have a sufficient magnitude.
By comparing with the original image, errors due to real edge pixels are avoided.
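The gradient-based detection described above can be sketched as follows. The Sobel kernels are standard, but the magnitude threshold and angular tolerance are illustrative values, not those of Lee et al.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def correlate2d(img, kernel):
    """Minimal 'valid' 2-D cross-correlation (no padding)."""
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def blocking_candidates(img, mag_thresh=100.0, angle_tol=0.1):
    """Fraction of pixels with a strong, axis-aligned gradient.

    sin(2*theta) is near zero exactly when the gradient is horizontal or
    vertical, i.e. when the edge is parallel to the picture frame.
    """
    h = correlate2d(np.asarray(img, dtype=float), SOBEL_X)  # horizontal component
    v = correlate2d(np.asarray(img, dtype=float), SOBEL_Y)  # vertical component
    r = np.hypot(h, v)                                      # magnitude R
    theta = np.arctan2(v, h)                                # angle
    axis_aligned = np.abs(np.sin(2 * theta)) < angle_tol
    return float(np.mean((r > mag_thresh) & axis_aligned))
```

A hard vertical step (a typical block boundary) yields a non-zero fraction, while a flat region yields zero.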
Other interesting metrics in this field are those by Winkler et al., 2001 and Wang et al., 2002.

Blurring
Blurring metrics are based on the analysis of energy in high frequencies and on the analysis of edges and their spread. Methods traditionally used for quality assessment attempted to quantify the visibility of the differences between pairs of images, a distorted image and its corresponding reference, using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, some researchers have introduced alternative approaches to quality assessment based on the degradation of structural information.
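The best-known structural approach is the SSIM index of Wang et al. The sketch below computes a single-window variant over whole images; the published metric averages a local, windowed version, so this is only an approximation of its behavior. The constants C1 and C2 follow the usual choice derived from the dynamic range.

```python
import numpy as np

def global_ssim(x, y, peak=255.0):
    """Single-window SSIM over whole images.

    Note: the published SSIM averages a local, sliding-window computation;
    this global variant only illustrates the luminance/contrast/structure terms.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (0.01 * peak) ** 2   # stabilizer for the luminance term
    c2 = (0.03 * peak) ** 2   # stabilizer for the contrast/structure term
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Identical images score 1.0; structurally inverted content scores much lower.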

 Based on vision models
There are a wide range of systems which utilizes models that attempt to reproduce similarities with the Human Visual System.In this section, four of them are introduced.


 Just Noticeable Differences (JND). The Sarnoff model, also known as the Visual Discrimination Model (VDM), was copyrighted by the Tektronix company and commercialized in its PQA600 Picture Analyzer. The model works in the spatial domain. Its good results come from its high fidelity to the HVS, achieved at the cost of considerable complexity.
 Visual Differences Predictor (VDP). Unlike JND, VDP works in the frequency domain, and it is very popular for predicting encoding errors thanks to the work of S. Daly. The model is based on the comparison of two images after creating a disparity map, to detect the image variation.
 Moving Picture Quality Metric (MPQM). As PSNR does not take the visual masking phenomenon into consideration, every single pixel error contributes to the decrease of the PSNR, even if the error is not perceived. This method includes two intensively studied characteristics of the human visual system: contrast sensitivity and masking.
 Perceptual Distortion Metric (PDM). This vision model was developed by S. Winkler, based on the HVS, allowing the system to behave like the human eye. The structure of the model is based on finding the optimum components of the model, modifying both the reference and the impaired image.

No-reference metrics
When the reference is not available for the objective quality method, for example in environments such as the Internet or mobile video, it is necessary to use no-reference metrics to evaluate the degradation.
Most of the time, these kinds of studies focus on analyzing the impairments due to artifacts that degrade the user's perception. The no-reference metrics are therefore grouped according to the artifact they characterize.

Blocking Effect or Blockiness
Most existing no-reference metrics focus on estimating blockiness, which is relatively easy to detect because of its regular structure, although in practice it is harder due to the use of deblocking filters in H.264 and other encoders.
Different techniques are used. Wu and Yuen developed an NR metric based on measuring the horizontal and vertical differences between rows and columns at block boundaries, with interesting results; the means and standard deviations of the blocks adjacent to each boundary determine a masking-effect weighting.
Wang et al., on the other hand, model the blocky image as a non-blocky image interfered with by a pure blocky signal; the level of the blockiness artifact is detected by evaluating the blocky signal.
Other alternatives are the approach proposed by Baroncini and Pierotti, which uses multiple filters to extract significant vertical and horizontal edge segments due to blockiness; the algorithm by Vlachos, based on the cross-correlation of subsampled images; and the metric by Tan and Ghanbari for blocking detection in videos compressed with MPEG-2.
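A minimal boundary-difference detector in the spirit of these metrics (not a reimplementation of any of them) compares the average luminance jump across assumed 8-pixel block boundaries with the average jump elsewhere; a ratio well above 1 suggests visible blockiness.

```python
import numpy as np

def boundary_blockiness(img, block=8):
    """Ratio of the mean luminance jump across block boundaries to the mean
    jump inside blocks, along the horizontal direction.

    Assumes the block grid is aligned with the image origin; real metrics
    also handle grid detection, the vertical direction and masking.
    """
    img = np.asarray(img, dtype=float)
    diffs = np.abs(np.diff(img, axis=1))           # horizontal neighbour differences
    cols = np.arange(diffs.shape[1])
    at_boundary = (cols % block) == (block - 1)    # column pairs straddling a boundary
    inside = diffs[:, ~at_boundary].mean()
    across = diffs[:, at_boundary].mean()
    return across / (inside + 1e-12)               # epsilon avoids division by zero
```

On a perfectly tiled image the ratio explodes, while on a smooth ramp it stays near 1.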

Blur
Another typical artifact addressed by no-reference metrics is blurring. It appears in almost all processing phases of the production and communication chain, manifesting as a loss of spatial detail in regions of moderate to high spatial activity. Blurring is directly related to the suppression of the higher-order AC DCT coefficients through coarse quantization.
Marziliano et al. worked on a blurriness metric: as object boundaries are represented by sharp edges, the spreading of significant edges in the image gives a good estimate of blurring. Blurriness and ringing metrics have also been developed to evaluate JPEG2000 coding.
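The edge-spread idea can be sketched for a single image row as follows. The gradient threshold is an illustrative value, and the published metric of Marziliano et al. differs in detail; this only shows why wider edges indicate more blur.

```python
import numpy as np

def row_edge_widths(row, edge_thresh=20.0):
    """Widths (in pixels) of luminance edges along one image row.

    For every significant gradient sample, walk outwards while the signal
    keeps rising (or falling) in the same direction, i.e. to the local extrema;
    the distance between them is the edge width.
    """
    row = np.asarray(row, dtype=float)
    grad = np.diff(row)
    widths = []
    for j, g in enumerate(grad):
        if abs(g) < edge_thresh:
            continue
        left = j
        while left > 0 and (row[left] - row[left - 1]) * g > 0:
            left -= 1                      # extend to the left extremum
        right = j + 1
        while right < len(row) - 1 and (row[right + 1] - row[right]) * g > 0:
            right += 1                     # extend to the right extremum
        widths.append(right - left)
    return widths
```

A sharp step yields width 1, while the same step smeared over several pixels yields proportionally larger widths, so the average width grows with blur.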
Other metrics work directly with the DCT coefficients. Coudoux et al. detected the vertical block edges and combined them with several masking models applied in the DCT domain.

Ringing
Ringing is a shimmering effect around high-contrast edges, related to the Gibbs phenomenon. It is not necessarily correlated with blocking, as the amount of ringing depends on the number and strength of the edges in the image. A visible ringing measure (VRM) based on the average local variance has been developed in [5].
Marziliano et al. present a ringing metric based on the blur metric described in the previous section, which uses the characteristics of JPEG2000 video encoding to obtain suitable results; the metric does not extend to other compression standards, but it is a good approximation.

Other metrics
There are other studies in metrics based on noise, or other artifacts related to determined video compression standards such as MPEG-2 or JPEG-2000.
Finally, it is important to mention the systems based on the combination of several individual metrics, weighted according to properties of the image and its spatial and temporal complexity.

Quality in emerging technologies: 3D
One of the most important achievements in digital video in recent years has been the next generation of 3D stereoscopic content, whose development is based on the search for the illusion of depth perception. After the unconvincing first generation of 3D based on anaglyph movies, acceptable results have finally been achieved, as the success among users shows.
3D video offers a new experience to the user, but the acceptance of this experience must be evaluated in order to draw conclusions about the generation of video content. That is why quality assessment in 3D systems is more related to the concept of Quality of Experience (QoE) of the user: it is not just an enhancement of quality, but a fundamental change in the character of the image.

Binocular disparity
Binocular disparity refers to the fact that the brain extracts depth information from the left- and right-eye views, each receiving a slightly horizontally shifted perspective of the same scene. As a result, the observer perceives the objects positioned in three-dimensional space, creating an illusion of depth that places them in front of or behind the viewing screen. In computer stereovision, binocular disparity is the difference between the images captured by two cameras; the disparity of an object in a scene depends both on the camera baseline and on the distance between the object and the camera.
There are different techniques to achieve this, such as color or polarization filters, whose purpose is to separate the left- and right-eye views and direct each one to the corresponding eye, producing that illusion.

Fig. 14. Description of binocular disparity
The complexity of achieving perfect binocular disparity is the cause of the impairments and defects introduced in the image. We focus on three main factors: scene content, camera baseline and screen size.
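Under a simple parallel-camera model (an illustrative sketch; the focal length and baseline figures below are assumptions, not values from the text), disparity d, focal length f, baseline B and depth Z are linked by d = f·B/Z:

```python
def depth_from_disparity(d_px: float, focal_px: float, baseline_m: float) -> float:
    """Invert d = f * B / Z to recover an object's distance Z (meters),
    assuming a rectified parallel-camera setup."""
    return focal_px * baseline_m / d_px

# Nearby objects produce larger disparities than distant ones.
f_px, baseline = 1000.0, 0.065          # 65 mm, roughly human interocular
near = depth_from_disparity(32.5, f_px, baseline)   # 32.5 px -> 2 m
far = depth_from_disparity(6.5, f_px, baseline)     # 6.5 px -> 10 m
assert near < far
```

This inverse relation is why both the camera baseline and the object distance, the two factors named above, control the perceived depth.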

3D quality systems
3D stereoscopic video inherits all of 2-D quality assessment, but in addition some other specific factors must be analyzed to assure the QoE of the final consumer. In this chapter some general aspects of 3D video are described in order to justify the alternatives used in quality assessment and their differences from 2-D video.
The term 3D denotes stereoscopy, i.e. a two-view system used for visualization. Due to the difficulty of creating this type of content with dual cameras, a high percentage of stereoscopic video is still obtained by converting 2D video to 3D, based on the extraction of depth information from monoscopic images.
The quality obtained with this method is improving, but it is necessary to evaluate the results of the depth calculation and the experience of the final user. This aim is of vital importance in 3D quality assessment, at least until all content is produced in real 3D.

Formats and Encoding 3D
Another feature to consider is the way the two views, left and right, are encoded. As both views are correlated and present similar content with small differences between them, compression techniques take advantage of this to reduce the amount of data generated, since the amount of data required for broadcasting is much higher and new encoding systems are needed. Synchronism is also important: the compression methods used should assure that both views are seen at the same time, one with each eye. For this purpose, a series of systems appeared, in two groups:
 Frame Compatible Service (FSC). In FSC, both views are contained in the same frame. The disadvantage of this system is the reduction of resolution of the complete image (which affects the global quality), but synchronism is assured.
There are different versions of this service: side-by-side, top-and-bottom, line-by-line and checkerboard, depending on how the two views are distributed in the frame.
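The resolution cost of frame-compatible side-by-side packing can be sketched as follows (an illustrative toy using naive column decimation; real encoders low-pass filter before subsampling):

```python
import numpy as np

def pack_side_by_side(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Pack a stereo pair into one frame-compatible frame.

    Each view is horizontally subsampled to half width (dropping every
    other column), so the packed frame keeps the original raster size
    while halving the horizontal resolution of each view."""
    half_l = left[:, ::2]    # naive decimation; no anti-alias filtering
    half_r = right[:, ::2]
    return np.hstack([half_l, half_r])

left = np.random.randint(0, 256, (1080, 1920), dtype=np.uint8)
right = np.random.randint(0, 256, (1080, 1920), dtype=np.uint8)
packed = pack_side_by_side(left, right)
assert packed.shape == (1080, 1920)   # same raster, half resolution per view
```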


 Service Compatible (SC). MVC (Multiview Video Coding) is the main standard for this type of service. It is an amendment to the H.264/MPEG-4 AVC video compression standard that enables efficient encoding of sequences captured simultaneously from multiple cameras in a single video stream. MVC is the standard used in Blu-ray 3D releases because it allows older devices and software to decode stereoscopic video streams, ignoring the additional information for the second view.
Also, the method of assessment differs depending on the type of 3D display, according to the settings of each technique used. Grouped by display technique, the technologies are:
 With glasses:
 Passive glasses (normally with circular or linear polarization). Both views are contained in every frame; the glasses help the eyes separate the views and redirect each one to the corresponding eye.


 Active glasses. The image is presented to each eye by alternating from one view to the other every frame. The system must assure synchronism between display and glasses.


 Without glasses: autostereoscopic. Different techniques, such as parallax barriers or lenticular displays, are used for this purpose, but there is still much work to do in this field, and rigorous quality studies are being developed.

Classic video methods of quality assessment
As with any type of video, stereoscopic video can be evaluated by the classic methods of quality assessment, both subjective and objective (FR or NR). These video sequences admit metrics such as PSNR or blocking and blurring measurements, because they have been processed using the same techniques as 2-D video (e.g. H.264, MPEG-2, VC-1). Indeed, further analysis has suggested that visible 2D artifacts detract from stereo quality even more than they do in a conventional 2D (non-stereo) image.
But stereoscopic video has the advantage of providing two well-correlated views (left and right). This fact can benefit the assessment, since for normal (without excessive parallax) 3-D captures one view can be used to predict the other.
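A minimal implementation of the PSNR metric mentioned above, which can be applied to each view of a stereoscopic pair just as in the 2-D case:

```python
import numpy as np

def psnr(ref: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) between a reference frame and a
    degraded frame; higher values mean less distortion."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")                    # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (64, 64)).astype(np.float64)
noisy = ref + rng.normal(0.0, 5.0, ref.shape)  # mild degradation
# Mild noise scores higher than a gross luminance shift.
assert psnr(ref, noisy) > psnr(ref, ref - 50.0)
```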

Visual comfort and fatigue
The visual comfort of stereoscopic images is certainly one of the most critical problems in stereoscopic research.The term visual discomfort is generally used to refer to the subjective sensation of discomfort often associated with the viewing of stereoscopic images.Sources of visual discomfort may include excessive binocular disparities, conflicts between accommodation and vergence (the simultaneous movement of both eyes in opposite directions), appearance of specific artifacts such as crosstalk or boundaries and imperfections in 3D rendering.
A large number of studies have been developed to predict the response of the user to this parameter, analyzing the maximum exposure time that avoids fatigue and the main causes of visual discomfort, in order to avoid them in the future and obtain a better quality of experience. M. Lambooij reviews the principal conclusions in this field.

Parallax and depth adjustment
The parallax is the distance on the image plane corresponding to the inter-ocular distance when visualizing a given object. The effect on the objects varies when the distance between the pair of cameras used to capture the stereoscopic images is modified.

Fig. 15. Scheme of parallax distance
Hyper-stereoscopy is a very characteristic effect in 3D images in which the observer perceives the volume of objects as much closer. It is a consequence of modifying the parallax of the image by separating the pair of cameras a distance greater than the average distance between human eyes. Hyper-stereoscopy may well increase the quality of experience of users viewing such images, but it is necessary to define the limit beyond which the effect stops being pleasing.
The quality of experience increases, but new studies are needed to determine the limits of parallax and to derive recommendations for content creators and broadcasters.

Crosstalk (Ghosting)
Crosstalk, also known as ghosting, is the artifact defined as the leakage of one eye's image into the image of the other eye, i.e. imperfect image separation.
Recent experiments by Tsirlin et al. describe the effect of crosstalk on the perceived magnitude of depth from disparity and from monocular occlusions.
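A simple symmetric leakage model (an illustrative sketch; the 5 % leakage figure is an assumption, not a value from the text) shows how imperfect view separation produces a ghost of one view in the other:

```python
import numpy as np

def add_crosstalk(left: np.ndarray, right: np.ndarray, leak: float = 0.05):
    """Simulate symmetric crosstalk: each eye receives its own view plus
    a fraction `leak` of the opposite view (imperfect separation)."""
    l_seen = (1.0 - leak) * left + leak * right
    r_seen = (1.0 - leak) * right + leak * left
    return l_seen, r_seen

left = np.full((4, 4), 200.0)   # bright object present in the left view only
right = np.zeros((4, 4))
l_seen, r_seen = add_crosstalk(left, right, leak=0.05)
assert np.allclose(r_seen, 10.0)    # 5 % ghost of the left view leaks through
assert np.allclose(l_seen, 190.0)
```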

Boundaries
Due to imperfect depth maps, a degradation artifact is particularly noticeable around object boundaries. It derives from a texture-depth misalignment that appears as a consequence of the capture and processing techniques of 3D images, and it must be suppressed as much as possible.
Y. Zhao proposes a novel solution based on suppression of misalignment and enforcement of alignment between texture and depth to reduce background noise and foreground erosion, respectively, among different types of boundary artifacts.

Puppet-theatre and cardboard effects
The puppet-theatre effect is a size distortion characteristic of 3-D images, revealed as an annoying miniaturization that makes people and other objects look like animated puppets.
A related distortion is the cardboard effect, an unnatural depth percept in which objects appear distributed in the image as if they were in different discrete depth planes, showing little volume.
Both effects harm the sensation of reality. The puppet-theatre effect is not perceived as a physically measurable artifact; rather, it should be evaluated subjectively, as it decreases the quality of the 3D experience.
Studies from the NHK (Japan Broadcasting Corporation) investigate these phenomena by varying the position of the objects and the shooting, display and viewing conditions, in order to predict the level of distortion introduced by this effect.

Conclusions
The large number of studies developed on quality assessment gives a general idea of the importance of this subject in video compression. Metrics and techniques evolve constantly in search of the best ways of evaluating the quality of video sequences.
A complete state of the art in quality assessment has been presented, describing techniques of subjective and objective assessment, together with the most common artifacts and impairments derived from compression and transmission.

Acknowledgment
This research has been developed by a member of the research group G@TV (Group of Application of Visual Telecommunications) at the Polytechnic University of Madrid (Universidad Politécnica de Madrid).
The work presented in this chapter is the result of multiple studies developed over the years in the field of video quality assessment, including a software implementation based on full-reference and no-reference metrics. Additionally, quality studies have been developed for a wide range of purposes and projects. ADI, Palco HD, ACTIVA, Furia and Buscamedia were financed by the Spanish Ministry of Industry. Within the project ADI (Interactive High Definition), subjective assessment was performed to analyze the impact on users of High Definition TV (HDTV) as an evolution of Standard Definition (SDTV); objective quality assessment was developed as well. The Palco HD project was led by the satellite communications company Hispasat in collaboration with TSA (Telefónica Audiovisual Services) and RTVE (Spanish Radio and Television); the Activa project was led by the cable company ONO, and Buscamedia by the communications company Indra.
There have also been privately funded projects in collaboration with companies such as Telefónica I+D, developing no-reference objective quality assessment in mobile video environments, and with the platform Impulsa TDT, developing subjective assessment to define video settings for High Definition TDT (Terrestrial Digital Television).
Finally, thanks to David Jiménez for introducing me to the field of video quality assessment.

Fig. 2. Example of a sequence with high blocking effect

Fig. 3. Evolution of blurriness when reducing bitrate

Ringing: the ringing artifact occurs when the quantization of individual DCT coefficients produces high-frequency irregularities in the reconstruction of macroblocks. The effect is related to Gibbs' phenomenon. It is most visible around high-contrast edges in areas with smooth textures, easily seen in the example around the edges of the objects, appearing like duplications of their contours.


Fig. 5. Example of data from a continuous assessment

Type of scale: there are different types of scale, depending on the results that the researcher expects to obtain from the study. The four most representative appear below.
1. Quality Scale (QS). This scale is used in different methods to evaluate the perceived quality of a video sequence in absolute terms. There are variations with different numbers of grades:
5 Excellent
4 Good
3 Fair
2 Poor
1 Bad

Fig. 6. Scheme of a DSIS system

The double-stimulus continuous quality-scale (DSCQS) method: the main purpose of the DSCQS method is to measure the quality of systems relative to a reference. Viewers are shown pairs of video sequences (the reference sequence and the impaired sequence) in randomized order, and are asked to rate the quality of each sequence in the pair after the second showing. It is widely accepted as an accurate test method with little sensitivity to context effects, as viewers are shown the sequences twice. It is also used to measure the quality of stereoscopic image coding. Since standard double-stimulus methods like DSCQS provide only a single quality score for a given video sequence, where a typical sequence might be 10 seconds long, questions have been raised about the applicability of these testing methods for evaluating the performance of objective real-time video quality monitoring systems.

Fig. 9. Scheme of a SC system

Single stimulus continuous quality evaluation (SSCQE): instead of seeing separate short sequence pairs, viewers watch a program of typically 20-30 minutes' duration which has been processed by the system under test; the reference is not shown. Using a slider, the subjects continuously rate the instantaneously perceived quality on the DSCQS scale from 'bad' to 'excellent'.

Fig. 12. Full Reference (FR) metric diagram

Reduced Reference (RR) metrics: the original image is not available for the study, but some of its properties and characteristics can be used to obtain quality results.
No-Reference (NR) metrics: there is no original image, nor properties of it; in this case the degraded image and how it affects the human visual system are the only tools for judging the quality of an image. This kind of metric is more complicated to design.

Fig. 13. No Reference (NR) metric diagram

Also based on pixel-by-pixel comparison, this metric measures the relation between the original image and the degraded image, in order to evaluate the degradation of the image.
These metrics attempt to assess the effects of the artifacts described above, instead of offering a global idea of quality as pixel-based metrics do. Some of the most representative metrics are introduced next.
As proposed by Marziliano et al. in 2004, the reduction of edge energy between the original and the impaired image shows the loss of quality due to the blurring artifact. Other metrics, such as the one proposed in the research of Lee et al., use the gradient calculated at every pixel of the image to detect the blur artifact by analyzing the diminution of this magnitude between the original and the impaired image. SI is the root mean square of the spatial gradient (SG), so blurring is computed as follows:
Ringing is fundamentally related to Gibbs' phenomenon, arising when the quantization of individual coefficients results in high-frequency irregularities of the reconstructed block. Yu et al. offer a segmentation algorithm to identify regions with ringing artifacts, which are more evident along high contrasts in regions with smooth and complex texture. The original and the processed video sequences are input into the metric and decomposed respectively by spatial-temporal filter banks.
Based on structural similarity: the most representative metric based on structural similarity is SSIM. The metric proposed by Wang et al. is based on the combination of three properties of the image (luminance, contrast and structure), compared between the original and the impaired image; three conditions must be met: symmetry, boundedness and unique maximum.
A modified DCT-based video quality metric (VQM) based on Watson's proposal has also been put forward; it exploits the properties of visual perception using the existing DCT coefficients, so it incurs only slightly more computational overhead.
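As a sketch of the SSIM construction described above (evaluated here over a single global window for brevity; the published metric averages the index over small local windows):

```python
import numpy as np

def ssim_global(x: np.ndarray, y: np.ndarray, peak: float = 255.0) -> float:
    """SSIM index over a single window covering the whole image.

    Combines luminance, contrast and structure comparisons between a
    reference x and an impaired image y."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2   # stabilizing constants
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

rng = np.random.default_rng(1)
img = rng.integers(0, 256, (64, 64)).astype(np.float64)
noisy = img + rng.normal(0.0, 20.0, img.shape)
assert abs(ssim_global(img, img) - 1.0) < 1e-9   # identical images score 1
assert ssim_global(img, noisy) < 1.0             # degradation lowers SSIM
```

The unique-maximum property mentioned above is visible here: the index reaches 1 only when the two images are identical.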