Effective computation-aware algorithm by inter-layer motion analysis for scalable video coding☆
Introduction
To transmit large amounts of video data over a variety of transmission environments and to support a diversity of hardware devices, such as mobile phones, notebooks, televisions, and high-quality HDTV sets, an extension of the H.264 video coding standard [1] called scalable video coding (SVC) [2] has been developed. SVC supports spatial, temporal and quality scalabilities so that the SVC encoder can generate one or more subsets of bitstreams to meet the preferences of different user terminals or network restrictions. Fig. 1 shows the system block diagram of SVC with two spatial scalable layers. First, the full-resolution input video is downsampled to a lower resolution and fed into the H.264/AVC-compatible encoder to generate a scalable base layer (BL) bitstream. Afterwards, the full-resolution image is encoded by an H.264/AVC encoder that additionally exploits inter-layer information to further improve the coding performance, generating a scalable enhancement layer (EL) bitstream. Moreover, quality scalability can be achieved by reusing the quantization module through the coefficient accumulation approach. In SVC, inter-layer prediction includes inter-layer motion prediction, inter-layer residual prediction and inter-layer intra prediction, all of which use base layer information for prediction in the enhancement layer at the expense of increased computational complexity.
Since SVC is an extension of H.264/AVC, it inherits the computational complexity of H.264/AVC. To decrease the number of modes to be tested, [3] proposed a fast mode decision that accelerates the encoding process of SVC through coded block pattern (CBP) analysis. The CBP contains the residual information of each macroblock (MB) and provides cues for mode prediction in the encoding process. This work reduces encoding time significantly in both the quality and spatial scalabilities of SVC with only minor degradation in PSNR. Many algorithms [4], [5], [6], [7], [8], [9], [10] have been proposed to reduce the computational complexity of the enhancement layer in SVC by utilizing the correlation between BL and EL. Although these studies can reduce the coding burden, they do not consider computational variation, which could noticeably affect visual quality. [11] proposed a rate-distortion model that describes the motion prediction efficiency in inter-frame wavelet video coding in order to improve the mode decision. By adopting the proposed mode decision procedure, that work improved both the PSNR performance and the visual quality for scalability cases with less extraction-bitrate dependence. The work in [12] analyzed the rate distortion costs of the base and enhancement layers and sorted the rate distortion costs of the sub-MBs in the base layer to determine the prediction order of the macroblocks in the enhancement layers. Afterwards, based on the computations required by each prediction mode, a scalable approach was proposed to achieve computational scalability. In [13], the computation of each prediction mode was normalized first. Then, a linear distribution was proposed to allocate the computational complexity for each macroblock in the enhancement layer by considering the rate distortion costs of the sub-MBs in the base layer.
In [14], Tai et al. introduced the computation-awareness (CA) scheme. First, the MBs were sorted by their mean square error (MSE) values, since the authors assumed that MBs with larger MSE values would have a greater opportunity to reduce the MSE. Afterwards, the computation was allocated by considering the MSE information. Chen et al. [15] proposed an adaptive search strategy using the CA concept. This work adaptively switched among the diamond search, three step search, and full search for motion estimation by considering the motion vector prediction or the variation of neighboring motion vectors. In [16], the MBs were classified by analyzing the MB complexity using rate distortion costs and an MV threshold. Afterwards, the classified MBs were searched with different methods. Our previous work [17] also proposed a linear model to distribute the computational complexity among macroblocks by considering the relationship of the motion vector differences between the base and enhancement layers. Although that approach could achieve an acceptable tradeoff between the computational complexity and the rate distortion performance, there remains room to find more accurate parameters for fitting the model and thereby further improve the coding performance.
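The MSE-driven allocation idea described for [14] can be illustrated with a minimal sketch. The source only states that MBs are sorted by MSE and that computation is allocated using the MSE information; the proportional split, the function name `allocate_by_mse`, and the use of search points as the budget unit are assumptions made here for illustration, not the method of [14].

```python
def allocate_by_mse(mse_values, total_budget):
    """Split an integer budget of search points across MBs.

    Assumption (not from the source): each MB receives a share
    proportional to its MSE, and leftover points go to the MBs
    with the largest MSE first.
    """
    if not mse_values:
        return []
    n = len(mse_values)
    total_mse = sum(mse_values)
    if total_mse == 0:
        # No MB is expected to benefit more than another: split evenly.
        alloc = [total_budget // n] * n
    else:
        alloc = [int(total_budget * m / total_mse) for m in mse_values]
    # Rounding down may leave a few points unassigned; hand them out
    # in descending MSE order (ties keep their original order).
    leftover = total_budget - sum(alloc)
    order = sorted(range(n), key=lambda i: mse_values[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


if __name__ == "__main__":
    # Three hypothetical MBs with MSE 10, 30 and 60 share 100 search points.
    print(allocate_by_mse([10, 30, 60], 100))
```

Under this assumption the whole budget is always consumed, so the total complexity per frame stays fixed while the MBs expected to benefit most receive the most search points.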
In this paper, we propose a strategy to control the computational complexity in the enhancement layer for scalable video coding. The motion vector differences (MVDs) are first analyzed to observe their relationship between the base and enhancement layers. By using the analytical results, the computation-aware algorithm is proposed to allocate computations in the scalable spatial and quality enhancement layers by considering the predicted search points in the enhancement layer and the rate distortion costs of the base layer.
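As a minimal illustration of the quantity being analyzed, the MVD of a block is its motion vector minus the motion vector prediction. The sketch below computes MVDs and a Pearson correlation between BL and EL MVD magnitudes. The L1 magnitude, the Pearson coefficient, and the sample vectors are assumptions chosen for illustration and are not taken from the analysis in Section 2.

```python
import math


def mvd(mv, mvp):
    """Motion vector difference: motion vector minus its prediction."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])


def mvd_magnitude(d):
    """L1 magnitude of an MVD (an illustrative choice of measure)."""
    return abs(d[0]) + abs(d[1])


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant sequence
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Hypothetical (mv, mvp) pairs for co-located blocks in BL and EL.
    bl = [((5, -3), (2, 1)), ((0, 0), (0, 0)), ((4, 4), (1, 2))]
    el = [((10, -6), (4, 2)), ((1, 0), (0, 0)), ((8, 8), (2, 4))]
    bl_mag = [mvd_magnitude(mvd(mv, p)) for mv, p in bl]
    el_mag = [mvd_magnitude(mvd(mv, p)) for mv, p in el]
    print(bl_mag, el_mag, pearson(bl_mag, el_mag))
```

A coefficient close to 1 on real encoder data would suggest that BL MVDs are a useful predictor of the motion activity, and hence of the search effort, needed for the co-located EL blocks.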
The rest of this paper is organized as follows. In Section 2, we analyze the motion vector differences and rate distortion costs of BL and EL, and then their correlations are investigated. In Section 3, our proposed computation-aware algorithm is described in detail. In Section 4, the experimental results are exhibited to show the efficiency and performance of our proposed scheme. Finally, the conclusion is provided in Section 5.
Motion vector difference analysis
Motion estimation takes the motion vector prediction as the center of the search window and then finds the best motion vector within that window. The MVD, which is encoded into the video bitstream, is the difference between the motion vector and the motion vector prediction. Therefore, the MVD can provide some information about the motion behavior of the video content. In SVC, the enhancement layer obtains information from the base layer to increase the coding efficiency. For spatial scalability, the
Proposed computation allocation algorithm
From the previous section, we can observe that the motion vector differences as well as the rate distortion costs between BL and EL have a high correlation. Therefore, we employ the motion vector difference and rate distortion cost to allocate the computation for macroblocks within a frame in EL. The flowchart is shown in Fig. 11. First, we use the motion vector difference calculated in (2) to estimate how many search points for a full search are required for the i-th MB in the enhancement
Simulation results
Table 8 lists the simulation settings for the experimental results in this section. In our simulation, two prior works [12], [13] are adopted for comparison, and the performance measurements of PSNR, bit-rate, BDPSNR, BDBit-rate [19], [20] and SSIM (Structural Similarity Index Metric) [21] are evaluated.
Fig. 13 shows the computation allocation results for different test video sequences with 10% computation availability. From these figures, we can observe that the computations can be
Conclusion
In this paper, the motion vector difference and rate distortion cost are observed, analyzed, and used for deriving our computation allocation algorithm for SVC. With the help of our proposed computation allocation algorithm, the computations can be efficiently allocated to the MBs in regions that need more computations to find the best prediction results. Our simulation results show that our proposed algorithm outperforms the other computation-aware methods in
Acknowledgment
This work was supported by the Ministry of Science and Technology, Taiwan, R.O.C. under Grant NSC 99-2221-E-259-019-MY3 and Grant NSC 102-2221-E-259-022-MY3.
References (21)
- et al., Mode decision acceleration for scalable video coding through coded block pattern, J. Vis. Commun. Image Represent. (2012)
- et al., A rate-distortion analysis on motion prediction efficiency and mode decision for scalable wavelet video coding, J. Vis. Commun. Image Represent. (2010)
- et al., Overview of the H.264/AVC video coding standard, IEEE Trans. Circuits Syst. Video Technol. (2003)
- et al., Overview of the scalable video coding extension of the H.264/AVC standard, IEEE Trans. Circuits Syst. Video Technol. (2007)
- et al., Fast mode decision for combined scalable video coding based on the block complexity function, IEEE Trans. Consum. Electron. (2011)
- X. Sun, S. Xiao, J. Du, M. Hu, Fast mode decision algorithm for H.264/SVC based on motion vector relation analysis, in:...
- et al., Selective inter-layer residual prediction for SVC-based video streaming, IEEE Trans. Consum. Electron. (2009)
- et al., Generic techniques to reduce SVC enhancement layer encoding complexity, IEEE Trans. Consum. Electron. (2011)
- et al., A low complexity mode decision method for spatial scalability coding, IEEE Trans. Circuits Syst. Video Technol. (2011)
- et al., Fast mode decision algorithm for scalable video coding using Bayesian theorem detection and Markov process, IEEE Trans. Circuits Syst. Video Technol. (2010)
☆ This paper has been recommended for acceptance by M.T. Sun.