Tunable early CU size decision for depth map intra coding in 3D-HEVC using unsupervised learning

https://doi.org/10.1016/j.dsp.2022.103448

Abstract

To further improve the coding gain for depth maps in the 3D extension of High Efficiency Video Coding (3D-HEVC), many new coding techniques have been introduced, which drastically increase the coding complexity. In this work, an early intra coding unit (CU) size decision scheme based on an unsupervised learning approach is proposed for 3D-HEVC depth map intra coding. First, we treat the early CU size decision as a clustering problem. Then, three clustering models are developed for CUs of sizes 64×64, 32×32 and 16×16 to determine early whether they should be split further. The center of each clustering model is obtained by an intra learning method. Finally, to meet the user's specific application preference, a similarity distance is introduced into the early CU size decision. By adjusting the similarity distance, a tunable early CU size decision is achieved that provides different levels of coding complexity reduction. Experimental results show that the proposed scheme reduces encoding time by 54.0% to 68.4% on average for depth map intra coding, while the Bjontegaard delta bit rate (BDBR) of synthesized views increases by only 0.01% to 0.73%. It outperforms state-of-the-art works in terms of coding complexity reduction and BDBR increase.

Introduction

With the rapid development of three-dimensional (3D) video acquisition and display technologies, 3D videos are increasingly popular with viewers. Because 3D videos can provide more immersive and realistic visual experiences, they are widely anticipated in various video applications such as Free-viewpoint TV [1] and 3D movies. 3D videos are commonly represented in the multi-view video plus depth (MVD) format [2], which includes multiple texture views and their corresponding depth maps. Meanwhile, vivid virtual views can be synthesized by using Depth-Image-Based-Rendering (DIBR) [3]. Thus, highly efficient compression is extremely important for 3D video data to save storage space and transmission bandwidth. To efficiently encode single-view texture video, the High Efficiency Video Coding (HEVC) standard [4]-[5] was developed by the Joint Collaborative Team on Video Coding (JCT-VC); it achieves higher compression efficiency than the earlier H.264/AVC standard. To encode 3D video data, an advanced 3D video coding standard, the 3D extension of HEVC (3D-HEVC) [6]-[7], was developed by the Joint Collaborative Team on 3D Video Coding (JCT-3V).

3D-HEVC inherits some coding tools from HEVC, such as quadtree Coding Tree Unit (CTU) partitioning [8]-[9]. Each coding frame is first divided into CTUs of the same size, 64×64. Then, each CTU can be recursively divided into smaller coding units (CUs). Fig. 1 (a) shows an example of the optimal CTU partitioning. The supported CU sizes are 64×64, 32×32, 16×16 and 8×8, and their corresponding CU depths are “0”, “1”, “2”, and “3”, respectively. The 3D video format includes multiple texture views and associated depth maps [10]. A depth map is a grayscale image in which each pixel represents the geometrical information of the scene. Depth maps and texture views differ in nature: the former are composed of large smooth regions with sharp edges, whereas the latter contain complex content. Edge distortions in depth map coding may also degrade the synthesized views. To preserve the quality of sharp edges and further improve coding efficiency, in addition to the 35 intra prediction modes shown in Fig. 1 (b), several new coding techniques are adopted for depth map coding, such as Depth Intra Skip (DIS) [11], Depth Modeling Modes (DMM) [12] and Segment-wise Direct Component Coding (SDC) [13]. The new intra prediction mode DMM divides a CU block into two non-rectangular regions, each of which is represented by a constant value. DMM supports two partition types, wedgelet segmentation and arbitrary contour segmentation, as shown in Fig. 1 (c) and (d).
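The relation between CU depth and CU size described above can be illustrated with a minimal sketch. This is not code from the HTM reference software; the names `cu_size` and `split_cu` are hypothetical and only model the quadtree geometry (a 64×64 CTU at depth 0, halving the edge length at each split down to 8×8 at depth 3).

```python
CTU_SIZE = 64   # CTU edge length stated in the text
MAX_DEPTH = 3   # depth 3 corresponds to 8x8 CUs

def cu_size(depth: int) -> int:
    """Return the CU edge length for a quadtree depth (0 -> 64, ..., 3 -> 8)."""
    if not 0 <= depth <= MAX_DEPTH:
        raise ValueError("depth must be in 0..3")
    return CTU_SIZE >> depth

def split_cu(x: int, y: int, depth: int):
    """Return the four (x, y, depth) sub-CUs produced by one quadtree split.

    (x, y) is the top-left corner of the parent CU inside the CTU.
    Sub-CUs are listed row by row: top-left, top-right, bottom-left, bottom-right.
    """
    if depth >= MAX_DEPTH:
        raise ValueError("an 8x8 CU cannot be split further")
    half = cu_size(depth + 1)
    return [(x + dx, y + dy, depth + 1) for dy in (0, half) for dx in (0, half)]

if __name__ == "__main__":
    for d in range(MAX_DEPTH + 1):
        print(f"depth {d}: {cu_size(d)}x{cu_size(d)}")
    print(split_cu(0, 0, 0))
```

An early CU size decision amounts to deciding, for a CU at depth 0, 1 or 2, whether `split_cu` needs to be evaluated at all.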

These advanced coding techniques achieve high compression efficiency for depth map coding in 3D-HEVC, but they also lead to huge computational complexity. This poses enormous challenges for real-time applications such as ultra HD video on mobile devices with limited computational resources. Thus, it is necessary to investigate faster coding techniques that reduce the encoding complexity of 3D-HEVC while keeping the encoding loss negligible.

Although many fast CU size decision methods have been proposed for depth map intra coding to reduce the encoding complexity of 3D-HEVC [14]-[35], the tradeoff between coding loss and coding time saving deserves further investigation. Depending on the user's application, some scenarios require lower coding complexity, while others need to reduce coding complexity while keeping coding quality almost lossless. In this work, to meet the user's specific application preference, we propose a tunable early CU size decision scheme for depth map intra coding in 3D-HEVC that achieves different tradeoffs between coding loss and coding time saving. The novelties and contributions of the proposed scheme are summarized as follows:

  • A simple yet effective clustering-based early CU size decision approach is proposed for 3D-HEVC depth map intra coding, in which only one valid feature is selected.

  • It is an unsupervised learning method that updates the cluster center in each coding frame to adapt to the texture characteristics of different coding frames.

  • To meet the user's preference for a specific application, a tunable tradeoff between coding time saving and coding loss is achieved by introducing the similarity distance.
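The three contributions above can be sketched as follows. This excerpt does not give the paper's actual decision rule, so everything here is an assumption made for illustration: the class `NonSplitCluster`, the running-mean update, and the absolute-distance test are hypothetical. The sketch keeps one single-feature cluster (RD cost) per CU size, relearns it in each frame, and stops splitting when a CU's RD cost lies within the similarity distance of the cluster center.

```python
from dataclasses import dataclass

@dataclass
class NonSplitCluster:
    """1-D cluster of RD costs of CUs (of one size) that were not split.

    Assumption: the center is the running mean of the costs seen so far
    in the current frame; the paper's learning rule may differ.
    """
    center: float = 0.0
    count: int = 0

    def update(self, rd_cost: float) -> None:
        # Incremental mean: center_n = center_{n-1} + (x - center_{n-1}) / n
        self.count += 1
        self.center += (rd_cost - self.center) / self.count

    def is_similar(self, rd_cost: float, similarity_distance: float) -> bool:
        # Without any learned sample, no early decision is made.
        if self.count == 0:
            return False
        return abs(rd_cost - self.center) <= similarity_distance

def new_frame_models() -> dict:
    """Create fresh models for the three CU sizes at the start of a frame."""
    return {64: NonSplitCluster(), 32: NonSplitCluster(), 16: NonSplitCluster()}

def early_terminate(models: dict, cu_size: int, rd_cost: float,
                    similarity_distance: float) -> bool:
    """Return True if the CU is assumed not to need further splitting."""
    return models[cu_size].is_similar(rd_cost, similarity_distance)

if __name__ == "__main__":
    models = new_frame_models()
    for cost in (100.0, 120.0, 110.0):   # hypothetical learning samples
        models[64].update(cost)
    print(early_terminate(models, 64, 115.0, similarity_distance=10.0))
    print(early_terminate(models, 64, 150.0, similarity_distance=10.0))
```

In this sketch, a larger `similarity_distance` terminates more CUs early and therefore saves more encoding time, at the likely price of a higher coding loss; a smaller value is more conservative.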

The rest of this paper is organized as follows. Section 2 reviews the related works on coding time reduction for 3D-HEVC depth map intra coding. Section 3 describes the motivation and statistical analyses. Section 4 presents the early intra CU size decision scheme for 3D-HEVC depth map coding. Experimental results are provided in Section 5. Section 6 concludes this paper.


Related works

Many fast intra coding algorithms have been designed for HEVC [36], [37], [38]. Though they can greatly reduce the intra coding complexity of HEVC, they are not appropriate for depth map intra coding in 3D-HEVC. The reasons for this are twofold. First, texture views and depth maps have different content properties. Second, distinct coding tools such as DMM and SDC are introduced into depth map intra coding. Therefore, to reduce the complexity of depth map intra coding, many researchers

Motivation and statistical analyses

To analyze the characteristics of depth map intra coding in 3D-HEVC, two preliminary experiments are conducted with the unmodified HTM16.0 under the All-Intra (AI) configuration. The test conditions are summarized as follows: four video sequences with different spatial resolutions are used, namely 1024×768 (“Balloons”, “Newspaper”) and 1920×1088 (“Poznan_Hall2” and “Shark”). Four pairs of quantization parameters (QPs) are used: (25, 34), (30, 39), (35, 42) and (40, 45). Note

Proposed early CU size decision algorithm

Early intra CU size decision is usually regarded as a classification problem, and supervised machine learning is used for this purpose. However, few works treat early intra CU size decision as a clustering problem. Actually, clustering can be regarded as an unsupervised machine learning method, which can divide a set of data into one cluster or multiple clusters according to the similarity characteristics of the data. In this work, we treat the early CU size decision as a clustering problem,

Test conditions

To evaluate the proposed tunable CU size decision method for depth map intra coding in 3D-HEVC, the proposed scheme is integrated into the 3D-HEVC reference software HTM16.0. We use the CTC [42] and the eight test sequences recommended by JCT-3V for the experiments. The executable files can be downloaded via the link.1 The details of the eight test sequences are reported in Table 1. Note that each test sequence contains three texture views and their corresponding depth

Conclusion

In this paper, an unsupervised-learning-based tunable early CU size decision scheme is proposed for depth map intra coding in 3D-HEVC. Three clustering models are proposed for the clustering of 64×64, 32×32 and 16×16 CUs. In the clustering process, only the RD cost is extracted as a feature to represent the characteristics of the data. In addition, the cluster centers are adaptively obtained by an intra learning method in each coding frame. In order to meet the user's specific application

CRediT authorship contribution statement

Yue Li: Conceptualization, Methodology, Writing-Original draft preparation. Gaobo Yang: Supervision, Writing-Reviewing. Aiping Qu: Data curation. Yapei Zhu: Validation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Nos. 62001209, 61972143) and the Natural Science Foundation of Hunan Province, China (No. 2020JJ5496).


References (43)

  • G.J. Sullivan et al., Overview of the high efficiency video coding (HEVC) standard, IEEE Trans. Circuits Syst. Video Technol. (2012)

  • G. Tech et al., Overview of the multiview and 3D extensions of high efficiency video coding, IEEE Trans. Circuits Syst. Video Technol. (2016)

  • L. Shen et al., A 3D-HEVC fast mode decision algorithm for real-time applications, ACM Trans. Multimed. Comput. Commun. Appl. (2015)

  • Z. Pan et al., Frame-level bit allocation optimization based on video content characteristics for HEVC, ACM Trans. Multimed. Comput. Commun. Appl. (2020)

  • Y. Li et al., Adaptive inter CU depth decision for HEVC using optimal selection model and encoding parameters, IEEE Trans. Broadcast. (2017)

  • J. Lee et al., 3D-CE1: depth intra skip (DIS) mode

  • P. Merkle et al., Depth intra coding for 3D video based on geometric primitives, IEEE Trans. Circuits Syst. Video Technol. (2015)

  • H. Liu et al., Generic segment-wise DC for 3D-HEVC depth intra coding

  • Q. Zhang et al., Fast mode decision based on gradient information in 3D-HEVC, IEEE Access (2019)

  • Z. Gu et al., Fast depth modeling mode selection for 3D HEVC depth intra coding

  • R. Zhang et al., Fast intra mode decision for depth map coding in 3D-HEVC, J. Real-Time Image Process. (2020)

Yue Li received the M.S. and Ph.D. degrees from Central South University and Hunan University in 2013 and 2018, respectively. He is a lecturer in the Computer School, University of South China. His current research interests include video coding and point cloud compression.

Gaobo Yang is a professor in the School of Information Science and Engineering, Hunan University, China. He obtained his Master's and Ph.D. degrees from East China Jiaotong University and Shanghai University in 2001 and 2004, respectively. From August 2010 to August 2011, he made an academic visit to the University of Surrey, UK. He has published more than 60 papers in international journals. His research interests include video information security and multimedia communication.

Aiping Qu was born in Hunan, China, in 1982. He received the B.S. and M.S. degrees in mathematics science from Hunan University, in 2005 and 2010, respectively, and the Ph.D. degree in computer science from Wuhan University, in 2015. He is currently an Associate Professor of computer science with the University of South China. He has published over 20 papers in journals and conferences. His research interests include optimization, machine learning, and medical image analysis.

Yapei Zhu received the B.S. degree from Hengyang Normal University, China, in 2010 and the M.S. degree from Ningbo University in 2013. She is currently a Lecturer with Hengyang Normal University. Her research interest concentrates on image/video compression.