Abstract
The latest video coding standard, versatile video coding (VVC), achieves almost twice the coding efficiency of its predecessor, high efficiency video coding (HEVC). However, for intra coding, this efficiency comes at 31× the computational complexity of HEVC, which makes VVC challenging for low-power and real-time applications. This paper proposes a novel machine learning approach that employs two modalities of features, both jointly and separately, to simplify the intra coding decision. First, a set of features that assess texture characteristics is extracted using the existing DCT core of VVC. These features form the first modality of data and are of high quality while adding almost no computational overhead. The distribution of intra modes in neighboring blocks forms the second modality of data, which, unlike the first, provides statistical information about the frame. Second, a two-step feature reduction method is designed to shrink the feature set, so that a lightweight model with a limited number of parameters can learn the intra mode decision task. Third, three training strategies are proposed: (1) an offline strategy using the first (single) modality of data, (2) an online strategy using the second (single) modality, and (3) a mixed online–offline strategy that uses bimodal learning. Finally, a low-complexity encoding algorithm is proposed based on these learning strategies. Extensive experimental results show that the proposed methods can reduce encoding time by up to 24% with a negligible loss of coding efficiency. Moreover, the results demonstrate how a bimodal learning strategy can boost learning performance. Lastly, the proposed method has a very low computational overhead (0.2%) and uses existing components of a VVC encoder, which makes it much more practical than competing solutions.
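The abstract does not state which DCT-domain statistics are computed or how neighboring modes are summarized, so the following Python sketch only illustrates the general shape of the two modalities under explicit assumptions; it is not the authors' feature set. A floating-point orthonormal DCT-II stands in for VVC's integer transform core. The hypothetical texture features are the fractions of AC energy in the first row of the coefficient matrix (purely horizontal frequencies), the first column (purely vertical frequencies), and the remaining coefficients. The hypothetical second modality is a coarse, normalized histogram of the intra modes chosen by already-coded neighbors, using VVC's 67 intra modes (planar, DC, and 65 angular). The helper names `dct_texture_features`, `mode_to_bin`, `neighbor_mode_histogram`, and `bimodal_features` are invented for illustration.

```python
import math
from typing import List, Sequence

# VVC defines 67 intra modes: 0 = planar, 1 = DC, 2..66 = angular.
NUM_VVC_INTRA_MODES = 67


def dct2_1d(x: Sequence[float]) -> List[float]:
    """Orthonormal 1-D DCT-II in floating point (an assumption: VVC's
    transform core uses integer approximations of the DCT-II)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        out.append(scale * acc)
    return out


def dct2_2d(block: List[List[float]]) -> List[List[float]]:
    """Separable 2-D DCT-II. Result[i][j]: i = vertical, j = horizontal frequency."""
    rows = [dct2_1d(r) for r in block]             # transform each row
    cols = [dct2_1d(list(c)) for c in zip(*rows)]  # then each column
    return [list(r) for r in zip(*cols)]           # transpose back


def dct_texture_features(block: List[List[float]]) -> List[float]:
    """Hypothetical first-modality features: fractions of AC energy in row 0
    (horizontal frequencies only), column 0 (vertical frequencies only),
    and all remaining (mixed) coefficients."""
    c = dct2_2d(block)
    h_energy = sum(v * v for v in c[0][1:])
    v_energy = sum(c[i][0] ** 2 for i in range(1, len(c)))
    ac_energy = sum(
        c[i][j] ** 2
        for i in range(len(c))
        for j in range(len(c[0]))
        if (i, j) != (0, 0)
    )
    if ac_energy < 1e-12:  # flat block: no texture to describe
        return [0.0, 0.0, 0.0]
    rest = ac_energy - h_energy - v_energy
    return [h_energy / ac_energy, v_energy / ac_energy, rest / ac_energy]


def mode_to_bin(mode: int, num_bins: int) -> int:
    """Hypothetical grouping: planar -> bin 0, DC -> bin 1, and the 65
    angular modes split evenly across bins 2..num_bins-1."""
    if not 0 <= mode < NUM_VVC_INTRA_MODES:
        raise ValueError(f"invalid VVC intra mode: {mode}")
    if mode < 2:
        return mode
    n_angular = NUM_VVC_INTRA_MODES - 2  # 65 angular modes
    return 2 + (mode - 2) * (num_bins - 2) // n_angular


def neighbor_mode_histogram(neighbor_modes: Sequence[int], num_bins: int = 5) -> List[float]:
    """Hypothetical second-modality features: normalized histogram of the
    intra modes chosen by already-coded neighboring blocks."""
    if num_bins < 3:
        raise ValueError("need at least 3 bins (planar, DC, angular)")
    counts = [0] * num_bins
    for m in neighbor_modes:
        counts[mode_to_bin(m, num_bins)] += 1
    total = len(neighbor_modes)
    return [cnt / total for cnt in counts] if total else [0.0] * num_bins


def bimodal_features(block: List[List[float]], neighbor_modes: Sequence[int]) -> List[float]:
    """Concatenate both modalities into one feature vector."""
    return dct_texture_features(block) + neighbor_mode_histogram(neighbor_modes)


if __name__ == "__main__":
    stripes = [[0, 10, 0, 10] for _ in range(4)]  # samples vary along each row
    print(bimodal_features(stripes, [0, 50, 50, 1]))
```

With the default five histogram bins, the concatenated vector has eight values. Keeping the histogram coarse is one simple way to keep the input, and therefore a downstream classifier, small; the paper's own two-step feature reduction is not reproduced here.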
Cite this article
Pakdaman, F., Adelimanesh, M. & Hashemi, M. BLINC: lightweight bimodal learning for low-complexity VVC intra-coding. J Real-Time Image Proc 19, 791–807 (2022). https://doi.org/10.1007/s11554-022-01223-1