
Exploring Neighbor Correspondence Matching for Multiple-hypotheses Video Frame Synthesis

Published: 11 January 2024

Abstract

Video frame synthesis, which comprises interpolation and extrapolation, is an essential video processing technique applicable to a wide range of scenarios. However, most existing methods handle small objects and large motion poorly, especially in high-resolution videos such as 4K. To address these limitations, we introduce a neighbor correspondence matching (NCM) algorithm for flow-based frame synthesis. Because the current frame is unavailable in video frame synthesis, NCM is performed in a current-frame-agnostic fashion, establishing multi-scale correspondences within the spatial-temporal neighborhood of each pixel. Building on the strong motion representation capability of NCM, we propose a heterogeneous coarse-to-fine scheme for intermediate flow estimation. The coarse-scale and fine-scale modules are trained progressively, making NCM computationally efficient and robust to large motion. We further analyze the mechanism of NCM and find that neighbor correspondence is powerful because it provides multiple motion hypotheses for synthesis. Based on this analysis, we introduce a multiple-hypotheses estimation process for video frame extrapolation, yielding a more robust framework, NCM-MH. Experimental results show that NCM and NCM-MH achieve 31.63 dB and 28.08 dB for interpolation and extrapolation, respectively, on the most challenging X4K1000FPS benchmark, outperforming all other state-of-the-art methods that take two reference frames as input.
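The two core ideas in the abstract, current-frame-agnostic local matching and multiple-hypotheses motion, can be made concrete with a short sketch. The following is a minimal PyTorch illustration under our own assumptions, not the paper's implementation: the names `neighbor_correspondence` and `top_k_hypotheses`, the window radius, and the hypothesis count k are all hypothetical choices.

```python
# Hypothetical sketch of neighbor-correspondence matching, not the authors' code.
import torch
import torch.nn.functional as F

def neighbor_correspondence(f0: torch.Tensor, f1: torch.Tensor,
                            radius: int = 4) -> torch.Tensor:
    """f0, f1: (B, C, H, W) features of the two reference frames.
    Returns (B, (2*radius+1)**2, H, W) matching scores for each pixel of f0
    against its local window in f1; the current frame is never used."""
    b, c, h, w = f0.shape
    win = 2 * radius + 1
    # Gather the (2r+1) x (2r+1) neighborhood of every pixel of f1.
    f1_pad = F.pad(f1, [radius] * 4)
    f1_neigh = F.unfold(f1_pad, kernel_size=win).view(b, c, win * win, h, w)
    # Scaled dot-product correlation, as in standard cost volumes.
    return (f0.unsqueeze(2) * f1_neigh).sum(dim=1) / c ** 0.5

def top_k_hypotheses(corr: torch.Tensor, radius: int, k: int = 4):
    """Read out the k strongest matches per pixel as displacement hypotheses."""
    win = 2 * radius + 1
    scores, idx = corr.topk(k, dim=1)                       # (B, k, H, W)
    dy = torch.div(idx, win, rounding_mode="floor") - radius
    dx = idx % win - radius
    flows = torch.stack([dx.float(), dy.float()], dim=2)    # (B, k, 2, H, W)
    return flows, scores.softmax(dim=1)                     # hypotheses + weights
```

Run on a feature pyramid, the same matching would produce the multi-scale correspondences described above, and the per-pixel top-k displacements are the kind of multi-modal motion evidence a multiple-hypotheses extrapolation stage such as NCM-MH could fuse.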


• Published in

  ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 4
  April 2024, 676 pages
  ISSN: 1551-6857
  EISSN: 1551-6865
  DOI: 10.1145/3613617
  • Editor: Abdulmotaleb El Saddik


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 11 January 2024
      • Online AM: 23 November 2023
      • Accepted: 11 November 2023
      • Revised: 10 November 2023
      • Received: 11 May 2023
Published in TOMM Volume 20, Issue 4
