DOI: 10.1145/3338533.3366583
research-article

Weakly Supervised Video Summarization by Hierarchical Reinforcement Learning

Published: 10 January 2020

ABSTRACT

Conventional video summarization approaches based on reinforcement learning suffer from the problem that the reward is received only after the whole summary has been generated. Such a reward is sparse, which makes reinforcement learning hard to converge. Another problem is that labelling each shot is tedious and costly, which usually prohibits the construction of large-scale datasets. To solve these problems, we propose a weakly supervised hierarchical reinforcement learning framework, which decomposes the whole task into several subtasks to enhance the summarization quality. The framework consists of a manager network and a worker network. For each subtask, the manager is trained to set a subgoal using only a task-level binary label, which requires far fewer labels than conventional approaches. Guided by the subgoal, the worker predicts the importance scores of the video shots in the subtask by policy gradient, using both the global reward and newly defined sub-rewards to overcome the sparsity problem. Experiments on two benchmark datasets show that our method achieves the best performance, outperforming even supervised approaches.
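The idea of pairing a sparse global reward with denser per-subtask sub-rewards can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the rewards are combined, so the additive mixing, the weight `beta`, the fixed subtask length, and all function names below are assumptions made for illustration only. The manager and its subgoal are omitted; the sub-rewards are simply taken as given numbers.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def split_into_subtasks(num_shots, subtask_len):
    """Split shot indices into consecutive (start, end) ranges, one per subtask.
    A fixed subtask length is an assumption of this sketch."""
    return [(s, min(s + subtask_len, num_shots))
            for s in range(0, num_shots, subtask_len)]

def per_shot_returns(subtasks, sub_rewards, global_reward, beta=0.5):
    """Give every shot its subtask's sub-reward plus a weighted global reward.
    The additive form and `beta` are hypothetical, not taken from the paper."""
    returns = []
    for (start, end), r_sub in zip(subtasks, sub_rewards):
        returns.extend([r_sub + beta * global_reward] * (end - start))
    return returns

def reinforce_gradient(logits, actions, returns):
    """REINFORCE estimate for a Bernoulli policy over shots.
    For action a in {0, 1} with P(a=1) = sigmoid(logit),
    d log pi(a) / d logit = a - sigmoid(logit)."""
    return [(a - sigmoid(z)) * g for z, a, g in zip(logits, actions, returns)]

# Example: 5 shots, subtasks of 2 shots each.
subtasks = split_into_subtasks(5, 2)   # [(0, 2), (2, 4), (4, 5)]
returns = per_shot_returns(subtasks, sub_rewards=[1.0, 0.0, 0.5],
                           global_reward=0.2, beta=0.5)
grads = reinforce_gradient([0.0] * 5, [1, 0, 1, 1, 0], returns)
```

In this sketch, shots in a subtask with a high sub-reward receive a learning signal even when the global reward is small, which is the intuition behind using sub-rewards against sparsity.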


Published in

MMAsia '19: Proceedings of the 1st ACM International Conference on Multimedia in Asia
December 2019
403 pages
ISBN: 9781450368414
DOI: 10.1145/3338533

Copyright © 2019 ACM


Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

MMAsia '19 paper acceptance rate: 59 of 204 submissions, 29%
Overall acceptance rate: 59 of 204 submissions, 29%
