ABSTRACT
Conventional reinforcement-learning-based video summarization approaches suffer from the problem that the reward is only received after the entire summary has been generated. Such a reward is sparse and makes reinforcement learning hard to converge. A further problem is that labelling each shot is tedious and costly, which usually prohibits the construction of large-scale datasets. To address these problems, we propose a weakly supervised hierarchical reinforcement learning framework that decomposes the whole task into several subtasks to enhance summarization quality. The framework consists of a manager network and a worker network. For each subtask, the manager is trained to set a subgoal using only a task-level binary label, which requires far fewer labels than conventional approaches. Guided by the subgoal, the worker predicts importance scores for the video shots in the subtask via policy gradient, according to both a global reward and newly defined sub-rewards that overcome the sparsity problem. Experiments on two benchmark datasets show that our approach achieves the best performance, even surpassing supervised approaches.
Index Terms
- Weakly Supervised Video Summarization by Hierarchical Reinforcement Learning