
Boosting Few-shot Abstractive Summarization with Auxiliary Tasks

Published: 30 October 2021

ABSTRACT

For summarization in niche domains, the available data is rarely sufficient to fine-tune a large pre-trained model. To alleviate this few-shot problem, we design several auxiliary tasks to assist the main task, abstractive summarization. In this paper, we employ BART as the base sequence-to-sequence model and incorporate the main and auxiliary tasks under a multi-task framework, transforming all tasks into the format of machine reading comprehension [19]. Moreover, we utilize task-specific adapters to share knowledge effectively across tasks, and an adaptive weight mechanism to adjust the contribution of the auxiliary tasks to the main task. Experiments show the effectiveness of our method on few-shot datasets. We also propose to first pre-train the model on unlabeled datasets; the methods proposed in this paper further improve performance on top of this pre-training.
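To make the moving parts concrete, the sketch below (PyTorch, not the authors' code) illustrates the two mechanisms named above: a task-specific bottleneck adapter in the style of Houlsby et al. [12], and an adaptive weight on each auxiliary loss. The gradient-similarity weighting rule is an assumption borrowed from Lin et al. [20]; the paper's exact adapter placement and weighting mechanism may differ, and the names TaskAdapter and adaptive_aux_weight are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TaskAdapter(nn.Module):
    """Bottleneck adapter in the style of Houlsby et al. [12]: down-project,
    nonlinearity, up-project, plus a residual connection. Inserting one such
    module per task into a shared encoder-decoder keeps most parameters
    shared while giving each task a small amount of private capacity."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(F.relu(self.down(x)))


def adaptive_aux_weight(main_loss, aux_loss, shared_params):
    """One plausible adaptive weight (an assumption, cf. Lin et al. [20]):
    the cosine similarity between the auxiliary and main-task gradients on
    the shared parameters, clipped at zero so auxiliary tasks whose
    gradients conflict with the main task are silenced."""
    g_main = torch.autograd.grad(main_loss, shared_params,
                                 retain_graph=True, allow_unused=True)
    g_aux = torch.autograd.grad(aux_loss, shared_params,
                                retain_graph=True, allow_unused=True)

    def flat(grads):
        # Treat parameters unused by a task as contributing zero gradient.
        return torch.cat([
            (torch.zeros_like(p) if g is None else g).reshape(-1)
            for g, p in zip(grads, shared_params)
        ])

    return F.cosine_similarity(flat(g_main), flat(g_aux), dim=0).clamp(min=0.0)


# Combined multi-task objective: weights are detached so they scale the
# auxiliary losses but are not themselves optimized, e.g.
# total_loss = main_loss + sum(
#     adaptive_aux_weight(main_loss, l, shared_params).detach() * l
#     for l in aux_losses)
```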


Supplemental Material

cikm21-rgsp0441.mp4 (mp4, 10.5 MB)

References

  1. Gabor Angeli, Melvin Jose Johnson Premkumar, and Christopher D. Manning. 2015. Leveraging Linguistic Structure For Open Domain Information Extraction. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, Beijing, China, 344--354. https://doi.org/10.3115/v1/P15-1034
  2. Christos Baziotis, Ion Androutsopoulos, Ioannis Konstas, and Alexandros Potamianos. 2019. SEQ^3: Differentiable Sequence-to-Sequence-to-Sequence Autoencoder for Unsupervised Abstractive Sentence Compression. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 673--681. https://doi.org/10.18653/v1/N19-1071
  3. Sagie Benaim and Lior Wolf. 2018. One-Shot Unsupervised Cross Domain Translation. In Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), Vol. 31. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2018/file/062ddb6c727310e76b6200b7c71f63b5-Paper.pdf
  4. Arthur Bražinskas, Mirella Lapata, and Ivan Titov. 2020. Unsupervised Opinion Summarization as Copycat-Review Generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 5151--5169. https://doi.org/10.18653/v1/2020.acl-main.461
  5. Y. Chen, Y. Ma, Xudong Mao, and Q. Li. 2019. Multi-Task Learning for Abstractive and Extractive Summarization. Data Science and Engineering 4 (2019), 14--23.
  6. Eric Chu and Peter J. Liu. 2018. MeanSum: A Neural Model for Unsupervised Multi-document Abstractive Summarization. arXiv:1810.05739 [cs.CL]
  7. Rotem Dror, Gili Baumer, Segev Shlomov, and Roi Reichart. 2018. The Hitchhiker's Guide to Testing Statistical Significance in Natural Language Processing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Melbourne, Australia, 1383--1392. https://doi.org/10.18653/v1/P18-1128
  8. Alexander R. Fabbri, Simeng Han, Haoyuan Li, Haoran Li, Marjan Ghazvininejad, Shafiq Joty, Dragomir Radev, and Yashar Mehdad. 2020. Improving Zero and Few-Shot Abstractive Summarization with Intermediate Fine-tuning and Data Augmentation. arXiv:2010.12836 [cs.CL]
  9. Alexander R. Fabbri, Wojciech Kryściński, Bryan McCann, Caiming Xiong, Richard Socher, and Dragomir Radev. 2021. SummEval: Re-evaluating Summarization Evaluation. Transactions of the Association for Computational Linguistics 9 (2021), 391--409. https://doi.org/10.1162/tacl_a_00373
  10. Travis Goodwin, Max Savery, and Dina Demner-Fushman. 2020. Towards Zero-Shot Conditional Summarization with Adaptive Multi-Task Fine-Tuning. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online, 3215--3226. https://www.aclweb.org/anthology/2020.findings-emnlp.289
  11. Max Grusky, Mor Naaman, and Yoav Artzi. 2018. Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics, New Orleans, Louisiana, 708--719. https://doi.org/10.18653/v1/N18-1065
  12. Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. 2019. Parameter-Efficient Transfer Learning for NLP. arXiv:1902.00751 [cs.LG]
  13. Zikun Hu, Xiang Li, Cunchao Tu, Zhiyuan Liu, and Maosong Sun. 2018. Few-Shot Charge Prediction with Discriminative Legal Attributes. In Proceedings of the 27th International Conference on Computational Linguistics. Association for Computational Linguistics, Santa Fe, New Mexico, USA, 487--498. https://www.aclweb.org/anthology/C18-1041
  14. Luyang Huang, Lingfei Wu, and Lu Wang. 2020. Knowledge Graph-Augmented Abstractive Summarization with Semantic-Driven Cloze Reward. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 5094--5107. https://doi.org/10.18653/v1/2020.acl-main.457
  15. Yuge Huang, Yuhan Wang, Ying Tai, Xiaoming Liu, Pengcheng Shen, Shaoxin Li, Jilin Li, and Feiyue Huang. 2020. CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition. arXiv:2004.00288 [cs.CV]
  16. Masaru Isonuma, Toru Fujino, Junichiro Mori, Yutaka Matsuo, and Ichiro Sakata. 2017. Extractive Summarization Using Multi-Task Learning with Document Classification. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Copenhagen, Denmark, 2101--2110. https://doi.org/10.18653/v1/D17-1223
  17. Wojciech Kryściński, Bryan McCann, Caiming Xiong, and Richard Socher. 2020. Evaluating the Factual Consistency of Abstractive Text Summarization. arXiv:1910.12840 [cs.CL]
  18. Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 7871--7880. https://doi.org/10.18653/v1/2020.acl-main.703
  19. Kaixuan Li, Xiujuan Xian, Jiafu Wang, and Niannian Yu. 2019. First-principle study on honeycomb fluorated-InTe monolayer with large Rashba spin splitting and direct bandgap. Applied Surface Science 471 (Mar 2019), 18--22. https://doi.org/10.1016/j.apsusc.2018.11.214
  20. Xingyu Lin, Harjatin Baweja, George Kantor, and David Held. 2019. Adaptive Auxiliary Task Weighting for Reinforcement Learning. In Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.), Vol. 32. Curran Associates, Inc., 4772--4783. https://proceedings.neurips.cc/paper/2019/file/0e900ad84f63618452210ab8baae0218-Paper.pdf
  21. Yang Liu and Mirella Lapata. 2019. Text Summarization with Pretrained Encoders. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 3730--3740. https://doi.org/10.18653/v1/D19-1387
  22. Christopher Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Association for Computational Linguistics, Baltimore, Maryland, 55--60. https://doi.org/10.3115/v1/P14-5010
  23. Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, and Richard Socher. 2018. The Natural Language Decathlon: Multitask Learning as Question Answering. CoRR abs/1806.08730 (2018). arXiv:1806.08730 http://arxiv.org/abs/1806.08730
  24. Anshuman Mishra, Dhruvesh Patel, Aparna Vijayakumar, Xiang Li, Pavan Kapanipathi, and Kartik Talamadupula. 2020. Reading Comprehension as Natural Language Inference: A Semantic Analysis. arXiv:2010.01713 [cs.CL]
  25. Saeid Motiian, Quinn Jones, Seyed Iranmanesh, and Gianfranco Doretto. 2017. Few-Shot Adversarial Domain Adaptation. In Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Vol. 30. Curran Associates, Inc., 6670--6680. https://proceedings.neurips.cc/paper/2017/file/21c5bba1dd6aed9ab48c2b34c1a0adde-Paper.pdf
  26. Ramesh Nallapati, Feifei Zhai, and Bowen Zhou. 2017. SummaRuNNer: A Recurrent Neural Network Based Sequence Model for Extractive Summarization of Documents. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (San Francisco, California, USA) (AAAI'17). AAAI Press, 3075--3081.
  27. Ramesh Nallapati, Bowen Zhou, Cicero dos Santos, Çağlar Gülçehre, and Bing Xiang. 2016. Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond. In Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning. Association for Computational Linguistics, Berlin, Germany, 280--290. https://doi.org/10.18653/v1/K16-1028
  28. Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. 2019. fairseq: A Fast, Extensible Toolkit for Sequence Modeling. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations). Association for Computational Linguistics, Minneapolis, Minnesota, 48--53. https://doi.org/10.18653/v1/N19-4009
  29. Gabriel Pereyra, George Tucker, Jan Chorowski, Łukasz Kaiser, and Geoffrey Hinton. 2017. Regularizing Neural Networks by Penalizing Confident Output Distributions. arXiv:1701.06548 [cs.NE]
  30. Jonas Pfeiffer, Aishwarya Kamath, Andreas Rücklé, Kyunghyun Cho, and Iryna Gurevych. 2020. AdapterFusion: Non-Destructive Task Composition for Transfer Learning. arXiv:2005.00247 [cs.CL]
  31. Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research 21, 140 (2020), 1--67. http://jmlr.org/papers/v21/20-074.html
  32. Sebastian Ruder. 2017. An Overview of Multi-Task Learning in Deep Neural Networks. arXiv:1706.05098 [cs.LG]
  33. Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get To The Point: Summarization with Pointer-Generator Networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Vancouver, Canada, 1073--1083. https://doi.org/10.18653/v1/P17-1099
  34. Trevor Standley, Amir R. Zamir, Dawn Chen, Leonidas Guibas, Jitendra Malik, and Silvio Savarese. 2020. Which Tasks Should Be Learned Together in Multi-task Learning? arXiv:1905.07553 [cs.CV]
  35. Asa Cooper Stickland and Iain Murray. 2019. BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning. arXiv:1902.02671 [cs.LG]
  36. Yaqing Wang, Quanming Yao, James T. Kwok, and Lionel M. Ni. 2020. Generalizing from a Few Examples: A Survey on Few-Shot Learning. ACM Comput. Surv. 53, 3, Article 63 (June 2020), 34 pages. https://doi.org/10.1145/3386252
  37. Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. 2020. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association for Computational Linguistics, Online, 38--45. https://www.aclweb.org/anthology/2020.emnlp-demos.6
  38. Jiacheng Xu and Greg Durrett. 2019. Neural Extractive Text Summarization with Syntactic Compression. arXiv:1902.00863 [cs.CL]
  39. Jiacheng Xu, Zhe Gan, Yu Cheng, and Jingjing Liu. 2020. Discourse-Aware Neural Extractive Text Summarization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 5021--5031. https://doi.org/10.18653/v1/2020.acl-main.451
  40. Ziyi Yang, Chenguang Zhu, Robert Gmyr, Michael Zeng, Xuedong Huang, and Eric Darve. 2020. TED: A Pretrained Unsupervised Summarization Model with Theme Modeling and Denoising. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online, 1865--1874. https://doi.org/10.18653/v1/2020.findings-emnlp.168
  41. Yabin Zhang, Hui Tang, and Kui Jia. 2018. Fine-Grained Visual Categorization using Meta-Learning Optimization with Sample Selection of Auxiliary Data. In Proceedings of the European Conference on Computer Vision (ECCV).
  42. Hao Zheng and Mirella Lapata. 2019. Sentence Centrality Revisited for Unsupervised Summarization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Florence, Italy, 6236--6247. https://doi.org/10.18653/v1/P19-1628

Published in

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
October 2021
4966 pages
ISBN: 9781450384469
DOI: 10.1145/3459637

      Copyright © 2021 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Qualifiers

      • short-paper

      Acceptance Rates

Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
