ABSTRACT
For summarization in niche domains, there is rarely enough data to fine-tune a large pre-trained model. To alleviate this few-shot problem, we design several auxiliary tasks to assist the main task, abstractive summarization. In this paper, we employ BART as the base sequence-to-sequence model and combine the main and auxiliary tasks under a multi-task framework. We transform all tasks into the format of machine reading comprehension [19]. Moreover, we use task-specific adapters to share knowledge effectively across tasks, and an adaptive weight mechanism to adjust the contribution of each auxiliary task to the main task. Experiments show the effectiveness of our method on few-shot datasets. We also propose first pre-training the model on unlabeled datasets; the methods proposed in this paper can then further improve the model's performance.
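The two mechanisms named above can be illustrated with a minimal sketch. It assumes, hypothetically, that every task instance is serialized as an MRC-style question/context pair, and that auxiliary losses are mixed through softmax-normalized per-task weights; the paper's actual prompt template and adaptive update rule may differ.

```python
import math

def to_mrc(question: str, context: str) -> str:
    """Serialize a task instance as a machine-reading-comprehension
    style input (hypothetical template; the paper's exact format may differ)."""
    return f"question: {question} context: {context}"

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def combined_loss(main_loss, aux_losses, aux_logits):
    """Add the main summarization loss to a weighted sum of auxiliary
    losses; the weights are a softmax over per-task logits, which an
    adaptive scheme could update during training."""
    weights = softmax(aux_logits)
    return main_loss + sum(w * l for w, l in zip(weights, aux_losses))
```

For example, with two auxiliary tasks at equal logits, `combined_loss(2.0, [1.0, 3.0], [0.0, 0.0])` returns `4.0` (each auxiliary loss weighted 0.5); raising one task's logit shifts the mixture toward that task.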
REFERENCES
- Gabor Angeli, Melvin Jose Johnson Premkumar, and Christopher D. Manning. 2015. Leveraging Linguistic Structure For Open Domain Information Extraction. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, Beijing, China, 344--354. https://doi.org/10.3115/v1/P15-1034
- Christos Baziotis, Ion Androutsopoulos, Ioannis Konstas, and Alexandros Potamianos. 2019. SEQ3: Differentiable Sequence-to-Sequence-to-Sequence Autoencoder for Unsupervised Abstractive Sentence Compression. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 673--681. https://doi.org/10.18653/v1/N19-1071
- Sagie Benaim and Lior Wolf. 2018. One-Shot Unsupervised Cross Domain Translation. In Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), Vol. 31. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2018/file/062ddb6c727310e76b6200b7c71f63b5-Paper.pdf
- Arthur Bražinskas, Mirella Lapata, and Ivan Titov. 2020. Unsupervised Opinion Summarization as Copycat-Review Generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 5151--5169. https://doi.org/10.18653/v1/2020.acl-main.461
- Y. Chen, Y. Ma, Xudong Mao, and Q. Li. 2019. Multi-Task Learning for Abstractive and Extractive Summarization. Data Science and Engineering 4 (2019), 14--23.
- Eric Chu and Peter J. Liu. 2018. MeanSum: A Neural Model for Unsupervised Multi-document Abstractive Summarization. arXiv:1810.05739 [cs.CL] (Oct. 2018).
- Rotem Dror, Gili Baumer, Segev Shlomov, and Roi Reichart. 2018. The Hitchhiker's Guide to Testing Statistical Significance in Natural Language Processing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Melbourne, Australia, 1383--1392. https://doi.org/10.18653/v1/P18-1128
- Alexander R. Fabbri, Simeng Han, Haoyuan Li, Haoran Li, Marjan Ghazvininejad, Shafiq Joty, Dragomir Radev, and Yashar Mehdad. 2020. Improving Zero and Few-Shot Abstractive Summarization with Intermediate Fine-tuning and Data Augmentation. arXiv:2010.12836 [cs.CL]
- Alexander R. Fabbri, Wojciech Kryściński, Bryan McCann, Caiming Xiong, Richard Socher, and Dragomir Radev. 2021. SummEval: Re-evaluating Summarization Evaluation. Transactions of the Association for Computational Linguistics 9 (04 2021), 391--409. https://doi.org/10.1162/tacl_a_00373
- Travis Goodwin, Max Savery, and Dina Demner-Fushman. 2020. Towards Zero-Shot Conditional Summarization with Adaptive Multi-Task Fine-Tuning. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online, 3215--3226. https://www.aclweb.org/anthology/2020.findings-emnlp.289
- Max Grusky, Mor Naaman, and Yoav Artzi. 2018. Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics, New Orleans, Louisiana, 708--719. https://doi.org/10.18653/v1/N18-1065
- Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. 2019. Parameter-Efficient Transfer Learning for NLP. arXiv:1902.00751 [cs.LG]
- Zikun Hu, Xiang Li, Cunchao Tu, Zhiyuan Liu, and Maosong Sun. 2018. Few-Shot Charge Prediction with Discriminative Legal Attributes. In Proceedings of the 27th International Conference on Computational Linguistics. Association for Computational Linguistics, Santa Fe, New Mexico, USA, 487--498. https://www.aclweb.org/anthology/C18-1041
- Luyang Huang, Lingfei Wu, and Lu Wang. 2020. Knowledge Graph-Augmented Abstractive Summarization with Semantic-Driven Cloze Reward. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 5094--5107. https://doi.org/10.18653/v1/2020.acl-main.457
- Yuge Huang, Yuhan Wang, Ying Tai, Xiaoming Liu, Pengcheng Shen, Shaoxin Li, Jilin Li, and Feiyue Huang. 2020. CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition. arXiv:2004.00288 [cs.CV]
- Masaru Isonuma, Toru Fujino, Junichiro Mori, Yutaka Matsuo, and Ichiro Sakata. 2017. Extractive Summarization Using Multi-Task Learning with Document Classification. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Copenhagen, Denmark, 2101--2110. https://doi.org/10.18653/v1/D17-1223
- Wojciech Kryściński, Bryan McCann, Caiming Xiong, and Richard Socher. 2020. Evaluating the Factual Consistency of Abstractive Text Summarization. arXiv:1910.12840 [cs.CL] (2020).
- Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 7871--7880. https://doi.org/10.18653/v1/2020.acl-main.703
- Kaixuan Li, Xiujuan Xian, Jiafu Wang, and Niannian Yu. 2019. First-principle study on honeycomb fluorated-InTe monolayer with large Rashba spin splitting and direct bandgap. Applied Surface Science 471 (Mar 2019), 18--22. https://doi.org/10.1016/j.apsusc.2018.11.214
- Xingyu Lin, Harjatin Baweja, George Kantor, and David Held. 2019. Adaptive Auxiliary Task Weighting for Reinforcement Learning. In Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.), Vol. 32. Curran Associates, Inc., 4772--4783. https://proceedings.neurips.cc/paper/2019/file/0e900ad84f63618452210ab8baae0218-Paper.pdf
- Yang Liu and Mirella Lapata. 2019. Text Summarization with Pretrained Encoders. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 3730--3740. https://doi.org/10.18653/v1/D19-1387
- Christopher Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Association for Computational Linguistics, Baltimore, Maryland, 55--60. https://doi.org/10.3115/v1/P14-5010
- Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, and Richard Socher. 2018. The Natural Language Decathlon: Multitask Learning as Question Answering. CoRR abs/1806.08730 (2018). arXiv:1806.08730 http://arxiv.org/abs/1806.08730
- Anshuman Mishra, Dhruvesh Patel, Aparna Vijayakumar, Xiang Li, Pavan Kapanipathi, and Kartik Talamadupula. 2020. Reading Comprehension as Natural Language Inference: A Semantic Analysis. arXiv:2010.01713 [cs.CL]
- Saeid Motiian, Quinn Jones, Seyed Iranmanesh, and Gianfranco Doretto. 2017. Few-Shot Adversarial Domain Adaptation. In Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Vol. 30. Curran Associates, Inc., 6670--6680. https://proceedings.neurips.cc/paper/2017/file/21c5bba1dd6aed9ab48c2b34c1a0adde-Paper.pdf
- Ramesh Nallapati, Feifei Zhai, and Bowen Zhou. 2017. SummaRuNNer: A Recurrent Neural Network Based Sequence Model for Extractive Summarization of Documents. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (San Francisco, California, USA) (AAAI'17). AAAI Press, 3075--3081.
- Ramesh Nallapati, Bowen Zhou, Cicero dos Santos, Çağlar Gulçehre, and Bing Xiang. 2016. Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond. In Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning. Association for Computational Linguistics, Berlin, Germany, 280--290. https://doi.org/10.18653/v1/K16-1028
- Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. 2019. fairseq: A Fast, Extensible Toolkit for Sequence Modeling. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations). Association for Computational Linguistics, Minneapolis, Minnesota, 48--53. https://doi.org/10.18653/v1/N19-4009
- Gabriel Pereyra, George Tucker, Jan Chorowski, Łukasz Kaiser, and Geoffrey Hinton. 2017. Regularizing Neural Networks by Penalizing Confident Output Distributions. arXiv:1701.06548 [cs.NE]
- Jonas Pfeiffer, Aishwarya Kamath, Andreas Rücklé, Kyunghyun Cho, and Iryna Gurevych. 2020. AdapterFusion: Non-Destructive Task Composition for Transfer Learning. arXiv:2005.00247 [cs.CL]
- Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research 21, 140 (2020), 1--67. http://jmlr.org/papers/v21/20-074.html
- Sebastian Ruder. 2017. An Overview of Multi-Task Learning in Deep Neural Networks. arXiv:1706.05098 [cs.LG]
- Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get To The Point: Summarization with Pointer-Generator Networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Vancouver, Canada, 1073--1083. https://doi.org/10.18653/v1/P17-1099
- Trevor Standley, Amir R. Zamir, Dawn Chen, Leonidas Guibas, Jitendra Malik, and Silvio Savarese. 2020. Which Tasks Should Be Learned Together in Multi-task Learning? arXiv:1905.07553 [cs.CV]
- Asa Cooper Stickland and Iain Murray. 2019. BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning. arXiv:1902.02671 [cs.LG]
- Yaqing Wang, Quanming Yao, James T. Kwok, and Lionel M. Ni. 2020. Generalizing from a Few Examples: A Survey on Few-Shot Learning. ACM Comput. Surv. 53, 3, Article 63 (June 2020), 34 pages. https://doi.org/10.1145/3386252
- Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. 2020. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association for Computational Linguistics, Online, 38--45. https://www.aclweb.org/anthology/2020.emnlp-demos.6
- Jiacheng Xu and Greg Durrett. 2019. Neural Extractive Text Summarization with Syntactic Compression. arXiv:1902.00863 [cs.CL]
- Jiacheng Xu, Zhe Gan, Yu Cheng, and Jingjing Liu. 2020. Discourse-Aware Neural Extractive Text Summarization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 5021--5031. https://doi.org/10.18653/v1/2020.acl-main.451
- Ziyi Yang, Chenguang Zhu, Robert Gmyr, Michael Zeng, Xuedong Huang, and Eric Darve. 2020. TED: A Pretrained Unsupervised Summarization Model with Theme Modeling and Denoising. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online, 1865--1874. https://doi.org/10.18653/v1/2020.findings-emnlp.168
- Yabin Zhang, Hui Tang, and Kui Jia. 2018. Fine-Grained Visual Categorization using Meta-Learning Optimization with Sample Selection of Auxiliary Data. In Proceedings of the European Conference on Computer Vision (ECCV).
- Hao Zheng and Mirella Lapata. 2019. Sentence Centrality Revisited for Unsupervised Summarization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Florence, Italy, 6236--6247. https://doi.org/10.18653/v1/P19-1628
Index Terms
- Boosting Few-shot Abstractive Summarization with Auxiliary Tasks