Abstract
Recently, neural models have been proposed for headline generation, learning to map documents to headlines with recurrent neural networks. In this work, we give a detailed introduction to and comparison of existing work and recent improvements in neural headline generation, paying particular attention to how encoders, decoders, and training strategies alter the overall performance of a headline generation system. Furthermore, we perform quantitative analysis of most existing neural headline generation systems and summarize several key factors that affect their performance. We also carry out detailed error analysis of typical neural headline generation systems to gain deeper insight. We hope our results and conclusions will benefit future research.
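The attentive encoder-decoder architecture mentioned above can be illustrated with its core operation: at each decoding step, the decoder scores every encoder hidden state against its own state, normalizes the scores into weights, and forms a context vector as the weighted sum. The sketch below is a minimal, framework-free illustration of dot-product attention in NumPy; the function name and toy dimensions are illustrative assumptions, not the specific formulation of any surveyed system.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_context(encoder_states, decoder_state):
    """Dot-product attention: score each encoder hidden state against
    the current decoder state, normalize with softmax, and return the
    weighted sum (context vector) along with the attention weights."""
    scores = encoder_states @ decoder_state   # shape (T,)
    weights = softmax(scores)                 # shape (T,), sums to 1
    context = weights @ encoder_states        # shape (d,)
    return context, weights

# Toy example: a document encoded into 4 time steps, hidden size 3.
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))   # encoder hidden states
s = rng.normal(size=3)        # current decoder state
context, weights = attention_context(H, s)
```

In a full headline generator, `context` would be concatenated with the decoder state to predict the next headline word; variants in the surveyed literature differ mainly in how `scores` are computed (dot product, bilinear, or a small feed-forward network).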
Cite this article
Ayana, Shen, SQ., Lin, YK. et al. Recent Advances on Neural Headline Generation. J. Comput. Sci. Technol. 32, 768–784 (2017). https://doi.org/10.1007/s11390-017-1758-3