Cross-Language Text Summarization Using Sentence and Multi-Sentence Compression

Linhares Pontes, Elvys; Huet, Stéphane; Torres-Moreno, Juan-Manuel; Linhares, Andréa Carneiro

doi:10.1007/978-3-319-91947-8_48

Conference paper
First Online: 22 May 2018

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10859)

Abstract

Cross-Language Automatic Text Summarization produces a summary in a language different from that of the source documents. In this paper, we propose a French-to-English cross-lingual summarization framework that analyzes the information in both languages to identify the most relevant sentences. To generate more informative cross-lingual summaries, we introduce the use of chunks and two compression methods, at the sentence and multi-sentence levels. Experimental results on the MultiLing 2011 dataset show that our framework improves on the results obtained by state-of-the-art approaches according to ROUGE metrics.
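
For readers unfamiliar with the evaluation metric named in the abstract, the following is a minimal sketch of ROUGE-N recall, the n-gram overlap between a candidate summary and a single reference summary. It is not the official ROUGE package used in the paper: the function names `ngrams` and `rouge_n_recall`, the whitespace tokenization, and the lowercasing are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rouge_n_recall(candidate, reference, n=1):
    """Illustrative ROUGE-N recall against one reference summary.

    Assumption: tokens are obtained by lowercasing and splitting on
    whitespace; the official toolkit offers stemming and other options.
    """
    cand_counts = Counter(ngrams(candidate.lower().split(), n))
    ref_counts = Counter(ngrams(reference.lower().split(), n))
    if not ref_counts:
        return 0.0
    # Clipped overlap: an n-gram is credited at most as often as it
    # appears in the candidate.
    overlap = sum(min(count, cand_counts[gram])
                  for gram, count in ref_counts.items())
    return overlap / sum(ref_counts.values())


if __name__ == "__main__":
    cand = "the cat sat on the mat"
    ref = "the cat lay on the mat"
    print(rouge_n_recall(cand, ref, n=1))
    print(rouge_n_recall(cand, ref, n=2))
```

Recall is shown here because it rewards covering the content of the reference; precision and F-measure variants follow the same counting scheme with a different denominator.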

This work was partially financed by the European Project CHISTERA-AMIS ANR-15-CHR2-0001.

Notes

1. The GIZA++ model, https://github.com/moses-smt/giza-pp.
2. http://www.statmt.org/europarl/.
3. http://opus.nlpl.eu/News-Commentary.php.
4. In this work, a unigram is represented by a chunk.
5. The keyword bonus allows the generation of longer compressions that may be more informative; it is defined as the geometric average of the weights of all arcs in the Chunk Graph.
6. Publicly available at: code.google.com/p/word2vec.
7. http://storage.googleapis.com/sentencecomp/compression-data.json.
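
The geometric average mentioned in note 5 can be sketched as follows. This is only an illustration of the formula, the n-th root of the product of n arc weights; the function name `keyword_bonus`, the flat list of weights as input, and the requirement that weights be strictly positive are assumptions, not details taken from the paper.

```python
import math


def keyword_bonus(arc_weights):
    """Geometric mean of a collection of arc weights.

    Assumption: weights are strictly positive, so the mean can be
    computed in log space, which avoids overflow or underflow when
    many weights are multiplied together.
    """
    weights = list(arc_weights)
    if not weights:
        raise ValueError("at least one arc weight is required")
    if any(w <= 0 for w in weights):
        raise ValueError("arc weights must be strictly positive")
    return math.exp(sum(math.log(w) for w in weights) / len(weights))


if __name__ == "__main__":
    # Hypothetical arc weights, for illustration only.
    print(keyword_bonus([1.0, 4.0]))
```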

Author information

Authors and Affiliations

LIA, Université d’Avignon et des Pays de Vaucluse, 84000, Avignon, France
Elvys Linhares Pontes, Stéphane Huet & Juan-Manuel Torres-Moreno
École Polytechnique de Montréal, Montreal, Québec, Canada
Juan-Manuel Torres-Moreno
Universidade Federal do Ceará, Sobral, Ceará, Brazil
Andréa Carneiro Linhares

Corresponding author

Correspondence to Elvys Linhares Pontes.

Editor information

Editors and Affiliations

Université de Franche-Comté, Besançon, France
Max Silberztein
Conservatoire National des Arts et Métiers, Paris, France
Faten Atigui
Conservatoire National des Arts et Métiers, Paris, France
Elena Kornyshova
Conservatoire National des Arts et Métiers, Paris, France
Elisabeth Métais
University of Salford, Manchester, United Kingdom
Farid Meziane

About this paper

Cite this paper

Linhares Pontes, E., Huet, S., Torres-Moreno, JM., Linhares, A.C. (2018). Cross-Language Text Summarization Using Sentence and Multi-Sentence Compression. In: Silberztein, M., Atigui, F., Kornyshova, E., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2018. Lecture Notes in Computer Science, vol 10859. Springer, Cham. https://doi.org/10.1007/978-3-319-91947-8_48

DOI: https://doi.org/10.1007/978-3-319-91947-8_48
Published: 22 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91946-1
Online ISBN: 978-3-319-91947-8
eBook Packages: Computer Science, Computer Science (R0)
