Cross-language document summarization via extraction and ranking of multiple summaries

Wan, Xiaojun; Luo, Fuli; Sun, Xue; Huang, Songfang; Yao, Jin-ge

doi:10.1007/s10115-018-1152-7

Short Paper
Published: 17 January 2018

Knowledge and Information Systems, volume 58, pages 481–499 (2019)

Xiaojun Wan¹,² (ORCID: orcid.org/0000-0001-6887-1994), Fuli Luo², Xue Sun¹, Songfang Huang³ & Jin-ge Yao¹

1002 Accesses
15 Citations
Abstract

The task of cross-language document summarization aims to produce a summary in a target language (e.g., Chinese) for a given document set in a different source language (e.g., English). Previous studies focus on ranking and selection of translated sentences in the target language. In this paper, we propose a new framework for addressing the task by extraction and ranking of multiple summaries in the target language. First, we extract multiple candidate summaries by proposing several schemes for improving the upper-bound quality of the summaries. Then, we propose a new ensemble ranking method for ranking the candidate summaries by making use of bilingual features. Extensive experiments have been conducted on a benchmark dataset and the results verify the effectiveness of our proposed framework, which outperforms a variety of baselines, including supervised baselines.
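
The two stages named in the abstract (extract several candidate summaries, then rank them with an ensemble) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the top-k selection scheme, the rank-averaging ensemble, and the word-count scorers are all assumptions made for illustration. The abstract does not specify the extraction schemes, the bilingual features, or the ranking model. Each sentence is modelled here as a hypothetical (source text, translated text) pair so that scorers can look at both languages.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: (source-language text, target-language translation).
Sentence = Tuple[str, str]
Summary = List[Sentence]


def extract_candidates(
    sentences: Sequence[Sentence],
    sentence_scorers: Sequence[Callable[[Sentence], float]],
    max_sentences: int = 2,
) -> List[Summary]:
    """Build one candidate summary per scoring scheme (assumed scheme: top-k sentences)."""
    candidates: List[Summary] = []
    for score in sentence_scorers:
        # Indices of the highest-scoring sentences under this scheme.
        top = sorted(
            range(len(sentences)),
            key=lambda i: score(sentences[i]),
            reverse=True,
        )[:max_sentences]
        # Keep the selected sentences in document order.
        candidate = [sentences[i] for i in sorted(top)]
        if candidate not in candidates:  # skip duplicate candidates
            candidates.append(candidate)
    return candidates


def ensemble_rank(
    candidates: Sequence[Summary],
    summary_scorers: Sequence[Callable[[Summary], float]],
) -> List[Summary]:
    """Order candidates by their summed rank position across all scorers.

    Rank averaging is an assumed stand-in for the paper's ensemble ranking method.
    """
    total_rank = [0] * len(candidates)
    for score in summary_scorers:
        order = sorted(
            range(len(candidates)),
            key=lambda i: score(candidates[i]),
            reverse=True,
        )
        for position, idx in enumerate(order):
            total_rank[idx] += position  # lower total means better
    best_first = sorted(range(len(candidates)), key=lambda i: total_rank[i])
    return [candidates[i] for i in best_first]


# Toy scorers standing in for real source-side and target-side features.
def source_words(sentence: Sentence) -> float:
    return float(len(sentence[0].split()))


def target_words(sentence: Sentence) -> float:
    return float(len(sentence[1].split()))


def total_source_words(summary: Summary) -> float:
    return sum(source_words(s) for s in summary)


def total_target_words(summary: Summary) -> float:
    return sum(target_words(s) for s in summary)
```

In use, `extract_candidates` would be called with several sentence scorers to widen the pool of candidates, and `ensemble_rank` would then pick the final summary as the first element of its result. A real system would replace the word-count scorers with learned models over bilingual features.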

Acknowledgements

This work was supported by the National Natural Science Foundation of China (61331011, 61772036), the IBM Global Faculty Award Program, and the Key Laboratory of Science, Technology and Standard in Press Industry (Key Laboratory of Intelligent Press Media Technology).

Author information

Authors and Affiliations

Institute of Computer Science and Technology, Peking University, Beijing, China
Xiaojun Wan, Xue Sun & Jin-ge Yao
Key Laboratory of Computational Linguistics (Peking University), MOE, Beijing, China
Xiaojun Wan & Fuli Luo
IBM China Research Laboratory, Beijing, China
Songfang Huang

Corresponding author

Correspondence to Xiaojun Wan.

About this article

Cite this article

Wan, X., Luo, F., Sun, X. et al. Cross-language document summarization via extraction and ranking of multiple summaries. Knowl Inf Syst 58, 481–499 (2019). https://doi.org/10.1007/s10115-018-1152-7

Received: 27 September 2016
Revised: 09 August 2017
Accepted: 04 January 2018
Published: 17 January 2018
Issue Date: 06 February 2019
DOI: https://doi.org/10.1007/s10115-018-1152-7
