Abstract
Cross-language document summarization aims to produce a summary in a target language (e.g., Chinese) for a document set written in a different source language (e.g., English). Previous studies focus on ranking and selecting translated sentences in the target language. In this paper, we propose a new framework that addresses the task by extracting and ranking multiple candidate summaries in the target language. First, we extract multiple candidate summaries using several schemes designed to improve the upper-bound quality of the candidates. Then, we propose a new ensemble ranking method that ranks the candidate summaries using bilingual features. Extensive experiments on a benchmark dataset verify the effectiveness of the proposed framework, which outperforms a variety of baselines, including supervised ones.
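The ranking stage summarized above can be illustrated with a minimal sketch. The abstract does not say how the individual rankers in the ensemble are combined, so the code below assumes a simple Borda-count ensemble over several scorers, each looking at a different hypothetical bilingual feature. The `Candidate` class, the `linear_scorer` and `borda_ensemble` helpers, and the feature names `src_sim`, `tgt_sim`, and `mt_quality` are illustrative assumptions, not the authors' method or features.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Candidate:
    """A candidate target-language summary with hypothetical bilingual feature scores."""
    name: str
    features: Dict[str, float]


# A scorer maps a candidate to a real-valued quality estimate (higher is better).
Scorer = Callable[[Candidate], float]


def linear_scorer(weights: Dict[str, float]) -> Scorer:
    """Return a scorer computing a weighted sum of features; missing features count as 0."""
    def score(c: Candidate) -> float:
        return sum(w * c.features.get(k, 0.0) for k, w in weights.items())
    return score


def borda_ensemble(candidates: List[Candidate], scorers: List[Scorer]) -> List[Candidate]:
    """Rank candidates by total Borda points across all scorers.

    Each scorer ranks every candidate: the top one gets n - 1 points, the last gets 0.
    Python's sort is stable, so candidates with equal totals keep their input order.
    """
    n = len(candidates)
    points = [0] * n
    for scorer in scorers:
        order = sorted(range(n), key=lambda i: scorer(candidates[i]), reverse=True)
        for rank, idx in enumerate(order):
            points[idx] += n - 1 - rank
    ranked = sorted(range(n), key=lambda i: points[i], reverse=True)
    return [candidates[i] for i in ranked]


# Usage with toy values (all feature names and numbers are hypothetical).
cands = [
    Candidate("A", {"src_sim": 0.6, "tgt_sim": 0.5, "mt_quality": 0.9}),
    Candidate("B", {"src_sim": 0.8, "tgt_sim": 0.7, "mt_quality": 0.4}),
    Candidate("C", {"src_sim": 0.5, "tgt_sim": 0.6, "mt_quality": 0.6}),
]
scorers = [
    linear_scorer({"src_sim": 1.0}),     # source-language view
    linear_scorer({"tgt_sim": 1.0}),     # target-language view
    linear_scorer({"mt_quality": 1.0}),  # translation-quality view
]
print(borda_ensemble(cands, scorers)[0].name)  # prints "B"
```

With these toy values, B ranks first under the source-side and target-side scorers but last under translation quality; its total of 4 Borda points still beats A (3) and C (2). Borda counting is used here only because it combines rankings without requiring the scorers' outputs to share a common scale; a learned combination would be an equally plausible stand-in.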
Acknowledgements
This work was supported by the National Natural Science Foundation of China (61331011, 61772036), the IBM Global Faculty Award Program, and the Key Laboratory of Science, Technology and Standard in Press Industry (Key Laboratory of Intelligent Press Media Technology).
About this article
Cite this article
Wan, X., Luo, F., Sun, X. et al. Cross-language document summarization via extraction and ranking of multiple summaries. Knowl Inf Syst 58, 481–499 (2019). https://doi.org/10.1007/s10115-018-1152-7
DOI: https://doi.org/10.1007/s10115-018-1152-7