Co-clustering Sentences and Terms for Multi-document Summarization

Xia, Yunqing; Zhang, Yonggang; Yao, Jianmin

doi:10.1007/978-3-642-19437-5_28

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6609)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

Two issues are crucial to multi-document summarization: diversity and redundancy. Content across topically related articles is usually redundant, while the topic is presented from diverse perspectives. This paper presents a co-clustering-based multi-document summarization method that exploits both the diverse and the redundant content. A multi-document summary is generated in three steps. First, a sentence-term co-occurrence matrix is designed to reflect diversity and redundancy. Second, a co-clustering algorithm is run on the matrix to find globally optimal clusters for sentences and terms in an iterative manner. Third, a more accurate summary is generated by selecting representative sentences from the optimal clusters. Experiments on the DUC2004 dataset show that the co-clustering-based multi-document summarization method is promising.
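
The three steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the co-clustering algorithm or the sentence selection criterion, so the code substitutes a simple alternating scheme that reassigns sentences and terms to minimize squared error against block means, and it picks the sentence closest to its cluster's block means as the representative. All names (`tokenize`, `build_matrix`, `block_means`, `cocluster`, `summarize`, `k`, `l`) and the toy sentences are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(sentence):
    return re.findall(r"[a-z]+", sentence.lower())


def build_matrix(sentences):
    # Step 1: sentence-term co-occurrence matrix (rows = sentences, columns = terms).
    vocab = sorted({t for s in sentences for t in tokenize(s)})
    col = {t: j for j, t in enumerate(vocab)}
    matrix = [[0.0] * len(vocab) for _ in sentences]
    for i, s in enumerate(sentences):
        for term, count in Counter(tokenize(s)).items():
            matrix[i][col[term]] = float(count)
    return matrix, vocab


def block_means(matrix, rows, cols, k, l):
    # Mean cell value of every (sentence cluster, term cluster) block.
    total = [[0.0] * l for _ in range(k)]
    size = [[0] * l for _ in range(k)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            total[rows[i]][cols[j]] += value
            size[rows[i]][cols[j]] += 1
    return [[total[a][b] / size[a][b] if size[a][b] else 0.0 for b in range(l)]
            for a in range(k)]


def row_error(matrix, i, a, cols, means):
    # Squared error of sentence i if it were placed in sentence cluster a.
    return sum((v - means[a][cols[j]]) ** 2 for j, v in enumerate(matrix[i]))


def col_error(matrix, j, b, rows, means):
    # Squared error of term j if it were placed in term cluster b.
    return sum((matrix[i][j] - means[rows[i]][b]) ** 2 for i in range(len(matrix)))


def cocluster(matrix, k, l, max_iter=20):
    # Step 2 (stand-in): alternately reassign sentences and terms until stable.
    rows = [i % k for i in range(len(matrix))]
    cols = [j % l for j in range(len(matrix[0]))]
    for _ in range(max_iter):
        means = block_means(matrix, rows, cols, k, l)
        new_rows = [min(range(k), key=lambda a: row_error(matrix, i, a, cols, means))
                    for i in range(len(matrix))]
        means = block_means(matrix, new_rows, cols, k, l)
        new_cols = [min(range(l), key=lambda b: col_error(matrix, j, b, new_rows, means))
                    for j in range(len(matrix[0]))]
        if new_rows == rows and new_cols == cols:
            break
        rows, cols = new_rows, new_cols
    return rows, cols


def summarize(sentences, matrix, rows, cols, k, l):
    # Step 3 (stand-in): one sentence per non-empty sentence cluster,
    # namely the member with the smallest error to its block means.
    means = block_means(matrix, rows, cols, k, l)
    summary = []
    for a in range(k):
        members = [i for i, r in enumerate(rows) if r == a]
        if members:
            best = min(members, key=lambda i: row_error(matrix, i, a, cols, means))
            summary.append(sentences[best])
    return summary


sentences = [
    "The storm flooded the coastal towns.",
    "Coastal towns were flooded by the storm.",
    "Rescue teams delivered food and water.",
    "Food and water reached survivors through rescue teams.",
]
matrix, vocab = build_matrix(sentences)
rows, cols = cocluster(matrix, k=2, l=2)
for line in summarize(sentences, matrix, rows, cols, k=2, l=2):
    print(line)
```

The round-robin initialization keeps the sketch deterministic but can settle in a poor local optimum. A practical system would likely weight terms rather than use raw counts and would try several initializations.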

Author information

Authors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
Yunqing Xia & Yonggang Zhang
School of Computer Science and Technology, Soochow University, Suzhou, 215006, China
Yonggang Zhang & Jianmin Yao

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Xia, Y., Zhang, Y., Yao, J. (2011). Co-clustering Sentences and Terms for Multi-document Summarization. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19437-5_28

DOI: https://doi.org/10.1007/978-3-642-19437-5_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19436-8
Online ISBN: 978-3-642-19437-5
eBook Packages: Computer Science, Computer Science (R0)
