Abstract
Two issues are crucial to multi-document summarization: diversity and redundancy. Content within a set of topically related articles is usually redundant, while the topic itself is presented from diverse perspectives. This paper presents a co-clustering-based multi-document summarization method that makes full use of both the diverse and the redundant content. A multi-document summary is generated in three steps. First, a sentence-term co-occurrence matrix is constructed to reflect diversity and redundancy. Second, a co-clustering algorithm is applied to the matrix to find globally optimal clusters of sentences and terms in an iterative manner. Third, a more accurate summary is generated by selecting representative sentences from the optimal clusters. Experiments on the DUC2004 dataset show that the co-clustering-based multi-document summarization method is promising.
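The three steps can be illustrated with a small Python sketch. The abstract does not specify how the matrix is weighted, which co-clustering objective is optimised, or how representative sentences are chosen. Each of those choices below is therefore an assumption rather than the authors' method. Raw term counts normalised into a joint distribution p(x, y) stand in for the designed matrix. The batch information-theoretic co-clustering of Dhillon, Mallela and Modha (KDD 2003) is swapped in for the co-clustering step. For selection, each sentence cluster, taken in order of decreasing probability mass, contributes the sentence whose term distribution is closest in KL divergence to the cluster prototype. The function names, the stopword list and the round-robin initialisation are hypothetical. These alternating updates only reach a local optimum of the mutual-information loss, so the result depends on the starting assignment.

```python
import math
import re
from collections import Counter

# Hypothetical preprocessing (not in the abstract): lower-case alphabetic tokens, tiny stoplist.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was", "for", "by", "with"}


def tokens(sentence):
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def build_matrix(sentences):
    """Step 1 (assumed design): raw sentence-term counts normalised to a joint p(x, y)."""
    rows = [Counter(tokens(s)) for s in sentences]
    terms = sorted({t for row in rows for t in row})
    col = {t: j for j, t in enumerate(terms)}
    total = sum(sum(row.values()) for row in rows)
    P = [[0.0] * len(terms) for _ in rows]
    for i, row in enumerate(rows):
        for t, c in row.items():
            P[i][col[t]] = c / total
    return P, terms


def cond(row):
    s = sum(row)  # turn one row of the joint into a conditional, e.g. p(Y | x)
    return [v / s for v in row]


def kl(p, q):
    """D(p || q); infinite when q is zero somewhere p is not."""
    d = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi <= 0:
                return math.inf
            d += pi * math.log(pi / qi)
    return d


def cluster_stats(P, cx, cy, kx, ky):
    """Compressed joint p(x_hat, y_hat) and its marginals p(x_hat), p(y_hat)."""
    J = [[0.0] * ky for _ in range(kx)]
    for i, row in enumerate(P):
        for j, v in enumerate(row):
            J[cx[i]][cy[j]] += v
    return J, [sum(r) for r in J], [sum(c) for c in zip(*J)]


def prototypes(P, cx, cy, kx, ky):
    """Per row cluster a: q(y | a) = p(y | y_hat) * p(y_hat | a); None if a is empty."""
    J, pxh, pyh = cluster_stats(P, cx, cy, kx, ky)
    py = [sum(c) for c in zip(*P)]
    protos = [None if pxh[a] == 0 else
              [py[j] / pyh[cy[j]] * J[a][cy[j]] / pxh[a] for j in range(len(py))]
              for a in range(kx)]
    return protos, pxh


def reassign(P, cx, cy, kx, ky):
    """ITCC half-step: move every row x to argmin_a D(p(Y | x) || q(Y | a))."""
    protos, _ = prototypes(P, cx, cy, kx, ky)
    live = [a for a in range(kx) if protos[a] is not None]
    return [min(live, key=lambda a: kl(cond(row), protos[a])) for row in P]


def itcc(P, kx, ky, max_iter=20):
    """Step 2 (swapped-in technique): batch information-theoretic co-clustering."""
    PT = [list(c) for c in zip(*P)]  # running reassign on P^T updates the term clusters
    cx = [i % kx for i in range(len(P))]  # deterministic round-robin start (an assumption)
    cy = [j % ky for j in range(len(PT))]
    for _ in range(max_iter):
        new_cx = reassign(P, cx, cy, kx, ky)       # sentences, given the term clusters
        new_cy = reassign(PT, cy, new_cx, ky, kx)  # terms, given the new sentence clusters
        if (new_cx, new_cy) == (cx, cy):
            break
        cx, cy = new_cx, new_cy
    return cx, cy


def summarize(sentences, kx=2, ky=2, max_sentences=2):
    """Step 3 (assumed rule): heaviest cluster first, keep its KL-closest sentence."""
    sentences = [s for s in sentences if tokens(s)]  # every row needs non-zero mass
    P, _ = build_matrix(sentences)
    cx, cy = itcc(P, kx, ky)
    protos, pxh = prototypes(P, cx, cy, kx, ky)
    picks = []
    for a in sorted(range(kx), key=lambda c: -pxh[c]):
        members = [i for i, c in enumerate(cx) if c == a]
        if members:
            best = min(members, key=lambda i: kl(cond(P[i]), protos[a]))
            picks.append(sentences[best])
    return picks[:max_sentences]


if __name__ == "__main__":
    docs = ["The earthquake struck the coastal city.", "A strong earthquake hit the coastal city.",
            "Rescue teams searched collapsed buildings.", "Rescue teams reached collapsed buildings quickly."]
    print(summarize(docs))
```

On the four toy sentences in the __main__ block, the round-robin start leads this sketch to place the second earthquake sentence in the same cluster as the two rescue sentences, yet the summary still contains one rescue sentence and one earthquake sentence. A practical system would likely need a better initialisation, for example several random restarts.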
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xia, Y., Zhang, Y., Yao, J. (2011). Co-clustering Sentences and Terms for Multi-document Summarization. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19437-5_28
DOI: https://doi.org/10.1007/978-3-642-19437-5_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19436-8
Online ISBN: 978-3-642-19437-5
eBook Packages: Computer Science, Computer Science (R0)