ABSTRACT
This paper presents the integration of cohesive properties of text with coherence relations to obtain an adequate representation of text for automatic summarization. A summarizer based on Lexical Chains is enhanced with rhetorical and argumentative structure obtained via Discourse Markers. When evaluated on a newspaper corpus, this integration yields only a slight improvement in the resulting summaries and cannot beat a dummy baseline consisting of the first sentence of the document. Nevertheless, we argue that this approach relies on basic linguistic mechanisms and is therefore genre-independent.
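To make the two ingredients concrete, the following is a minimal sketch, not the system described in the paper. It assumes a crude stand-in for lexical chains (content words repeated across the text), hypothetical marker lists (`UPWEIGHT_MARKERS`, `DOWNWEIGHT_MARKERS`) with arbitrary weights, and a naive sentence splitter. It also includes the first-sentence baseline mentioned above.

```python
import re
from collections import Counter

# Hypothetical marker lists and weights, chosen only for illustration;
# they are not taken from the paper's discourse marker lexicon.
UPWEIGHT_MARKERS = ("therefore", "in conclusion")
DOWNWEIGHT_MARKERS = ("for example", "in other words")

STOPWORDS = {"the", "a", "an", "of", "on", "in", "for", "by", "was",
             "is", "and", "to", "therefore", "example"}


def split_sentences(text):
    """Naive splitter: break after sentence-final punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercased alphabetic tokens minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]


def lead_baseline(text):
    """Dummy baseline: the first sentence of the document."""
    return split_sentences(text)[0]


def summarize(text, n=1):
    """Score sentences by repeated content words (a rough proxy for
    lexical chains), then adjust the score with discourse markers."""
    sentences = split_sentences(text)
    counts = Counter(w for s in sentences for w in content_words(s))
    chains = {w for w, c in counts.items() if c >= 2}

    scored = []
    for i, s in enumerate(sentences):
        score = float(sum(counts[w] for w in set(content_words(s))
                          if w in chains))
        lower = s.lower()
        if any(m in lower for m in UPWEIGHT_MARKERS):
            score *= 1.5  # assumed weight
        if any(m in lower for m in DOWNWEIGHT_MARKERS):
            score *= 0.5  # assumed weight
        scored.append((score, i))

    # Take the n best sentences, then restore document order.
    top = sorted(scored, key=lambda p: (-p[0], p[1]))[:n]
    return [sentences[i] for _, i in sorted(top, key=lambda p: p[1])]
```

A real lexical chainer would link words through a thesaurus or WordNet-style relations rather than exact repetition, and marker effects would depend on the relation each marker signals; the sketch only shows how a cohesion score and a marker-based adjustment can be combined in one ranking.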
- Integrating cohesion and coherence for automatic summarization