ABSTRACT
Machine summaries can be improved by using knowledge about the cognitive status of news article referents. In this paper, we present an approach to automatically acquiring distinctions in cognitive status using machine learning over the forms of referring expressions appearing in the input. We focus on modeling references to people, both because news often revolves around people and because existing natural language tools for named entity identification are reliable. We examine two specific distinctions: whether a person in the news can be assumed to be known to a target audience (hearer-old vs. hearer-new) and whether a person is a major character in the news story. We report on machine learning experiments that show that these distinctions can be learned with high accuracy, and validate our approach using human subjects.
Automatically learning cognitive status for multi-document summarization of newswire