
Automatically learning cognitive status for multi-document summarization of newswire

Authors: Ani Nenkova, Advaith Siddharthan, Kathleen McKeown (Columbia University)

HLT '05: Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, October 2005, pages 241–248
DOI: https://doi.org/10.3115/1220575.1220606
Published: 6 October 2005

ABSTRACT

Machine summaries can be improved by using knowledge about the cognitive status of news article referents. In this paper, we present an approach to automatically acquiring distinctions in cognitive status using machine learning over the forms of referring expressions appearing in the input. We focus on modeling references to people, both because news often revolves around people and because existing natural language tools for named entity identification are reliable. We examine two specific distinctions: whether a person in the news can be assumed to be known to a target audience (hearer-old vs hearer-new), and whether a person is a major character in the news story. We report on machine learning experiments showing that these distinctions can be learned with high accuracy, and we validate our approach using human subjects.
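The abstract does not spell out the features or the learner, so the following is only a minimal sketch of the general idea of learning a hearer-old vs hearer-new label from the forms of a person's references across the input articles. Everything in it is a hypothetical assumption for illustration: the `Mention` record, the three aggregate features (share of first mentions carrying a description such as an apposition or title, share of first mentions that are a bare full name, and the log of the number of articles mentioning the person), the toy training labels, and the choice of plain logistic regression trained by stochastic gradient descent. It is not the authors' feature set or classifier.

```python
import math
from dataclasses import dataclass


@dataclass
class Mention:
    """One reference to a person in one input article (hypothetical schema)."""
    doc_id: str
    is_first_in_doc: bool   # first reference to this person in the article
    has_description: bool   # e.g. apposition or title: "X, a local teacher,"
    full_name: bool         # both first and last name are present


def features(mentions: list[Mention]) -> list[float]:
    """Aggregate one person's mentions over the input cluster (assumed features)."""
    firsts = [m for m in mentions if m.is_first_in_doc]
    n_first = max(len(firsts), 1)  # avoid division by zero
    described = sum(m.has_description for m in firsts) / n_first
    bare_name = sum(m.full_name and not m.has_description for m in firsts) / n_first
    n_docs = len({m.doc_id for m in mentions})
    return [described, bare_name, math.log1p(n_docs)]


def predict(w: list[float], x: list[float]) -> float:
    """Probability of label 1 (hearer-old); the last weight is the bias."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X: list[list[float]], y: list[int],
                 epochs: int = 500, lr: float = 0.5) -> list[float]:
    """Plain logistic regression fitted by stochastic gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = predict(w, xi) - yi  # gradient of the log loss w.r.t. z
            for j, xj in enumerate(xi):
                w[j] -= lr * g * xj
            w[-1] -= lr * g
    return w


def person(n_docs: int, described: bool) -> list[Mention]:
    """Hypothetical toy helper: one first mention per article, all of the same form."""
    return [Mention(f"d{i}", True, described, True) for i in range(n_docs)]


if __name__ == "__main__":
    # Made-up toy labels, not data from the paper: 1 = hearer-old, 0 = hearer-new.
    X = [features(person(3, False)), features(person(2, False)),
         features(person(2, True)), features(person(1, True))]
    y = [1, 1, 0, 0]
    w = train_logreg(X, y)
    print(predict(w, features(person(2, False))) > 0.5)  # True: bare-name introduction
    print(predict(w, features(person(2, True))) > 0.5)   # False: described introduction
```

The assumed intuition behind these features is that writers tend to introduce hearer-new people with a descriptive phrase, whereas widely known people are often named without one; the article-count feature is one plausible signal for the separate major-character distinction.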

Published in: HLT '05: Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, October 2005 (1054 pages)
Conference Chair: Raymond J. Mooney, The University of Texas at Austin
Publisher: Association for Computational Linguistics, United States
Overall acceptance rate: 93 of 335 submissions (28%)