DOI: 10.3115/1220575.1220606

Automatically learning cognitive status for multi-document summarization of newswire

Published: 6 October 2005

ABSTRACT

Machine summaries can be improved by using knowledge about the cognitive status of news article referents. In this paper, we present an approach to automatically acquiring distinctions in cognitive status using machine learning over the forms of referring expressions appearing in the input. We focus on modeling references to people, both because news often revolves around people and because existing natural language tools for named entity identification are reliable. We examine two specific distinctions: whether a person in the news can be assumed to be known to a target audience (hearer-old vs. hearer-new), and whether a person is a major character in the news story. We report on machine learning experiments that show that these distinctions can be learned with high accuracy, and we validate our approach using human subjects.
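
As a rough illustration of what learning from the forms of referring expressions could look like, the sketch below aggregates a few surface cues for each person mentioned in the input and applies a toy decision rule. The `Mention` record, the feature names, the cues themselves (premodifiers and appositions), and the threshold rule standing in for a trained classifier are all assumptions made for illustration; the abstract does not specify the features or the learner actually used.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Mention:
    """One reference to a person in the input (hypothetical record)."""
    person: str            # canonical identifier for the person
    text: str              # the referring expression as it appeared
    has_premodifier: bool  # a title or role precedes the name (assumed cue)
    has_apposition: bool   # a descriptive apposition follows the name (assumed cue)


def extract_features(mentions):
    """Aggregate per-person counts and ratios over all mentions in the input."""
    counts = defaultdict(lambda: {"mentions": 0, "premodified": 0, "apposed": 0})
    for m in mentions:
        c = counts[m.person]
        c["mentions"] += 1
        c["premodified"] += int(m.has_premodifier)
        c["apposed"] += int(m.has_apposition)

    features = {}
    for person, c in counts.items():
        n = c["mentions"]
        features[person] = {
            "mentions": n,
            "premodified_ratio": c["premodified"] / n,
            "apposed_ratio": c["apposed"] / n,
        }
    return features


def guess_hearer_status(feat, max_described_ratio=0.5):
    """Toy stand-in for a learned classifier (assumption, not the paper's model).

    Assumes that people who are usually introduced with a description are
    less likely to be familiar to the audience.
    """
    described = max(feat["premodified_ratio"], feat["apposed_ratio"])
    return "hearer-new" if described > max_described_ratio else "hearer-old"


# Invented mentions for two placeholder people.
mentions = [
    Mention("Person A", "Person A", False, False),
    Mention("Person A", "A", False, False),
    Mention("Person B", "Person B, a spokesman", False, True),
]
feats = extract_features(mentions)
labels = {person: guess_hearer_status(f) for person, f in feats.items()}
```

In a real system the threshold rule would be replaced by a classifier trained on labeled examples, with the per-person feature dictionaries serving as its input.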

Published in: HLT '05: Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, October 2005, 1054 pages

Publisher: Association for Computational Linguistics, United States


Overall acceptance rate: 93 of 335 submissions (28%)
