research-article

Towards content-level coherence with aspect-guided summarization

Authors:
Renxian Zhang, The Hong Kong Polytechnic University, Hong Kong and The Hong Kong Polytechnic University Shenzhen Research Institute, China
Wenjie Li, The Hong Kong Polytechnic University, Hong Kong and The Hong Kong Polytechnic University Shenzhen Research Institute, China
Dehong Gao, The Hong Kong Polytechnic University, Hong Kong and The Hong Kong Polytechnic University Shenzhen Research Institute, China

ACM Transactions on Speech and Language Processing, Volume 10, Issue 1, Article No. 2, pp. 1–22
https://doi.org/10.1145/2442076.2442078

Published: 22 March 2013

Abstract

The TAC 2010 summarization track initiated a new task, aspect-guided summarization, that centers on textual aspects, embodied as particular kinds of information in a text. We observe that aspect-guided summaries not only address highly specific user needs but also facilitate content-level coherence by using aspect information. In this article, we present a full-fledged approach to aspect-guided summarization with a focus on summary coherence. Our summarization approach depends on two prerequisite subtasks: recognizing aspect-bearing sentences in order to perform sentence extraction, and modeling aspect-based coherence with an HMM in order to predict a coherent sentence ordering. Using the manually annotated TAC 2010 and 2011 datasets, we validated the effectiveness of our proposed methods for those subtasks. Drawing on the empirical results, we proceed to develop an aspect-guided summarizer built on a simple but robust base summarizer. With sentence selection guided by aspect information, our system is one of the best on TAC 2011. With sentence ordering predicted by the aspect-based HMM, the summaries achieve good coherence.
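
The abstract does not give the details of the aspect-based HMM, so the following is only a minimal sketch of the general idea of ordering sentences by aspect transitions. It assumes each extracted sentence already carries a single aspect label, and it swaps in a simpler technique: a first-order Markov chain over observed aspect labels with add-alpha smoothing and a brute-force search over permutations, not the authors' HMM. The function names, the `<s>`/`</s>` boundary symbols, and the aspect labels are all hypothetical and chosen for illustration.

```python
import itertools
import math
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical boundary symbols


def train_transitions(sequences, alpha=1.0):
    """Estimate smoothed log P(next | prev) from ordered aspect sequences."""
    labels = {a for seq in sequences for a in seq} | {START, END}
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        path = [START, *seq, END]
        for prev, nxt in zip(path, path[1:]):
            counts[prev][nxt] += 1.0
    logp = {}
    for prev in labels:
        # Add-alpha smoothing over the full label set keeps every
        # transition probability non-zero.
        total = sum(counts[prev].values()) + alpha * len(labels)
        logp[prev] = {
            nxt: math.log((counts[prev][nxt] + alpha) / total) for nxt in labels
        }
    return logp


def score(order, logp):
    """Log-probability of the aspect sequence induced by a sentence order."""
    path = [START, *[aspect for _, aspect in order], END]
    return sum(logp[p][n] for p, n in zip(path, path[1:]))


def best_order(sentences, logp):
    """Exhaustively pick the highest-scoring permutation.

    `sentences` is a list of (sentence_id, aspect_label) pairs whose
    labels must all have been seen in training.
    """
    return max(itertools.permutations(sentences), key=lambda o: score(o, logp))


# Illustrative training data: aspect sequences read off reference summaries.
training = [
    ["WHAT", "WHY", "DAMAGES"],
    ["WHAT", "DAMAGES", "COUNTERMEASURES"],
    ["WHAT", "WHY", "DAMAGES", "COUNTERMEASURES"],
]
logp = train_transitions(training)
candidates = [("s1", "COUNTERMEASURES"), ("s2", "WHAT"), ("s3", "DAMAGES")]
ordered = best_order(candidates, logp)
print([sid for sid, _ in ordered])
```

Exhaustive search is factorial in the number of sentences, which is tolerable only for short summaries; a longer candidate set would need a greedy or beam search instead.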

Index Terms

Computing methodologies > Artificial intelligence > Natural language processing

Published in

ACM Transactions on Speech and Language Processing Volume 10, Issue 1
March 2013
50 pages
ISSN:1550-4875
EISSN:1550-4883
DOI:10.1145/2442076

Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 22 March 2013
- Revised: 1 January 2013
- Accepted: 1 January 2013
- Received: 1 February 2012
Author Tags
Summarization
coherence
content model
textual aspect
Qualifiers
- research-article
- Research
- Refereed