The REG Summarization System with Question Reformulation at QA@INEX Track 2010

Vivaldi, Jorge; da Cunha, Iria; Ramírez, Javier

doi:10.1007/978-3-642-23577-1_27

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6932)

Included in the following conference series:

International Workshop of the Initiative for the Evaluation of XML Retrieval

Abstract

In this paper we present REG, a graph-based approach to a fundamental problem of Natural Language Processing: the automatic summarization of documents. The algorithm models a document as a graph in order to obtain weighted sentences. We applied this approach to the QA@INEX 2010 (question-answering) task. To do so, we extracted the terms and named entities from the queries, in order to obtain a list of terms and named entities related to the main topic of each question. Using this strategy, REG obtained good results in both performance (measured with the automatic evaluation system FRESA) and readability (measured with human evaluation), ranking among the seven best systems in the task.
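
The abstract does not give REG's actual weighting scheme, so the following is only a minimal sketch of the general idea of graph-based sentence weighting, not the authors' implementation. It assumes that sentences are vertices, that an edge weight is the number of words two sentences share, and that each query term found in a sentence adds 1 to that sentence's weight. The names `tokenize`, `rank_sentences`, and `summarize` are hypothetical.

```python
import re
from itertools import combinations


def tokenize(sentence):
    """Lowercase a sentence and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def rank_sentences(sentences, query_terms=()):
    """Return sentence indices ordered from highest to lowest weight.

    Assumption (not from the paper): a sentence's weight is the sum of its
    edge weights, where an edge weight is the number of shared words, plus
    1 for every single-word query term that occurs in the sentence.
    """
    bags = [tokenize(s) for s in sentences]
    weights = [0.0] * len(sentences)
    # Build the sentence graph implicitly: one edge per sentence pair.
    for i, j in combinations(range(len(sentences)), 2):
        shared = len(bags[i] & bags[j])
        weights[i] += shared
        weights[j] += shared
    # Bias the weights toward the terms extracted from the question.
    query = {t.lower() for t in query_terms}
    for i, bag in enumerate(bags):
        weights[i] += len(bag & query)
    # Ties are broken by original position in the document.
    return sorted(range(len(sentences)), key=lambda i: (-weights[i], i))


def summarize(sentences, query_terms=(), size=1):
    """Return the `size` top-weighted sentences in their original order."""
    top = sorted(rank_sentences(sentences, query_terms)[:size])
    return [sentences[i] for i in top]
```

In this sketch, passing the terms extracted from a question as `query_terms` shifts the ranking toward sentences that mention them, which loosely mirrors the question reformulation step described above. Multiword terms and named entities would need a matching step of their own.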

Author information

Authors and Affiliations

Instituto Universitario de Lingüística Aplicada - UPF, Barcelona, Spain
Jorge Vivaldi & Iria da Cunha
Universidad Autónoma Metropolitana-Azcapotzalco, Mexico
Javier Ramírez

Editor information

Editors and Affiliations

Faculty of Science and Technology, Queensland University of Technology, GPO Box 2434, Qld 4001, Brisbane, Australia
Shlomo Geva
Archives and Information Studies/Humanities, University of Amsterdam, Turfdraagsterpad 9, 1012XT, Amsterdam, The Netherlands
Jaap Kamps
Multimodal Computing and Interaction, Saarland University, 66123, Saarbrücken, Germany
Ralf Schenkel
Department of Computer Science, University of Otago, P.O. Box 56, 9054, Dunedin, New Zealand
Andrew Trotman

About this paper

Cite this paper

Vivaldi, J., da Cunha, I., Ramírez, J. (2011). The REG Summarization System with Question Reformulation at QA@INEX Track 2010. In: Geva, S., Kamps, J., Schenkel, R., Trotman, A. (eds) Comparative Evaluation of Focused Retrieval. INEX 2010. Lecture Notes in Computer Science, vol 6932. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23577-1_27

DOI: https://doi.org/10.1007/978-3-642-23577-1_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23576-4
Online ISBN: 978-3-642-23577-1
eBook Packages: Computer Science, Computer Science (R0)
