Abstract
In this paper we present REG, a graph-based approach to a fundamental problem in Natural Language Processing: the automatic summarization of documents. The algorithm models a document as a graph in order to obtain weighted sentences. We applied this approach to the INEX@QA 2010 question-answering task. To do so, we extracted the terms and named entities from the queries in order to obtain a list of terms and named entities related to the main topic of each question. Using this strategy, REG obtained good results in both performance (measured with the automatic evaluation system FRESA) and readability (measured by human evaluation), ranking among the seven best systems in the task.
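The abstract does not spell out how REG builds its graph or weights its sentences, so the following is only a minimal illustrative sketch of the general idea, not the authors' implementation. It assumes sentences are vertices, that an edge weight is the number of words two sentences share, that a sentence's score is the sum of its edge weights, and that query terms add a simple bonus. The function names `tokenize`, `rank_sentences`, and `summarize` are hypothetical.

```python
import re
from itertools import combinations


def tokenize(sentence):
    """Return the set of lowercase word tokens in one sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def rank_sentences(sentences, query_terms=()):
    """Rank sentence indices by a graph-based weight (illustrative only).

    Assumption: edge weight = number of shared words between two sentences;
    sentence score = sum of its edge weights (weighted degree), plus one
    point per single-word query term the sentence contains.
    """
    bags = [tokenize(s) for s in sentences]
    query = {t.lower() for t in query_terms}
    scores = [0.0] * len(sentences)

    # Accumulate the weight of every edge onto both of its endpoints.
    for i, j in combinations(range(len(sentences)), 2):
        weight = len(bags[i] & bags[j])
        scores[i] += weight
        scores[j] += weight

    # Hypothetical query bonus: favour sentences mentioning query terms.
    for i, bag in enumerate(bags):
        scores[i] += len(bag & query)

    # Highest score first; ties are broken by original position.
    return sorted(range(len(sentences)), key=lambda i: (-scores[i], i))


def summarize(sentences, k, query_terms=()):
    """Return the k best-ranked sentences in their original order."""
    top = sorted(rank_sentences(sentences, query_terms)[:k])
    return [sentences[i] for i in top]
```

Under these assumptions, calling `summarize(sentences, k, query_terms)` with terms extracted from a question biases the extract toward sentences on the question's topic, which is the role the abstract assigns to the extracted terms and named entities.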
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vivaldi, J., da Cunha, I., Ramírez, J. (2011). The REG Summarization System with Question Reformulation at QA@INEX Track 2010. In: Geva, S., Kamps, J., Schenkel, R., Trotman, A. (eds) Comparative Evaluation of Focused Retrieval. INEX 2010. Lecture Notes in Computer Science, vol 6932. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23577-1_27
DOI: https://doi.org/10.1007/978-3-642-23577-1_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23576-4
Online ISBN: 978-3-642-23577-1
eBook Packages: Computer Science (R0)