Abstract
Discourse parsing is currently a prominent research topic, yet no discourse parser exists for Spanish texts. The first step towards developing such a tool is discourse segmentation. In this work, we present DiSeg, the first discourse segmenter for Spanish, which follows the framework of Rhetorical Structure Theory and relies on lexical and syntactic rules. We describe the system and evaluate its performance against a gold standard corpus, obtaining promising results.
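As a rough illustration of what rule-based segmentation and its evaluation against a gold standard can look like, the following minimal sketch splits a sentence before a discourse marker and scores the predicted boundaries with precision, recall and F1. It is not DiSeg's implementation: the marker list `MARKERS`, the functions `segment`, `boundaries` and `prf`, and the example sentence are all hypothetical, and the sketch covers only a single lexical rule, with none of the syntactic rules the abstract mentions.

```python
import re

# Hypothetical, tiny marker list for illustration only; not DiSeg's lexicon.
MARKERS = ("porque", "aunque", "pero", "sin embargo")


def segment(sentence, markers=MARKERS):
    """Split a sentence before each discourse marker (lexical rule only)."""
    pattern = r"\b(?:" + "|".join(re.escape(m) for m in markers) + r")\b"
    # A marker at the very start of the sentence does not open a new segment.
    cuts = [m.start()
            for m in re.finditer(pattern, sentence, flags=re.IGNORECASE)
            if m.start() > 0]
    bounds = [0] + cuts + [len(sentence)]
    pieces = (sentence[a:b].strip(" ,;") for a, b in zip(bounds, bounds[1:]))
    return [p for p in pieces if p]


def boundaries(segments):
    """Token offsets of the interior boundaries between consecutive segments."""
    offsets, total = set(), 0
    for seg in segments[:-1]:
        total += len(seg.split())
        offsets.add(total)
    return offsets


def prf(predicted, gold):
    """Boundary-level precision, recall and F1 of predicted vs. gold segments."""
    pred_b, gold_b = boundaries(predicted), boundaries(gold)
    hits = len(pred_b & gold_b)
    precision = hits / len(pred_b) if pred_b else 0.0
    recall = hits / len(gold_b) if gold_b else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


predicted = segment("No salió porque llovía, pero llamó a casa.")
# Hypothetical gold segmentation that keeps the causal clause attached.
gold = ["No salió porque llovía", "pero llamó a casa."]
print(predicted)
print(prf(predicted, gold))
```

In this toy case the lexical rule over-segments relative to the hypothetical gold annotation, which lowers precision while recall stays perfect. Rules that also consult syntactic information are one way to filter out such spurious boundaries.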
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
da Cunha, I., SanJuan, E., Torres-Moreno, J.-M., Lloberes, M., Castellón, I. (2010). Discourse Segmentation for Spanish Based on Shallow Parsing. In: Sidorov, G., Hernández Aguirre, A., Reyes García, C.A. (eds) Advances in Artificial Intelligence. MICAI 2010. Lecture Notes in Computer Science, vol 6437. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16761-4_2
DOI: https://doi.org/10.1007/978-3-642-16761-4_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16760-7
Online ISBN: 978-3-642-16761-4
eBook Packages: Computer Science, Computer Science (R0)