Coreferential Relations in Basque: The Annotation Process

Journal of Psycholinguistic Research

Abstract

In this paper we present the coreference annotation of part of the EPEC corpus of Basque. Although coreference is a pragmatic phenomenon that depends heavily on the situational context, it shows some language-specific patterns that vary with the features of each language. Because Basque is not an Indo-European language, its grammar differs considerably from that of the languages spoken in the surrounding areas. We explain these features and the decisions made in each case. After describing the criteria defined for coreference tagging in Basque, we explain the annotation process. Our annotation builds on a morphologically and syntactically annotated corpus, which provides a manageable environment in which the specific structures that form part of a reference chain can be identified more easily. Part of the corpus was tagged independently by two annotators who marked up the same text, and a third annotator acted as judge, resolving cases of disagreement. The whole process has been automated as a result of previous studies carried out in this field. The automatic detection of mentions (Soraluze et al., in: Proceedings of Konvens, 2012) has provided a better working environment and made it possible to build a first significant corpus for later computational work on automatic coreference resolution.
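The double-annotation-plus-judge workflow described above can be illustrated with a minimal Python sketch. It assumes that each annotator's output is a list of reference chains, each chain a set of mentions, and that a mention is identified by hypothetical `(start_token, end_token)` offsets. The function names and the `judge` callback are illustrative assumptions, not the EPEC tooling or the authors' actual procedure. Agreement is checked at the level of coreference links (pairs of mentions) because chain identifiers assigned by two annotators are arbitrary and cannot be matched directly.

```python
from collections.abc import Callable
from itertools import combinations

# Hypothetical mention representation: (start_token, end_token) offsets.
Mention = tuple[int, int]
Link = tuple[Mention, Mention]


def chains_to_links(chains: list[set[Mention]]) -> set[Link]:
    """Expand each reference chain into all pairs of coreferent mentions."""
    links: set[Link] = set()
    for chain in chains:
        # Sorting gives every pair a canonical order, so links from two
        # annotators can be compared directly regardless of chain IDs.
        for a, b in combinations(sorted(chain), 2):
            links.add((a, b))
    return links


def links_to_chains(mentions: set[Mention], links: set[Link]) -> list[list[Mention]]:
    """Rebuild chains from accepted links using a minimal union-find."""
    parent = {m: m for m in mentions}

    def find(m: Mention) -> Mention:
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving keeps trees shallow
            m = parent[m]
        return m

    for a, b in links:
        parent[find(a)] = find(b)

    groups: dict[Mention, list[Mention]] = {}
    for m in mentions:
        groups.setdefault(find(m), []).append(m)
    # Keep only real chains (two or more mentions), in a deterministic order.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


def adjudicate(
    chains_a: list[set[Mention]],
    chains_b: list[set[Mention]],
    judge: Callable[[Link], bool],
) -> tuple[list[list[Mention]], list[Link]]:
    """Accept links both annotators agree on; ask the judge about the rest."""
    links_a = chains_to_links(chains_a)
    links_b = chains_to_links(chains_b)
    agreed = links_a & links_b
    disputed = links_a ^ links_b  # links marked by exactly one annotator
    accepted = agreed | {link for link in disputed if judge(link)}
    mentions = set().union(*chains_a, *chains_b)
    return links_to_chains(mentions, accepted), sorted(disputed)


if __name__ == "__main__":
    a = [{(0, 1), (5, 5), (9, 9)}]
    b = [{(0, 1), (5, 5)}, {(9, 9), (12, 13)}]
    # Hypothetical judge: accepts only disputed links involving mention (12, 13).
    chains, disputed = adjudicate(a, b, judge=lambda link: (12, 13) in link)
    print(disputed)  # [((0, 1), (9, 9)), ((5, 5), (9, 9)), ((9, 9), (12, 13))]
    print(chains)    # [[(0, 1), (5, 5)], [(9, 9), (12, 13)]]
```

One design caveat of this sketch: the union-find step enforces transitivity, so accepting a single disputed link can merge two whole chains, including mentions whose other links the judge rejected. A real adjudication interface would more plausibly present the judge with complete chains rather than isolated links.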

Notes

  1. Most of the examples in this paper come from the EPEC corpus, described in Sect. 4.

  2. In general, the English versions of the examples may contain more linguistic expressions that refer to the same entity, but we mark only the equivalents of the elements annotated in Basque. See Sect. 3.1 for a detailed explanation.

  3. As mentioned before, Basque pronouns are formed from demonstratives (Laka 1996).

References

  • Aduriz, I., Ceberio, K., & Díaz de Ilarraza, A. (2005). Euskarazko anafora pronominala: Ikuspuntu konputazionala eta corpus baten garapena. Gogoa, 5(1), 91–116.

  • Aduriz, I., Aranzabe, M. J., Arriola, J. M., Atutxa, A., Díaz de Ilarraza, A., Ezeiza, N., et al. (2006a). Methodology and steps towards the construction of EPEC, a corpus of written Basque tagged at morphological and syntactic levels for the automatic processing. In A. Wilson, P. Rayson, & D. Archer (Eds.), Corpus linguistics around the world. Book series: Language and computers (Vol. 56, pp. 1–15). Netherlands: Rodopi.

  • Aduriz, I., Aranzabe, M. J., Arriola, J. M., & Díaz de Ilarraza, A. (2006b). Sintaxi partziala. In B. Fernández & I. Laka (Eds.), Andolin gogoan: Essays in honour of professor Eguzkitza (pp. 31–49). Bilbo: UPV/EHU Publishing Services.

  • Alegria, I., Artola, X., Sarasola, K., & Urkia, M. (1996). Automatic morphological analysis of Basque. Literary & Linguistic Computing, 11(4), 193–203.

  • Alegria, I., Ezeiza, N., & Fernandez, I. (2006). Named entities translation based on comparable corpora. In Proceedings of the workshop on multi-word-expressions in a multilingual context (EACL 2006) (pp. 1–8). Trento, Italy.

  • Arriola, J. M., Aduriz, I., Aldezabal, I., Aranzabe, M. J., Ceberio, K., Estarrona, A., Iruskieta, M., Lersundi, M., Pociello, E., Uria, L., & Urizar, R. (2013). Reusing the CG-2 grammar for processing Basque complex postpositions. In A. D. Iñaki Alegria & J. Villena (Eds.), Actas del XXIX Congreso de la Sociedad Española del Procesamiento del Lenguaje Natural (SEPLN 2013) (pp. 20–27). Madrid, Spain.

  • Borthen, K. (2004). Predicative NPs and the annotation of reference chains. In Proceedings of COLING 2004 (pp. 1175–1178). Geneva, Switzerland.

  • Botley, S., & McEnery, T. (Eds.). (2000). Corpus-based and computational approaches to discourse anaphora. Amsterdam: John Benjamins.

  • Ceberio, K., Aduriz, I., Díaz de Ilarraza, A., & Garcia-Azkoaga, I. (2008). Erreferentziakidetasunaren azterketa eta anotazioa euskarazko corpus batean. In X. Artiagoitia & J. A. Lakarra (Eds.), Gramatika Jaietan. Patxi Goenagaren omenez, ASJU (Vol. 51, pp. 153–172). Bilbo: UPV/EHU & Gipuzkoako Foru Aldundia.

  • Cornish, F. (1999). Anaphora, discourse and understanding: Evidence from English and French. Oxford: Clarendon.

  • Doddington, G., Mitchell, A., Przybocki, M., Ramshaw, L., Strassel, S., & Weischedel, R. (2004). The automatic content extraction (ACE) program—Tasks, data, and evaluation. In Proceedings of LREC 2004 (pp. 837–840), Lisbon.

  • Euskaltzaindia. (1985). Euskal Gramatika: Lehen urratsak-I. Bilbo: Euskaltzaindia.

  • Euskaltzaindia. (2002). Euskal Gramatika Laburra: Perpaus Bakuna (2nd ed.). Bilbo: Euskaltzaindia.

  • Garcia-Azkoaga, I. M. (2003). Kohesio anaforikoa hiru testu generotan. Adinaren araberako azterketa. Bilbo: EHU-UPV Publishing Services.

  • Hualde, J. I., & Ortiz de Urbina, J. (Eds.). (2003). A grammar of Basque. Berlin, New York: Mouton de Gruyter.

  • Kleiber, G. (1994). Anaphores et pronoms. Louvain-la-Neuve: Duculot.

  • Laka, I. (1996). A brief grammar of Euskara, the Basque language. Leioa (Spain): EHU/UPV Publishing Services. Retrieved December, 2016, from http://www.ehu.eus/eu/web/eins/a-brief-grammar-of-euskara.

  • McCarthy, J. F., & Lehnert, W. G. (1995). Using decision trees for coreference resolution. In Proceedings of the 14th international joint conference on artificial intelligence (Vol. 2, pp. 1050–1055). San Francisco, CA, USA.

  • Mitkov, R. (2002). Anaphora resolution. London: Longman.

  • Moirand, S. (1990). Une grammaire des textes et des dialogues. Paris: Hachette.

  • Müller, C., & Strube, M. (2006). Multi-level annotation of linguistic data with MMAX2. In S. Braun, K. Kohn, & J. Mukherjee (Eds.), Corpus technology and language pedagogy. New resources, new tools, new methods (English Corpus Linguistics, Vol. 3, pp. 197–214). Frankfurt: Peter Lang.

  • Nicolov, N., Salvetti, F., & Ivanova, S. (2008). Sentiment analysis: Does coreference matter? In AISB 2008 convention: Communication, interaction and social intelligence (pp. 37–40).

  • Nilsson Björkenstam, K. (2013). SUC-CORE: A balanced corpus annotated with noun phrase coreference. Northern European Journal of Language Technology (NEJLT), 3, 19–39.

  • Ortiz de Urbina, J. (1989). Parameters in the grammar of Basque: A GB approach to Basque syntax. Dordrecht: Foris.

  • Peral, J., Palomar, M., & Ferrández, A. (1999). Coreference-oriented interlingual slot structure & machine translation. In Proceedings of the workshop on coreference and its applications (CorefApp 1999) (pp. 69–76). Stroudsburg, PA, USA.

  • Poon, H., Christensen, J., Domingos, P., Etzioni, O., Hoffmann, R., Kiddon, C., Lin, T., Ling, X., Ritter, A., Schoenmackers, S., Soderland, S., Weld, D., Wu, F., & Zhang, C. (2010). Machine reading at the University of Washington. In Proceedings of the NAACL HLT 2010 first international workshop on formalisms and methodology for learning by reading (FAM-LbR 2010) (pp. 87–95). Stroudsburg, PA, USA.

  • Pradhan, S. S., Ramshaw, L., Weischedel, R., MacBride, J., & Micciulla, L. (2007). Unrestricted coreference: Identifying entities and events in OntoNotes. In Proceedings of ICSC 2007 (pp. 446–453). Irvine, California.

  • Recasens, M. (2010). Coreference: Theory, annotation, resolution and evaluation. Ph.D. thesis, University of Barcelona, Spain.

  • Rodriguez, K. (2010). Resources for linguistically motivated multilingual anaphora resolution. Ph.D. thesis, University of Trento, Italy.

  • Saeed, J. I. (2009). Semantics (3rd ed.). New York: Wiley.

  • Soraluze, A., Arregi, O., Arregi, X., Ceberio, K., & Díaz de Ilarraza, A. (2012). Mention detection: First steps in the development of a Basque coreference resolution system. In Proceedings of Konvens 2012 (pp. 128–136).

  • Stede, M. (2011). Discourse processing. San Rafael, California: Morgan & Claypool Publishers.

  • Steinberger, J., Poesio, M., Kabadjov, M. A., & Ježek, K. (2007). Two uses of anaphora resolution in summarization. Information Processing and Management, 43(6), 1663–1680.

  • Stoyanov, V., Gilbert, N., Cardie, C., & Riloff, E. (2009). Conundrums in noun phrase coreference resolution: Making sense of the state-of-the-art. In Proceedings of the joint conference of the 47th annual meeting of the ACL and the 4th international joint conference on natural language processing of the AFNLP (pp. 656–664). Suntec, Singapore.

  • Vicedo, J. L., & Ferrández, A. (2006). Coreference in Q&A. In Advances in open domain question answering. Text, speech and language technology (Vol. 32, pp. 71–96). Berlin/New York: Springer.

  • Zabala, I. (1996). Testu-lotura: Lotura tematikoa eta erreferentzia-sareak testu teknikoetan. In I. Zabala (Ed.), Testu-loturarako baliabideak: Euskara Teknikoa (pp. 15–44). Bilbo: EHU-UPV Publishing Services.

  • Zabala, I., & Odriozola, J. C. (2004). Los complejos posposicionales en vasco. In G. E. Perez, I. Zabala, & L. Gràcia Sole (Eds.), Las Fronteras de la Composición (pp. 281–315). Donostia: University of Deusto.

  • Zhekova, D., & Kübler, S. (2010). UBIU: A language-independent system for coreference resolution. In Proceedings of the 5th international workshop on semantic evaluation (SemEval 2010) (pp. 96–99). Stroudsburg, PA, USA.

Author information

Correspondence to Klara Ceberio.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Ceberio, K., Aduriz, I., Díaz de Ilarraza, A. et al. Coreferential Relations in Basque: The Annotation Process. J Psycholinguist Res 47, 325–342 (2018). https://doi.org/10.1007/s10936-018-9559-6

  • DOI: https://doi.org/10.1007/s10936-018-9559-6
