Abstract
In this paper, we describe methods for handling multilingual non-compositional constructions in the Grammatical Framework (GF). Specifically, we look at methods to detect and extract non-compositional phrases from parallel texts, and we propose ways to handle such constructions in GF grammars. We expect these methods to enrich controlled natural languages (CNLs) by allowing more flexibility in their design. We consider two use cases of non-compositional constructions: a general-purpose method to detect and extract multilingual multiword expressions, and a procedure to identify nominal compounds in German. We evaluate the procedure for multiword expressions through a qualitative analysis of the results. For the experiments on nominal compounds, we incorporate the detected compounds into a full statistical machine translation (SMT) pipeline and evaluate the impact of our method on the translation process.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Enache, R., Listenmaa, I., Kolachina, P. (2014). Handling Non-compositionality in Multilingual CNLs. In: Davis, B., Kaljurand, K., Kuhn, T. (eds) Controlled Natural Language. CNL 2014. Lecture Notes in Computer Science, vol 8625. Springer, Cham. https://doi.org/10.1007/978-3-319-10223-8_14
DOI: https://doi.org/10.1007/978-3-319-10223-8_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10222-1
Online ISBN: 978-3-319-10223-8
eBook Packages: Computer Science, Computer Science (R0)