Abstract
The increasing availability of parallel bilingual corpora and of automatic methods and tools makes it possible to build linguistic and terminological resources for low-resourced languages. We propose to exploit corpora available in several languages to build bilingual and trilingual terminologies. Typically, terminological information extracted in better-resourced languages is associated with the corresponding units in lower-resourced languages through multilingual transfer. The method is applied to corpora involving the Ukrainian language. In our experiments, the precision of term extraction varies between 0.454 and 0.966, while the quality of the interlingual relations varies between 0.309 and 0.965. The resulting resource contains 4,588 medical terms in Ukrainian and their 34,267 relations with French and English terms.
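The transfer idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that word alignments between a source sentence and its Ukrainian translation are already available from some word aligner, and that terms in the source language have been located as token spans. The function names `project_terms` and `precision`, the toy sentence pair, and the alignment are all invented for illustration.

```python
from typing import Dict, List, Set, Tuple


def project_terms(
    src_tokens: List[str],
    tgt_tokens: List[str],
    alignment: Set[Tuple[int, int]],
    src_term_spans: List[Tuple[int, int]],
) -> Dict[str, str]:
    """Project source-language terms onto the target sentence.

    alignment holds (source_index, target_index) pairs.
    src_term_spans holds (start, end) token spans, end exclusive.
    """
    projected: Dict[str, str] = {}
    for start, end in src_term_spans:
        # Target positions aligned with any word of the source term.
        tgt_idx = sorted({j for i, j in alignment if start <= i < end})
        if not tgt_idx:
            continue  # no aligned word: nothing to transfer
        # Simplifying assumption: take the contiguous target span
        # between the first and last aligned positions.
        first, last = tgt_idx[0], tgt_idx[-1]
        src_term = " ".join(src_tokens[start:end])
        projected[src_term] = " ".join(tgt_tokens[first:last + 1])
    return projected


def precision(candidates: Set[str], validated: Set[str]) -> float:
    """Share of extracted candidates that were judged correct."""
    if not candidates:
        return 0.0
    return len(candidates & validated) / len(candidates)


# Toy English-Ukrainian pair with a hand-written one-to-one alignment.
en = ["acute", "heart", "failure", "is", "common"]
uk = ["гостра", "серцева", "недостатність", "є", "поширеною"]
links = {(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)}

pairs = project_terms(en, uk, links, [(0, 3)])
print(pairs)
```

Precision here is simply the fraction of proposed candidates that a validator accepts, which is one plausible reading of the figures reported in the abstract; the evaluation protocol itself is not described in this section.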
Acknowledgments
This work is funded by the LIMSI-CNRS AI project Outiller l’Ukranien.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Hamon, T., Grabar, N. (2018). Adaptation of Cross-Lingual Transfer Methods for the Building of Medical Terminology in Ukrainian. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9623. Springer, Cham. https://doi.org/10.1007/978-3-319-75477-2_15
DOI: https://doi.org/10.1007/978-3-319-75477-2_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75476-5
Online ISBN: 978-3-319-75477-2
eBook Packages: Computer Science, Computer Science (R0)