Adaptation of Cross-Lingual Transfer Methods for the Building of Medical Terminology in Ukrainian

Hamon, Thierry; Grabar, Natalia

doi:10.1007/978-3-319-75477-2_15

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9623)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

The increasing availability of parallel bilingual corpora and of automatic methods and tools makes it possible to build linguistic and terminological resources for low-resourced languages. We propose to exploit corpora available in several languages to build bilingual and trilingual terminologies. Typically, terminological information extracted in better-resourced languages is associated with the corresponding units in lower-resourced languages through multilingual transfer. The method is applied to corpora involving the Ukrainian language. In our experiments, the precision of term extraction varies between 0.454 and 0.966, while the quality of the interlingual relations varies between 0.309 and 0.965. The resulting resource contains 4,588 medical terms in Ukrainian and their 34,267 relations with French and English terms.
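The transfer step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a term has already been extracted from the better-resourced side of a sentence pair and that word-alignment links between the two sentences are available; the function name `project_term`, the toy sentence pair, and the alignment links are all hypothetical.

```python
from typing import List, Optional, Tuple


def project_term(
    term_span: Tuple[int, int],
    alignment: List[Tuple[int, int]],
    target_tokens: List[str],
) -> Optional[str]:
    """Project a source-side term onto the target sentence.

    term_span is a half-open token range [start, end) in the source sentence.
    alignment is a list of (source_index, target_index) word-alignment links.
    Returns the target tokens covering all aligned positions, or None when
    no source token of the term is aligned.
    """
    start, end = term_span
    # Collect every target position linked to a token inside the term.
    target_positions = sorted({t for s, t in alignment if start <= s < end})
    if not target_positions:
        return None
    # Simplifying assumption: take the contiguous span from the first
    # to the last aligned target position as the projected term.
    first, last = target_positions[0], target_positions[-1]
    return " ".join(target_tokens[first:last + 1])


# Toy English-Ukrainian sentence pair with a one-to-one alignment (illustrative).
source_tokens = ["heart", "failure", "is", "common"]
target_tokens = ["серцева", "недостатність", "є", "поширеною"]
alignment = [(0, 0), (1, 1), (2, 2), (3, 3)]

# "heart failure" occupies source tokens 0 and 1.
projected = project_term((0, 2), alignment, target_tokens)
print(projected)
```

In practice the alignment links would come from a statistical word aligner run over the parallel corpus, and the projected candidates would still need filtering, which is where the reported precision figures come in.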

Acknowledgments

This work is funded by the LIMSI-CNRS AI project Outiller l’Ukrainien.

Author information

Authors and Affiliations

LIMSI, CNRS, Université Paris-Saclay, 91405, Orsay, France
Thierry Hamon
Université Paris 13, Sorbonne Paris Cité, 93430, Villetaneuse, France
Thierry Hamon
CNRS, UMR 8163, 59000, Lille, France
Natalia Grabar
Univ. Lille, UMR 8163 - STL - Savoirs Textes Langage, 59000, Lille, France
Natalia Grabar

Corresponding author

Correspondence to Thierry Hamon.

Editor information

Editors and Affiliations

CIC, Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

Cite this paper

Hamon, T., Grabar, N. (2018). Adaptation of Cross-Lingual Transfer Methods for the Building of Medical Terminology in Ukrainian. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9623. Springer, Cham. https://doi.org/10.1007/978-3-319-75477-2_15

DOI: https://doi.org/10.1007/978-3-319-75477-2_15
Published: 21 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75476-5
Online ISBN: 978-3-319-75477-2
eBook Packages: Computer Science, Computer Science (R0)
