
Adaptation of Cross-Lingual Transfer Methods for the Building of Medical Terminology in Ukrainian

  • Conference paper
  • Computational Linguistics and Intelligent Text Processing (CICLing 2016)

Abstract

The increasing availability of parallel bilingual corpora and of automatic methods and tools makes it possible to build linguistic and terminological resources for low-resourced languages. We propose to exploit corpora available in several languages to build bilingual and trilingual terminologies. Terminological information extracted in the better-resourced languages is associated with the corresponding units in the lower-resourced language through multilingual transfer. The method is applied to corpora involving Ukrainian. In our experiments, term extraction precision ranges from 0.454 to 0.966, and the quality of the interlingual relations ranges from 0.309 to 0.965. The resulting resource contains 4,588 Ukrainian medical terms and 34,267 relations linking them to French and English terms.
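The transfer idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that terms have already been extracted in the better-resourced language (here English) as token spans, and that word alignments are available in the "i-j" index-pair format produced by aligners such as fast_align. Each source term is projected onto the smallest contiguous span of target tokens covering all tokens aligned to it. The function names (parse_alignment, project_term, transfer_terms, link_via_pivot) and the sentence pair are hypothetical, and the pivot join on English terms is only one possible way to obtain Ukrainian-French relations.

```python
from collections import defaultdict

def parse_alignment(line):
    """Parse an 'i-j' alignment line (e.g. '0-0 1-2') into (source, target) index pairs."""
    pairs = []
    for token in line.split():
        src, tgt = token.split("-")
        pairs.append((int(src), int(tgt)))
    return pairs

def project_term(term_span, alignment):
    """Project a source term span (start, end exclusive) onto the target side.

    Returns the smallest contiguous target span covering every target token
    aligned to a token of the term, or None if no token of the term is aligned.
    """
    start, end = term_span
    targets = sorted(t for s, t in alignment if start <= s < end)
    if not targets:
        return None
    return targets[0], targets[-1] + 1

def transfer_terms(src_tokens, tgt_tokens, alignment_line, src_terms):
    """Return (source term, projected target term) pairs for one aligned sentence."""
    alignment = parse_alignment(alignment_line)
    relations = []
    for start, end in src_terms:
        projected = project_term((start, end), alignment)
        if projected is None:
            continue  # the term has no aligned counterpart in this sentence
        src_term = " ".join(src_tokens[start:end])
        tgt_term = " ".join(tgt_tokens[projected[0]:projected[1]])
        relations.append((src_term, tgt_term))
    return relations

def link_via_pivot(en_uk, en_fr):
    """Hypothetical pivot step: join EN-UK and EN-FR pairs on the shared English term."""
    fr_by_en = defaultdict(set)
    for en_term, fr_term in en_fr:
        fr_by_en[en_term].add(fr_term)
    return sorted({(uk, fr) for en_term, uk in en_uk for fr in fr_by_en.get(en_term, ())})

# Hypothetical English-Ukrainian sentence pair with an 'i-j' alignment.
en_tokens = ["chronic", "kidney", "disease", "is", "treated"]
uk_tokens = ["хронічна", "хвороба", "нирок", "лікується"]
en_uk = transfer_terms(en_tokens, uk_tokens, "0-0 1-2 2-1 4-3", [(0, 3)])
print(en_uk)  # [('chronic kidney disease', 'хронічна хвороба нирок')]
print(link_via_pivot(en_uk, [("chronic kidney disease", "maladie rénale chronique")]))
# [('хронічна хвороба нирок', 'maladie rénale chronique')]
```

Taking the span from the first to the last aligned token tolerates word reordering inside the term (kidney disease vs. хвороба нирок), but it may also absorb unaligned words lying in between, so a practical system might filter projected candidates, for example with part-of-speech patterns.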




Acknowledgments

This work is funded by the LIMSI-CNRS AI project Outiller l'Ukrainien (Tooling Ukrainian).

Author information

Correspondence to Thierry Hamon.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Hamon, T., Grabar, N. (2018). Adaptation of Cross-Lingual Transfer Methods for the Building of Medical Terminology in Ukrainian. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol. 9623. Springer, Cham. https://doi.org/10.1007/978-3-319-75477-2_15

  • DOI: https://doi.org/10.1007/978-3-319-75477-2_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-75476-5

  • Online ISBN: 978-3-319-75477-2

  • eBook Packages: Computer Science, Computer Science (R0)
