Abstract
We describe an ongoing project to construct the Tatar Wordnet. The Tatar Wordnet is being built on the basis of three source resources that we have developed. The first source is TatThes, a bilingual Russian-Tatar Social-Political Thesaurus. TatThes, in turn, was constructed by manually translating and extending RuThes, a linguistic ontology for Russian. The second source is a Tatar translation of RuWordNet, a wordnet for Russian. This translation was carried out automatically on the basis of a Russian-Tatar dictionary and then manually verified. The third source is a semantic classification of Tatar verbs, developed from scratch. We discuss the structure, compilation methodology, and current state of these source resources, and justify their choice as the initial resources for building the Tatar Wordnet. Our ultimate goal is to publish the Tatar Wordnet on the Linguistic Linked Open Data cloud and integrate it into the Global WordNet Grid.
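The dictionary-based translation step mentioned above can be pictured with a minimal sketch. It is not the authors' implementation: the function name `translate_synsets`, the `CandidateSynset` record, and the toy Russian and Tatar entries are all assumptions made for illustration. It only shows the general idea of collecting dictionary translations for each lemma of a source synset and leaving every candidate marked as unverified until a human reviews it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CandidateSynset:
    """Hypothetical record for one automatically translated synset."""
    synset_id: str
    source_lemmas: List[str]
    candidate_lemmas: List[str] = field(default_factory=list)
    verified: bool = False  # stays False until a manual review confirms it


def translate_synsets(
    synsets: Dict[str, List[str]],
    dictionary: Dict[str, List[str]],
) -> List[CandidateSynset]:
    """Collect dictionary translations for every lemma of every synset.

    Lemmas missing from the dictionary contribute nothing, so a synset
    can end up with an empty candidate list that needs manual filling.
    """
    candidates = []
    for synset_id, lemmas in synsets.items():
        seen: List[str] = []
        for lemma in lemmas:
            for translation in dictionary.get(lemma, []):
                if translation not in seen:  # keep order, drop duplicates
                    seen.append(translation)
        candidates.append(CandidateSynset(synset_id, list(lemmas), seen))
    return candidates


# Toy data for illustration only; not taken from RuWordNet or the
# Russian-Tatar dictionary used in the project.
toy_synsets = {"S1": ["дом", "здание"], "S2": ["бежать"]}
toy_dictionary = {"дом": ["йорт", "өй"], "здание": ["бина"]}

for candidate in translate_synsets(toy_synsets, toy_dictionary):
    print(candidate.synset_id, candidate.candidate_lemmas, candidate.verified)
```

In this sketch the second synset receives no candidates because its lemma is absent from the toy dictionary, which is the kind of gap the manual verification pass would have to resolve.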
Acknowledgments
This work was funded by the Russian Science Foundation under research project no. 19-71-10056.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Kirillovich, A., Galieva, A., Nevzorova, O., Shaekhov, M., Loukachevitch, N., Ilvovsky, D. (2020). Tatar WordNet: The Sources and the Component Parts. In: Elizarov, A., Novikov, B., Stupnikov, S. (eds) Data Analytics and Management in Data Intensive Domains. DAMDID/RCDL 2019. Communications in Computer and Information Science, vol 1223. Springer, Cham. https://doi.org/10.1007/978-3-030-51913-1_13
DOI: https://doi.org/10.1007/978-3-030-51913-1_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-51912-4
Online ISBN: 978-3-030-51913-1
eBook Packages: Computer Science, Computer Science (R0)