The Tomsk Dialect Corpus: a comprehensively annotated database of a Siberian Russian dialect from material collected over the last 70 years

Russian Linguistics

Abstract

The paper offers the first full description of the Tomsk Dialect Corpus, an electronic resource based on recordings of Russian dialect speech from the Tomsk and Kemerovo regions (West Siberia) collected since 1946. With 3,350,272 tokens, the corpus is the largest electronic collection of dialect speech in Russia. The resource is original both in the uniqueness of its materials and in their multifaceted annotation. Topic and pragmatic annotation were done manually: topic annotation covers the entire dataset, whereas pragmatic annotation covers 45,445 speech acts. Grammatical annotation was performed automatically with the PhpMorphy parser, with additional manual correction for some dialect words. Metalinguistic annotation includes the year and place of each recording and the speakers’ age, gender, and educational level. All annotated parameters are searchable. The corpus also includes a lexicographic component, i.e. definitions of dialect lexemes.

Abstract (Russian version)

The article gives the first full description of the Tomsk Dialect Corpus, an electronic resource based on recordings of Russian dialect speech from the Tomsk and Kemerovo regions (West Siberia) collected since 1946. The corpus comprises 3,350,272 word tokens and is the largest electronic collection of dialect speech in Russia. The originality of this resource lies in the uniqueness of the collected materials and their diverse annotation. Topic and pragmatic annotation were done manually. Topic annotation is available for the entire body of material; within the pragmatic annotation, 45,445 speech acts have been identified. Morphological annotation was done automatically with the PhpMorphy parser; in addition, manual correction was carried out for some dialect words. Metalinguistic annotation includes the year and place of recording and the speakers’ age, gender, and level of education. All annotated parameters are searchable. The corpus also includes a lexicographic component: definitions of dialect lexemes.
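To illustrate what searching over the metalinguistic and topic annotation might involve, here is a minimal Python sketch. The data model (`Speaker`, `Text`), its field names, and the `search` function are hypothetical and are not taken from the corpus interface at http://losl.tsu.ru/?q=losl_search. The sketch only assumes, following the abstract and Note 6, that each text is the transcript of one recording with one or more speakers, that it carries a year, a place, and a set of topics, and that missing information is stored as `None`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Speaker:
    # Hypothetical speaker record; None marks missing information.
    age: Optional[int]
    gender: Optional[str]
    education: Optional[str]


@dataclass(frozen=True)
class Text:
    # Hypothetical record for one transcript of one recording.
    text_id: str
    year: Optional[int]
    place: Optional[str]
    speakers: tuple[Speaker, ...]
    topics: frozenset[str]


def search(texts, *, year_from=None, year_to=None, place=None,
           gender=None, education=None, topic=None):
    """Return the texts that satisfy every filter that is not None.

    A text whose value is missing never matches a filter on that value.
    Each speaker filter is checked independently and matches if at least
    one speaker in the text qualifies.
    """
    hits = []
    for t in texts:
        if year_from is not None and (t.year is None or t.year < year_from):
            continue
        if year_to is not None and (t.year is None or t.year > year_to):
            continue
        if place is not None and t.place != place:
            continue
        if gender is not None and not any(s.gender == gender for s in t.speakers):
            continue
        if education is not None and not any(s.education == education for s in t.speakers):
            continue
        if topic is not None and topic not in t.topics:
            continue
        hits.append(t)
    return hits
```

Filters combine conjunctively, so a call such as `search(texts, gender="f", topic="weddings")` keeps only texts that satisfy both conditions. Treating missing values as non-matching is one design choice among several; a real interface might instead expose "unknown" as a searchable value of its own.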

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Data Availability

Zemicheva, S. S., Dubtsova, L. A., Gromov, M. L., Galanina, V. V., Ugryumova, M. M., Vasilchenko, A. A., Parshina, A. V., Popova, D. P., Duminskaya, A. V., Zyuzkova, N. A., & Bukhanova, E. D. Tomsk Dialect Corpus 2.0. Laboratory of General and Siberian Lexicography of the National Research Tomsk State University. Retrieved January 10, 2023, from http://losl.tsu.ru/?q=losl_search. Access mode: for registered users.

Notes

  1. Daniel et al. (2013–2018); Garder et al. (2018); Ter-Avanesova et al. (2018, 2019, 2020); Ron’ko et al. (2019, 2022); Ryko and Spiricheva (2020); Kuvshinskaya (2020); Kuvshinskaya and Mashkovtseva (2021); Malysheva and Ter-Avanesova (2021); Knyazev (2021, 2022a, 2022b, 2022c, 2023); Panova (2021); Flyagina et al. (2022–2023).

  2. Note that some of these features are also typical of Standard Russian. However, they are not typical of other Russian dialects, so they can be considered distinctive features of the Middle Ob region dialect.

  3. During all stages of fieldwork, the speakers were asked for their oral consent to be recorded (written consent is not required under Russian law).

  4. This term refers to non-standard spelling, which is used inconsistently in the manuscripts: in some cases only dialect phonetic features are reflected, while in others some standard features (e.g. akanje) are rendered as well.

  5. Here and below, the tables include unknown data, i.e. data for which the relevant information is missing.

  6. Here and below, by “text” we mean the transcript of one recording of a conversation with one or more speakers.

  7. Topics may overlap within a single text, so some tokens were counted more than once; this is why the total number of topic-annotated tokens exceeds 3.3 million, i.e. the size of the corpus itself.
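Note 7 can be made concrete with a small sketch. It assumes, hypothetically, that topic annotation is stored as token spans within a text (the storage format used in the corpus is not described here); the function `topic_token_counts` and its `(start, end, topic)` span format are illustrative only.

```python
from collections import Counter


def topic_token_counts(n_tokens, spans):
    """Count topic-annotated tokens in one text.

    spans: list of (start, end, topic) tuples over token positions [start, end).
    Spans may overlap, so one token can belong to several topics.
    Returns (tokens per topic, sum over all topics, distinct tokens covered).
    """
    per_topic = Counter()
    covered = set()
    for start, end, topic in spans:
        if not 0 <= start <= end <= n_tokens:
            raise ValueError(f"span ({start}, {end}) lies outside the text")
        per_topic[topic] += end - start    # overlapping tokens are counted again here
        covered.update(range(start, end))  # but only once here
    return per_topic, sum(per_topic.values()), len(covered)
```

For a 100-token text with the spans `(0, 60, "farming")` and `(40, 100, "family")`, each topic receives 60 tokens, so the per-topic total is 120 although only 100 distinct tokens exist. Summed over the whole corpus, the same effect pushes the topic total above the corpus size.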

References

  • Bankova, T. B. (2018). Slovar’ Sibirskogo svadebnogo obrjada [Dictionary of the Siberian wedding ceremony]. Tomsk: Tomsk State University Publishing House.

  • Baranov, V. A., Vernyaeva, R. A., & Zhdanova, E. A. (2020). Mul’timedijnyj korpus Russkix govorov Udmurtii: razrabotka i vozmozhnosti ispol’zovanija [The multimedia corpus of Russian dialects of Udmurtia: development and possible use]. Cuadernos de Rusística Española, 16, 39–54. https://doi.org/10.30827/cre.v16i0.11763.

  • Blinova, O. I. (Ed.) (1998–2002). Vershininskij slovar’ [Vershininsky dictionary]. Vols 1–7. Tomsk: Tomsk State University Publishing House.

  • Blinova, O. I., & Palagina, V. V. (Eds.) (1975). Slovar’ Russkix starozhil’cheskix govorov srednej chasti bassejna r. Obi. Dopolnenie [Dictionary of Russian dialects of long-term residents of the middle part of the Ob river basin. Supplement]. Vols 1–2. Tomsk: Tomsk State University Publishing House.

  • Bogdanova-Beglarian, N. V., Blinova, O. V., Sherstinova, T. Yu., Troshchenkova, E. V., Gorbunova, D., & Zaides, K. D. (2019). Pragmatic markers of Russian everyday speech: the revised typology and corpus-based study. In Proceedings of the 25th Conference of Open Innovations Association FRUCT (pp. 57–63). Los Alamitos: IEEE Comput. Soc. https://doi.org/10.23919/FRUCT48121.2019.8981530.

  • Clua, E., & Lloret, M.-R. (2006). New tendencies in geographical dialectology: the Catalan Corpus oral dialectal (COD). New perspectives on Romance linguistics, 2, 31–47.

  • Daniel, M., Dobrushina, N., & von Waldenfels, R. (2013–2018). The language of the Ustja river basin. A corpus of North Russian dialectal speech. Bern, Moscow. www.parasolcorpus.org/Pushkino.

  • Dobrushina, N., & Sokur, E. (2022). Spoken corpora of Slavic languages. Russian Linguistics, 46, 77–93. https://doi.org/10.1007/s11185-022-09254-9.

  • Felde, O. V., Vasil’ev, V. K., Belogur, O. V., Speranskaja, A. N., Smirnov, E. S., Kajzer, K. V., Novikova, D. S., & Semenec, O. V. (Eds.) (2017–2020). Ėlektronnyj tekstovyj korpus lingvokul’tury Severnogo Priangar’ja [Text corpus of the linguistic culture of the northern Angara region]. http://angara.sfu-kras.ru/.

  • Flyagina, M. V., Kalinicheva, N. V., & Severina, E. M. (2022–2023). Corpus of the Russian dialect spoken in the villages of the Don River. Moscow: Linguistic Convergence Laboratory, HSE University. http://lingconlab.ru/don_rnd.

  • Garder, M., Petrova, N., Moroz, A., Panova, A., & Dobrushina, N. (2018). Korpus govora sela Spiridonova Buda. Moscow: Linguistic Convergence Laboratory, NRU HSE. http://lingconlab.ru/SpiridonovaBuda/.

  • Gol’din, V. E., & Kryuchkova, Yu. O. (2011). Korpus russkoj dialektnoj rechi: koncepcija i parametry ocenki [A corpus of Russian dialect speech: the concept and parameters of evaluation]. In A. E. Kibrik, V. I. Belikov, I. M. Boguslavskij, B. V. Dobrov, & D. O. Dobrovol’skij (Eds.), Komp’juternaja lingvistika i intellektual’nye texnologii: Trudy mezhdunarodnoj konferencii “Dialog–2011” [Computational linguistics and intellectual technologies. Proceedings of international conference “Dialog–2011”] (Vol. 10, pp. 359–367). Moscow: Russian State University for the Humanities Publishing House.

  • Goláňová, H., & Waclawičová, M. (2019). The DIALEKT corpus and its possibilities. Journal of Linguistics / Jazykovedný časopis, 70(2), 336–344. https://doi.org/10.2478/jazcas-2019-0063.

  • Grishina, E. (2010). Multimodal Russian corpus (MURCO): first steps. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, M. Rosner, & D. Tapias (Eds.), LREC 2010: proceedings of the seventh conference on international language resources and evaluation (pp. 2953–2960). Valletta: European Language Resources Association. http://www.lrec-conf.org/proceedings/lrec2010/pdf/143_Paper.pdf.

  • Gromov, M. L. (2020). Add Lemma phpMorphy [Computer software]. https://github.com/maxim-leo-gromov/add_lemma_phpmorphy.

  • Gromov, M. L., & Zemicheva, S. S. (2020). Extending PhpMorphy dictionary with dialect words. Journal of Physics: Conference Series, 1680, Article 012014. https://iopscience.iop.org/article/10.1088/1742-6596/1680/1/012014.

  • Institute for the German Language (2023). Datenbank für Gesprochenes Deutsch [Database of spoken German]. dgd.ids-mannheim.de/dgd/pragdb.dgd_extern.welcome.

  • Ivantsova, E. V. (2020). Formuly rechevogo ėtiketa s blagopozhelatel’noj semantikoj v diskurse nositelej sredneobskix govorov kak otrazhenie narodnoj mental’nosti [Speech etiquette formulas with good wishing semantics in the discourse of speakers of Middle Ob dialects as a reflection of folk mentality]. Tomsk State University Journal, 461, 38–44. https://doi.org/10.17223/15617793/461/5.

  • Ivantsova, E. V. (2021). Zlopozhelanija v rechi nositelej sredneobskix govorov [Ill wishes in the speech of the middle Ob dialects speakers]. Proceedings of the V. V. Vinogradov Russian Language Institute, 2, 176–185. https://doi.org/10.31912/pvrli-2021.2.14.

  • Ivantsova, E. V. (Ed.) (2006–2012). Polnyj slovar’ dialektnoj jazykovoj lichnosti [Complete dictionary of dialect language personality]. Vols 1–4. Tomsk: Tomsk State University Publishing House.

  • Johannessen, J. B., Priestley, J., Hagen, K., Nøklestad, A., & Lynum, A. (2012). The Nordic Dialect Corpus. In N. Calzolari, K. Choukri, T. Declerck, M. U. Doğan, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of the Eighth International Conference on Language Resources and Evaluation (pp. 3387–3392). Istanbul: European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2012/pdf/773_Paper.pdf.

  • Kachinskaya, I. B., & Sichinava, D. V. (2017). O Korpuse dialektnyx tekstov v nacional’nom korpuse Russkogo jazyka [On the corpus of dialectal texts in the Russian national corpus]. Russian Journal of Lexicography, 11, 71–85. https://doi.org/10.17223/22274200/11/5.

  • Kazakova, O. A. (2007). Dialektnaja jazykovaja lichnost’ v zhanrovom aspekte [Dialect language personality in the genre aspect]. Tomsk: Tomsk State Polytechnic University Publishing House.

  • Knuth, D. E. (1997) [1973]. The art of computer programming (3rd ed.). Reading: Addison-Wesley.

  • Knyazev, S. V. (2021). Corpus of the Russian dialect spoken in the basins of Upper Pinega and Vyya rivers. Moscow: Linguistic Convergence Laboratory, HSE University. http://lingconlab.ru/vaduga/.

  • Knyazev, S. V. (2022a). Corpus of the Russian dialect spoken in Keba. Moscow: Linguistic Convergence Laboratory, HSE University. http://lingconlab.ru/keba.

  • Knyazev, S. V. (2022b). Corpus of the Russian dialect spoken in the villages of the Middle Pyoza. Moscow: Linguistic Convergence Laboratory, HSE University. http://lingconlab.ru/pyoza.

  • Knyazev, S. V. (2022c). Corpus of the Russian dialect spoken in Tserkovnoe. Moscow: Linguistic Convergence Laboratory, HSE University. http://lingconlab.ru/tserkovnoe.

  • Knyazev, S. V. (2023). Corpus of the Russian dialect spoken in the villages of the Middle Northern Dvina. Moscow: Linguistic Convergence Laboratory, HSE University. http://lingconlab.ru/dvina.

  • Kortmann, B. (2000–2005). Freiburg English Dialect Corpus. http://www2.anglistik.uni-freiburg.de/institut/lskortmann/FRED/.

  • Kuvshinskaya, Yu. M. (2020). Corpus of the Russian dialect spoken in the basins of Lukh and Teza rivers. Moscow: Linguistic Convergence Laboratory, HSE University. http://lingconlab.ru/lukhteza/.

  • Kuvshinskaya, Yu. M., & Mashkovtseva, P. Y. (2021). Corpus of the dialect of Manturovsky region, Kostroma oblast. Moscow: Linguistic Convergence Laboratory, HSE University. http://lingconlab.ru/manturovo/.

  • Malysheva, A. V., & Ter-Avanesova, A. V. (2021). Luzhnikovo corpus. Moscow: Linguistic Convergence Laboratory, HSE University; V.V. Vinogradov Russian Language Institute Russian Academy of Science. http://lingconlab.ru/luzhnikovo.

  • Martins, A. M. (2000–2023). CORDIAL-SIN: Corpus Dialectal para o Estudo da Sintaxe [Syntax-oriented Corpus of Portuguese Dialects]. Lisboa, Centro de Linguística da Universidade de Lisboa. http://www.clul.ulisboa.pt/en/10-research/314-cordial-sin-corpus.

  • Palagina, V. V. (Ed.) (1964–1967). Slovar’ Russkix starozhil’cheskix govorov srednej chasti bassejna r. Obi [Dictionary of Russian dialects of long-term residents of the middle part of the Ob river basin]. Vols 1–3. Tomsk: Tomsk State University Publishing House.

  • Palagina, V. V. (Ed.) (1983–1986). Sredneobskiy slovar’: dopolnenie [The middle Ob dictionary: supplement]. Vols 1–2. Tomsk: Tomsk State University Publishing House.

  • Palagina, V. V. (Ed.) (1985). Russkie govory srednego Priob’ja [Russian dialects of the middle Ob]. Part I. Tomsk: Tomsk State University Publishing House.

  • Palagina, V. V. (Ed.) (1989). Russkie govory srednego Priob’ja [Russian dialects of the middle Ob]. Part II. Tomsk: Tomsk State University Publishing House.

  • Panova, A. (2021). Corpus of Russian spoken in Zvenigorod. Moscow: Linguistic Convergence Laboratory, HSE University. http://lingconlab.ru/zvenigorod/.

  • PhpMorphy [Computer software]. http://phpmorphy.sourceforge.net.

  • Popova, D. P. (2022). Funkcii smexa v ustnoj kommunikacii sel’skix zhitelej (po materialam Tomskogo dialektnogo korpusa) [Functions of laughter in oral communication of villagers (based on materials of Tomsk Dialect Corpus)]. Communication Studies (Russia), 9(2), 328–342. https://doi.org/10.24147/2413-6182.2022.9(2).

  • Raaf, M. (2021). Bavaria’s dialects online. LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University. https://bdo.badw.de/.

  • Ron’ko, R., Azanova, A., But’enko, Z., Zambrzhickaya, M., Marchenko, I., Mochulskij, D., & Tsejtina, E. (2022). Corpus of Shetnevo and Makeevo. Moscow: Linguistic Convergence Laboratory, HSE University; V.V. Vinogradov Russian Language Institute Russian Academy of Science. http://lingconlab.ru/shetnevo/#!/.

  • Ron’ko, R., Volf, E., Grebyonkina, M., Ershova, M., Okhapkina, A., Khadasevich, A., & Morozova, V. (2019). Corpus of Opochetsky dialects. Moscow: Linguistic Convergence Laboratory, HSE University; V.V. Vinogradov Russian Language Institute Russian Academy of Science. https://lingconlab.ru/opochka.

  • Alexander, R., & Zhobov, V. (2022). Bulgarian dialects: living village speech in the digital age. Bloomington: Slavica Publishers.

  • Ruhi, Ş., Işik Güler, H., Hatipoğlu, Ç., Eröz Tuğa, B., & Çokal Karadaş, D. (2010). Achieving representativeness through the parameters of spoken language and discursive features: the case of the spoken Turkish corpus. In F. Moskowich-Spiegel, B. Crespo García, & I. Lareo Martín (Eds.), Language windowing through corpora. Visualización del lenguaje a través de corpus. Part II (pp. 789–799). A Coruña: Universidade da Coruña.

  • Ryko, A. I., & Spiricheva, M. V. (2020). Corpus of the Russian dialect spoken in Khislavichi district. Moscow: Linguistic Convergence Laboratory, HSE University. http://lingconlab.ru/khislavichi/.

  • Sadowsky, S. (2022). The sociolinguistic speech corpus of Chilean Spanish (COSCACH). A socially stratified text, audio and video corpus with multiple speech styles. International Journal of Corpus Linguistics, 27(1), 93–125.

  • Sappok, Ch., Krasovitskij, A., Paschen, L., Brabender, K., Koch, A., & Kühl, N. (2016). RuReg: Russische Regionen. Akustische Datenbank [RuReg: Russian Regions. Acoustic database]. www.rureg.de.

  • Searle, J. R. (1969). Speech acts: an essay in the philosophy of language. Cambridge: Cambridge University Press.

  • Shmeleva, T. V. (1997). Model’ rechevogo zhanra [The model of speech genre]. Zhanry rechi [Speech genres], 1, 91–96.

  • Šumenjak, K. (2013). Priprava gradiva in standardizacija nivojev zapisa za potrebe dialektološkega korpusa GOKO [Preparation of material and standardization of recording levels for the needs of the GOKO dialectological corpus]. In A. Žele (Ed.), Družbena funkcijskost jezika (vidiki, merila, opredelitve) [Social functionality of language (aspects, criteria, definitions)] (Vol. 32, pp. 443–449). Ljubljana: Znanstvena založba Filozofske fakultete.

  • Szmrecsanyi, B. (2014). Methods and objectives in contemporary dialectology. In I. A. Seržant & B. Wiemer (Eds.), Contemporary approaches to dialectology: the area of North, northwest Russian and Belarusian vernaculars (Vol. 12, pp. 81–92). Bergen: University of Bergen.

  • Ter-Avanesova, A. V., Balabin, F. A., Dyachenko, S. V., Malysheva, A. V., Panova, A. B., & Morozova, V. A. (2019). Corpus of the Malinino dialect. Moscow: Linguistic Convergence Laboratory, NRU HSE; V.V. Vinogradov Russian Language Institute of the Russian Academy of Science. https://lingconlab.ru/malinino/.

  • Ter-Avanesova, A. V., Dyachenko, S. V., Kolesnikova, E. V., Malysheva, A. V., Ignatenko, D. I., Panova, A. B., & Dobrushina, N. R. (Eds.) (2018). Corpus of Rogovatka dialect. Moscow: Linguistic Convergence Laboratory, NRU HSE. http://lingconlab.ru/rogovatka/.

  • Ter-Avanesova, A. V., Dyachenko, S. V., Korpechkova, E. V., Malysheva, A. V., Pekunova, I. S., & Tolstaya, M. N. (2020). Corpus of the Nekhochi dialect. Moscow: Linguistic Convergence Laboratory HSE University, V.V. Vinogradov Russian Language Institute of the Russian Academy of Science, Institute of Slavic Studies of the Russian Academy of Science. http://lingconlab.ru/nekhochi/.

  • Ugryumova, M. M. (Ed.) (2018). Slovar’ detstva: govory srednego Priob’ja (s lingvokul’turologicheskim kommentariem) [Dictionary of childhood: the middle Ob region dialects (with linguistic and culturological commentary)]. Tomsk: Tomsk State University Publishing House.

  • Vuković, T. (2020). Spoken Torlak dialect corpus 1.0 (transcription). Slavisches Seminar, University of Zurich. https://www.clarin.si/repository/xmlui/handle/11356/1281#.

  • Vuković, T. (2021). Representing variation in a spoken corpus of an endangered dialect: the case of Torlak. Language Resources & Evaluation, 55, 731–756. https://doi.org/10.1007/s10579-020-09522-4.

  • Wiemer, B., Kozhanov, K. A., & Erker, A. (2019). Korpus slavjanskix i baltijskix govorov TriMCo: struktura, celi i primery primenenija [The TriMCo Slavic and Baltic dialect corpus: structure, purposes and examples of applications]. In V. A. Dybo (Ed.), Balto-slavjanskie issledovanija (pp. 122–143). Moscow: Buki Vedi LLC. https://doi.org/10.31168/2658-5766.2019.20.6.

  • Wierzbicka, A. (2003). Cross-cultural pragmatics: the semantics of human interaction. Berlin: De Gruyter. https://doi.org/10.1515/9783110220964.

  • Zemicheva, S. S. (2018). Vzaimosvjaz’ tematiki dialektnogo teksta i pola govorjashhego (na materiale Tomskogo dialektnogo korpusa) [The correlation between the topic of a dialect text and the speaker’s gender (based on the materials of Tomsk dialect corpus)]. In Aktual’nye problemy i perspektivy rusistiki [Current problems and prospects of Russian studies]. Proceedings of the international conference on Russian studies at the university of Barcelona (pp. 483–492). Barcelona: Trialba Ediciones. http://stel.ub.edu/slavia/wp-content/uploads/%D0%93%D0%BB%D0%B0%D0%B2%D0%B003.pdf.

  • Zemicheva, S. S. (2020a). Osobennosti sredneobskix govorov na sovremennom ėtape razvitija i faktory, vlijajushhie na ix soxrannost’ [The features of the middle Ob dialects at the present stage of development and factors affecting their preservation]. Tomsk State University Journal of Philology, 63, 28–39. https://doi.org/10.17223/19986645/63/2.

  • Zemicheva, S. S. (2020b). Ot abarma do jashchixishka: razrabotka leksikograficheskogo komponenta Tomskogo dialektnogo korpusa [From “Abarmo” to “Yashchixishko”: creating the lexicographic component of the Tomsk Dialect Corpus]. Russian Journal of Lexicography, 18, 98–117. https://doi.org/10.17223/22274200/18/5.

  • Zemicheva, S. S., Dubtsova, L. A., Gromov, M. L., Galanina, V. V., Ugryumova, M. M., Vasilchenko, A. A., Parshina, A. V., Popova, D. P., Duminskaya, A. V., Zyuzkova, N. A., & Bukhanova, E. D. Tomsk Dialect Corpus 2.0. Laboratory of General and Siberian Lexicography of the National Research Tomsk State University. http://losl.tsu.ru/?q=losl_search. Retrieved January 10, 2023. Access mode: for registered users.

Acknowledgements

We are grateful to all our colleagues from the Department of Russian Language and the Laboratory of General and Siberian Lexicography of Tomsk State University who contributed to designing the structure of the corpus at the initial stage and to implementing the corpus annotation. We are also grateful to the research fellows of the Linguistic Convergence Laboratory of HSE University, especially Chiara Naccarato, who read preliminary versions of the paper and provided insightful suggestions. Finally, we would like to thank two anonymous reviewers, whose comments helped to improve the paper.

Author information

Corresponding author

Correspondence to Svetlana Zemicheva.

Ethics declarations

Competing Interests

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Table 8 Frequencies of different topics in the Tomsk Dialect Corpus
Table 9 Frequencies of different speech acts in the Tomsk Dialect Corpus

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zemicheva, S., Gromov, M., Dubtsova, L. et al. The Tomsk Dialect Corpus: a comprehensively annotated database of a Siberian Russian dialect from material collected over the last 70 years. Russ Linguist 47, 231–252 (2023). https://doi.org/10.1007/s11185-023-09277-w

  • DOI: https://doi.org/10.1007/s11185-023-09277-w