Evaluating Distributional Semantic Models with Russian Noun-Adjective Compositions

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 661)

Abstract

This paper evaluates and compares vector-space semantic models built with the Word2Vec word-embedding algorithm and with a count-based, association-oriented algorithm by measuring the association strength between Russian nouns and adjectives. A dataset of nouns and their associated adjectives serves as the test set for a pseudodisambiguation task. The models are trained on corpora of Russian fiction. A measure of lexical association anomaly is applied, which evaluates the similarity between the initial noun and the resulting attributive phrase. Association strength results are reported for models with different parameter values, and the best parameter combinations are proposed. The test items on which the models err are manually annotated, and the model errors are categorized by their linguistic nature and compositionality features.
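As a rough illustration of the evaluation idea, a pseudodisambiguation test can be sketched as follows. The abstract does not specify the composition function or the exact anomaly measure, so this sketch assumes additive composition (noun vector plus adjective vector) and cosine similarity between the noun and the composed phrase; the toy vectors, the transliterated words, and all function names are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def compose(noun_vec, adj_vec):
    """Build a phrase vector. ASSUMPTION: simple additive composition."""
    return [n + a for n, a in zip(noun_vec, adj_vec)]


def association(vectors, noun, adjective):
    """Similarity between the initial noun and the adjective+noun phrase."""
    noun_vec = vectors[noun]
    phrase_vec = compose(noun_vec, vectors[adjective])
    return cosine(noun_vec, phrase_vec)


def pseudodisambiguation_accuracy(vectors, test_items):
    """Share of items where the attested adjective outscores the confounder.

    Each test item is a (noun, attested_adjective, confounder_adjective) triple.
    """
    correct = sum(
        1
        for noun, attested, confounder in test_items
        if association(vectors, noun, attested) > association(vectors, noun, confounder)
    )
    return correct / len(test_items)


# Hypothetical 3-dimensional toy vectors (transliterated Russian words).
TOY_VECTORS = {
    "chai": [0.9, 0.1, 0.0],         # "tea"
    "stol": [0.1, 0.0, 0.9],         # "table"
    "goryachii": [0.8, 0.2, 0.1],    # "hot"
    "derevyannyi": [0.0, 0.1, 0.9],  # "wooden"
}

TOY_ITEMS = [
    ("chai", "goryachii", "derevyannyi"),
    ("stol", "derevyannyi", "goryachii"),
]

accuracy = pseudodisambiguation_accuracy(TOY_VECTORS, TOY_ITEMS)
```

In a real setting the vectors would come from the trained models rather than a hand-written dictionary, and the error rate would be one minus the accuracy computed here.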

Notes

  1. http://alt.qcri.org/semeval2014/task1/
  2. http://clic.cimec.unitn.it/composes/
  3. http://maggie.lt.informatik.tu-darmstadt.de/jobimtext/
  4. http://serelex.it-claim.ru/
  5. http://ling.go.mail.ru/dsm/ru/
  6. http://russe.nlpub.ru/
  7. http://www.lib.ru/
  8. http://ling.go.mail.ru/misc/dialogue_2015.html#rnc

Acknowledgments

The reported study is supported by RFBR grant № 16-06-00529 “Development of a linguistic toolkit for semantic analysis of Russian text corpora by statistical techniques”.

Author information

Corresponding author

Correspondence to Polina Panicheva.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Panicheva, P., Protopopova, E., Bukia, G., Mitrofanova, O. (2017). Evaluating Distributional Semantic Models with Russian Noun-Adjective Compositions. In: Ignatov, D., et al. Analysis of Images, Social Networks and Texts. AIST 2016. Communications in Computer and Information Science, vol 661. Springer, Cham. https://doi.org/10.1007/978-3-319-52920-2_22

  • DOI: https://doi.org/10.1007/978-3-319-52920-2_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-52919-6

  • Online ISBN: 978-3-319-52920-2

  • eBook Packages: Computer Science, Computer Science (R0)
