Semantically Enriching Embeddings of Highly Inflectable Verbs for Improving Intent Detection in a Romanian Home Assistant Scenario

  • Conference paper
Advances in Intelligent Data Analysis XIX (IDA 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12695)

Abstract

Word embeddings are known to encapsulate semantic similarity and have become the preferred representation for NLP models. However, they fail to identify the type of semantic relationship, which in some applications might be crucial. This paper adapts an existing solution for enhancing word embedding representations so as to better separate synonyms from antonyms in an intent detection task applied to a Romanian home assistant scenario. To account for the morphological richness of the Romanian language, our method adds an augmentation step that generates conjugated pairs of antonym and synonym verbs. The generated pairs are run through the counter-fitting step (inspired by the literature), for which we propose a justified improvement for one of the hyperparameters. The evaluations performed on the home assistant scenario show that the pre-processing step plays an essential role in reducing opposing-intent errors in the classification model (by almost two thirds).
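The augmentation and counter-fitting steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the conjugation table is a hypothetical, hand-written stand-in for a real conjugator, the function names are invented for illustration, and the hinge-style costs follow the general form used in the counter-fitting literature, with `delta=1.0` and `gamma=0.0` as assumed placeholder values rather than the hyperparameter setting proposed in the paper.

```python
import numpy as np

# Hypothetical, hand-written conjugation table (illustration only):
# infinitive -> {(mood, tense, person): inflected form}.
CONJUGATIONS = {
    "aprinde": {("ind", "pres", "1s"): "aprind",    # "to turn on"
                ("ind", "pres", "2s"): "aprinzi",
                ("ind", "pres", "3s"): "aprinde"},
    "stinge": {("ind", "pres", "1s"): "sting",      # "to turn off"
               ("ind", "pres", "2s"): "stingi",
               ("ind", "pres", "3s"): "stinge"},
}


def conjugated_pairs(verb_a, verb_b, table=CONJUGATIONS):
    """Pair the forms of two verbs that occupy the same grammatical slot."""
    forms_a, forms_b = table[verb_a], table[verb_b]
    shared_slots = sorted(set(forms_a) & set(forms_b))
    return [(forms_a[slot], forms_b[slot]) for slot in shared_slots]


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def antonym_repel_cost(pairs, vectors, delta=1.0):
    """Hinge cost that reaches zero once each antonym pair is at least delta apart."""
    return sum(max(0.0, delta - cosine_distance(vectors[a], vectors[b]))
               for a, b in pairs)


def synonym_attract_cost(pairs, vectors, gamma=0.0):
    """Hinge cost that reaches zero once each synonym pair is within gamma."""
    return sum(max(0.0, cosine_distance(vectors[a], vectors[b]) - gamma)
               for a, b in pairs)


if __name__ == "__main__":
    pairs = conjugated_pairs("aprinde", "stinge")
    # Toy embeddings: every form starts at the same point, so the antonym
    # pairs incur the maximum repel cost before any fitting is done.
    vectors = {word: np.array([1.0, 0.0]) for pair in pairs for word in pair}
    print(pairs)
    print(antonym_repel_cost(pairs, vectors))
```

In this sketch, pairing forms by grammatical slot is what lets a single antonym relation between two infinitives expand into one constraint per inflected form, which is the motivation the abstract gives for the augmentation step.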

Notes

  1. https://github.com/eaaskt/nlu/tree/master/rad-antonyms

  2. https://rodrigopivi.github.io/Chatito/ - Generate datasets for NLU models using a simple DSL.

  3. https://pypi.org/project/mlconjug/ - A Python library to conjugate verbs using Machine Learning techniques.

  4. https://spacy.io - Industrial-Strength Natural Language Processing.

References

  1. Stoica, A., Kadar, T., Lemnaru, C., Potolea, R., Dînşoreanu, M.: The impact of data challenges on intent detection and slot filling for the home assistant scenario. In: IEEE 15th International Conference on Intelligent Computer Communication and Processing (ICCP), pp. 41–47. IEEE (2019)

  2. Stoica, A.D., et al.: The impact of Romanian diacritics on intent detection and slot filling. In: IEEE International Conference on Automation, Quality and Testing, Robotics (AQTR), pp. 1–6 (2020)

  3. Mrkšić, N., et al.: Counter-fitting word vectors to linguistic constraints. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2016)

  4. Bocklisch, T., Faulkner, J., Pawlowski, N., Nichol, A.: Rasa: open source language understanding and dialogue management. arXiv preprint arXiv:1712.05181 (2017)

  5. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)

  6. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, vol. 26 (2013)

  7. Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759 (2016)

  8. Joulin, A., Grave, E., Bojanowski, P., Douze, M., Jégou, H., Mikolov, T.: FastText.zip: compressing text classification models. arXiv preprint arXiv:1612.03651 (2016)

  9. Ali, M.A., Sun, Y., Zhou, X., Wang, W., Zhao, X.: Antonym-synonym classification based on new sub-space embeddings. arXiv preprint arXiv:1612.03651 (2019)

  10. Nguyen, K.A., Walde, S.S.I., Vu, N.T.: Distinguishing antonyms and synonyms in a pattern-based neural network. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers (2017)

  11. Kim, J., Tur, G., Celikyilmaz, A., Cao, B., Wang, Y.: Intent detection using semantically enriched word embeddings. In: IEEE Spoken Language Technology Workshop (SLT) 2016, pp. 414–419 (2016)

  12. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: EMNLP, pp. 1532–1543 (2014)

  13. Mrkšić, N., et al.: Semantic specialization of distributional word vector spaces using monolingual and cross-lingual constraints. Trans. Assoc. Comput. Linguist. 5, 309–324 (2017). https://www.aclweb.org/anthology/Q17-1022

  14. Dumitrescu, S.D., Avram, A.M., Morogan, L., Toma, S.A.: RoWordNet - a Python API for the Romanian WordNet. In: Proceedings of the 10th International Conference on Electronics, Computers and Artificial Intelligence (ECAI), pp. 1–6. IEEE (2018)

Acknowledgment

The work presented in this paper was supported by grant no. 72PCCDI/01.03.2018, ROBIN - Robots and Society: Cognitive Systems for Personal Robots and Autonomous Vehicles.

Appendix A Confusion matrices and histograms

The evolution of the confusion matrices (for evaluation Scenario 1) shows the reduction in the number of errors, as more elements lie on the main diagonal. The evolution of the confidence histograms (for evaluation Scenario 2) shows fewer wrong predictions at both high and low confidence: most low-margin misclassifications were resolved, and the confidence of the remaining high-confidence errors was mostly reduced.
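As a rough illustration of the two diagnostics described above, the following sketch computes a confusion matrix, counts its off-diagonal entries (the errors), and bins the confidences of the wrong predictions. The function names, bin count, and toy labels are assumptions made for the example; they are not taken from the paper's evaluation code or data.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Count predictions: rows are true intents, columns are predicted intents."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def error_count(cm):
    """Errors are all entries that lie off the main diagonal."""
    return int(cm.sum() - np.trace(cm))


def wrong_confidence_histogram(y_true, y_pred, confidence, bins=5):
    """Histogram of classifier confidence restricted to wrong predictions."""
    wrong = np.asarray(y_true) != np.asarray(y_pred)
    counts, edges = np.histogram(np.asarray(confidence)[wrong],
                                 bins=bins, range=(0.0, 1.0))
    return counts, edges


if __name__ == "__main__":
    # Toy data: two intents, four utterances, two of them misclassified.
    y_true = [0, 0, 1, 1]
    y_pred = [0, 1, 1, 0]
    confidence = [0.90, 0.95, 0.80, 0.30]
    cm = confusion_matrix(y_true, y_pred, n_classes=2)
    print(cm)
    print(error_count(cm))
    print(wrong_confidence_histogram(y_true, y_pred, confidence)[0])
```

Comparing these quantities before and after the embedding pre-processing is one way to read the kind of evolution the appendix describes: a lower error count in the first diagnostic, and fewer entries in the outer bins of the second.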

figure b

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Rad, AC., Muntean, IHM., Stoica, AD., Lemnaru, C., Potolea, R., Dînșoreanu, M. (2021). Semantically Enriching Embeddings of Highly Inflectable Verbs for Improving Intent Detection in a Romanian Home Assistant Scenario. In: Abreu, P.H., Rodrigues, P.P., Fernández, A., Gama, J. (eds) Advances in Intelligent Data Analysis XIX. IDA 2021. Lecture Notes in Computer Science, vol 12695. Springer, Cham. https://doi.org/10.1007/978-3-030-74251-5_20

  • DOI: https://doi.org/10.1007/978-3-030-74251-5_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-74250-8

  • Online ISBN: 978-3-030-74251-5

  • eBook Packages: Computer Science, Computer Science (R0)
