Abstract
Word embeddings are known to encapsulate semantic similarity and have become the preferred representation for NLP models. However, they do not capture the type of semantic relationship, which, in some applications, may be crucial. This paper adapts an existing method for enhancing word embeddings so as to better separate synonyms from antonyms in an intent detection task for a Romanian home assistant scenario. To account for the morphological richness of Romanian, our method adds an augmentation step that generates conjugated pairs of antonym and synonym verbs. The generated pairs are then passed through a counter-fitting step (inspired by prior work), for which we propose a justified improvement to one of the hyperparameters. Evaluations on the home assistant scenario show that this pre-processing step plays an essential role in reducing opposing-intent errors in the classification model (by almost two thirds).
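The two pre-processing steps named in the abstract (conjugated-pair augmentation, then counter-fitting) can be sketched as below. This is a minimal, stdlib-only illustration under stated assumptions, not the authors' implementation. The `PARADIGMS` table is a hypothetical, hand-written stand-in for the output of a conjugation tool, and `expand_pairs` and `counter_fit` are hypothetical names. The objective follows the antonym-repel (AR), synonym-attract (SA) and vector-space-preservation (VSP) hinge terms of counter-fitting (Mrkšić et al., 2016), with cosine distance computed as 1 minus a dot product because vectors are kept at unit length, and plain full-batch gradient steps followed by renormalisation. All hyperparameter values are illustrative; the abstract does not say which one the paper modifies.

```python
import math

Vec = list[float]

# Hypothetical, hand-written present-tense paradigms (1sg, 2sg, 3sg, 1pl, 2pl, 3pl).
# In practice a conjugation tool would generate these; slots must be aligned.
PARADIGMS: dict[str, list[str]] = {
    "aprinde": ["aprind", "aprinzi", "aprinde", "aprindem", "aprindeți", "aprind"],
    "stinge": ["sting", "stingi", "stinge", "stingem", "stingeți", "sting"],
    "porni": ["pornesc", "pornești", "pornește", "pornim", "porniți", "pornesc"],
}


def expand_pairs(pairs: list[tuple[str, str]],
                 paradigms: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Augmentation: turn lemma-level pairs into form pairs with the same person/number."""
    out: list[tuple[str, str]] = []
    for a, b in pairs:
        for form_a, form_b in zip(paradigms[a], paradigms[b]):
            if (form_a, form_b) not in out:  # 1sg and 3pl often coincide in Romanian
                out.append((form_a, form_b))
    return out


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vec) -> Vec:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def _accumulate(grad: Vec, v: Vec, scale: float) -> None:
    for i, x in enumerate(v):
        grad[i] += scale * x


def counter_fit(vectors: dict[str, Vec], synonyms, antonyms, k1=0.1, k2=0.1, k3=0.1,
                delta=1.0, gamma=0.0, rho=0.2, lr=0.1, epochs=20) -> dict[str, Vec]:
    """Sketch of counter-fitting: hinge losses on d = 1 - cos for unit vectors."""
    orig = {w: normalize(v) for w, v in vectors.items()}
    new = {w: list(v) for w, v in orig.items()}
    # VSP neighbourhoods are fixed once, in the original space (d < rho).
    neighbours = [(a, b, 1.0 - dot(orig[a], orig[b]))
                  for a in orig for b in orig
                  if a != b and 1.0 - dot(orig[a], orig[b]) < rho]
    for _ in range(epochs):
        grads = {w: [0.0] * len(v) for w, v in new.items()}
        for a, b in antonyms:  # AR: max(0, delta - d'(a, b)), pushes antonyms apart
            if delta - (1.0 - dot(new[a], new[b])) > 0:
                _accumulate(grads[a], new[b], k1)
                _accumulate(grads[b], new[a], k1)
        for a, b in synonyms:  # SA: max(0, d'(a, b) - gamma), pulls synonyms together
            if (1.0 - dot(new[a], new[b])) - gamma > 0:
                _accumulate(grads[a], new[b], -k2)
                _accumulate(grads[b], new[a], -k2)
        for a, b, d_orig in neighbours:  # VSP: max(0, d'(a, b) - d(a, b))
            if (1.0 - dot(new[a], new[b])) - d_orig > 0:
                _accumulate(grads[a], new[b], -k3)
                _accumulate(grads[b], new[a], -k3)
        # Gradient step, then project back to unit length (a simplification).
        new = {w: normalize([x - lr * g for x, g in zip(new[w], grads[w])]) for w in new}
    return new


if __name__ == "__main__":
    print(expand_pairs([("aprinde", "stinge")], PARADIGMS))
    # Toy 3-d vectors standing in for real embeddings of the verb forms.
    toy = {"aprinde": [1.0, 0.0, 0.0], "stinge": [0.6, 0.0, 0.8], "porni": [0.0, 1.0, 0.0]}
    fitted = counter_fit(toy, synonyms=[("aprinde", "porni")], antonyms=[("aprinde", "stinge")])
    print("antonym cosine:", dot(fitted["aprinde"], fitted["stinge"]))
    print("synonym cosine:", dot(fitted["aprinde"], fitted["porni"]))
```

Pairing forms slot by slot (same person and number), rather than every form of one verb with every form of the other, is an assumption of this sketch; it keeps pairs such as aprinzi/stingi grammatically parallel, so the constraints act on the inflected forms users actually type rather than only on lemmas.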
Notes
1.
2. https://rodrigopivi.github.io/Chatito/ - Generate datasets for NLU models using a simple DSL.
3. https://pypi.org/project/mlconjug/ - A Python library to conjugate verbs using Machine Learning techniques.
4. https://spacy.io - Industrial-Strength Natural Language Processing.
Acknowledgment
The work presented in this paper was supported by grant no. 72PCCDI/01.03.2018, ROBIN - Robots and Society: Cognitive Systems for Personal Robots and Autonomous Vehicles.
Appendix A Confusion matrices and histograms
The evolution of the confusion matrices (evaluation Scenario 1) shows a reduction in the number of errors, as more predictions fall on the main diagonal. The evolution of the confidence histograms (evaluation Scenario 2) shows fewer wrong predictions at both high and low confidence: most low-margin misclassifications were resolved, and the confidence of the remaining high-confidence misclassifications was mostly reduced.
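Both views can be derived from per-utterance gold intents, predicted intents and prediction confidences. The following is a minimal stdlib sketch, assuming hypothetical intent names and ten equal-width confidence bins (neither is specified in the text): the diagonal of the confusion matrix counts correct predictions, and the histogram counts misclassifications by the confidence the model assigned to them.

```python
from collections import Counter


def confusion_matrix(gold: list[str], pred: list[str], labels: list[str]) -> list[list[int]]:
    """Rows are gold intents, columns are predicted intents."""
    counts = Counter(zip(gold, pred))
    return [[counts[(g, p)] for p in labels] for g in labels]


def correct_count(matrix: list[list[int]]) -> int:
    """Correct predictions lie on the main diagonal."""
    return sum(matrix[i][i] for i in range(len(matrix)))


def error_confidence_histogram(gold: list[str], pred: list[str],
                               conf: list[float], bins: int = 10) -> list[int]:
    """Count misclassifications per equal-width confidence bin on [0, 1]."""
    hist = [0] * bins
    for g, p, c in zip(gold, pred, conf):
        if g != p:
            hist[min(int(c * bins), bins - 1)] += 1  # confidence 1.0 goes to the last bin
    return hist


if __name__ == "__main__":
    labels = ["lights_on", "lights_off"]  # hypothetical intents
    gold = ["lights_on", "lights_off", "lights_on", "lights_off"]
    pred = ["lights_on", "lights_on", "lights_on", "lights_off"]
    conf = [0.9, 0.55, 0.8, 0.95]
    m = confusion_matrix(gold, pred, labels)
    print(m, correct_count(m), error_confidence_histogram(gold, pred, conf))
```

Running the same functions on predictions made before and after the embedding pre-processing is one way to compare the two settings; in this toy data, lights_off predicted as lights_on is an example of an opposing-intent error.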
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Rad, AC., Muntean, IHM., Stoica, AD., Lemnaru, C., Potolea, R., Dînșoreanu, M. (2021). Semantically Enriching Embeddings of Highly Inflectable Verbs for Improving Intent Detection in a Romanian Home Assistant Scenario. In: Abreu, P.H., Rodrigues, P.P., Fernández, A., Gama, J. (eds) Advances in Intelligent Data Analysis XIX. IDA 2021. Lecture Notes in Computer Science, vol 12695. Springer, Cham. https://doi.org/10.1007/978-3-030-74251-5_20
DOI: https://doi.org/10.1007/978-3-030-74251-5_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74250-8
Online ISBN: 978-3-030-74251-5
eBook Packages: Computer Science, Computer Science (R0)