Dataset and Evaluation of Automatic Speech Recognition for Multi-lingual Intent Recognition on Social Robots

Short paper · Published: 11 March 2024
DOI: 10.1145/3610977.3637473

ABSTRACT

While Automatic Speech Recognition (ASR) systems excel in controlled environments, challenges arise in robot-specific setups due to unique microphone requirements and added noise sources. In this paper, we create a dataset of conversation openings with brief exchanges in 5 European languages, and we systematically evaluate current state-of-the-art ASR systems (Vosk, OpenWhisper, Google Speech and NVidia Riva). Besides standard metrics, we also look at two critical downstream tasks for human-robot verbal interaction: intent recognition rate and entity extraction, using the open-source Rasa chatbot. Overall, we found that open-source solutions such as Vosk perform competitively with closed-source solutions while running on the edge, on a low compute budget (CPU only).
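As an illustration of the kind of evaluation pipeline the abstract describes, the minimal sketch below transcribes a single recorded utterance with Vosk (CPU-only), scores the transcript with a standard ASR metric (word error rate, computed with the jiwer package) and then sends it to a locally running Rasa server to obtain the predicted intent and extracted entities. This is not the authors' code: the model directory, audio file, reference transcript, expected intent label and Rasa server URL are all placeholder assumptions.

```python
import json
import wave

import jiwer            # pip install jiwer  (word/character error rate metrics)
import requests         # used here to query a locally running Rasa server
from vosk import Model, KaldiRecognizer  # pip install vosk

# --- 1. Transcribe one utterance with Vosk (runs on the edge, CPU only) ---
# "vosk-model-small-en-us" is a placeholder path to a downloaded Vosk model.
model = Model("vosk-model-small-en-us")
wf = wave.open("utterance.wav", "rb")          # expects 16 kHz mono PCM audio
rec = KaldiRecognizer(model, wf.getframerate())

while True:
    chunk = wf.readframes(4000)
    if not chunk:
        break
    rec.AcceptWaveform(chunk)
hypothesis = json.loads(rec.FinalResult()).get("text", "")

# --- 2. Standard ASR metric: word error rate against the reference text ---
reference = "hello robot please bring me a glass of water"   # ground-truth transcript (example)
print("WER:", jiwer.wer(reference, hypothesis))

# --- 3. Downstream NLU: intent and entities from a Rasa server ---
# Assumes a trained Rasa model served locally with its HTTP API enabled
# (e.g. `rasa run --enable-api`).
resp = requests.post(
    "http://localhost:5005/model/parse",
    json={"text": hypothesis},
    timeout=10,
).json()

predicted_intent = resp["intent"]["name"]
entities = [(e["entity"], e["value"]) for e in resp.get("entities", [])]
print("intent:", predicted_intent, "entities:", entities)

# Intent recognition rate can then be computed as the fraction of utterances
# whose predicted intent matches the annotated one; entity extraction is
# scored analogously against the annotated entities.
print("intent match:", predicted_intent == "request_object")  # expected label (example)
```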

Published in

HRI '24: Proceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction
March 2024, 982 pages
ISBN: 9798400703225
DOI: 10.1145/3610977
Copyright © 2024 ACM

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 242 of 1,000 submissions (24%)