Abstract
Industrial robots today are still mostly pre-programmed to perform a specific task. Despite prior academic research on human-robot interaction, adopting such systems in industrial settings is not trivial and has rarely been done. In this paper, we introduce a robotic system that we control with high-level verbal commands, leveraging some of the latest neural approaches to language understanding and a cognitive architecture for goal-directed but reactive execution. We show that a large-scale pre-trained language model can be effectively fine-tuned to translate verbal instructions into robot tasks, outperforming other semantic parsing methods, and that our system can handle, through dialogue, a variety of exceptions that arise during human-robot interaction, including unknown tasks, user interruptions, and changes in the world state.
This research is supported by A*STAR under its Human-Robot Collaborative AI for Advanced Manufacturing and Engineering (Award A18A2b0046).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Choi, D., Shi, W., Liang, Y.S., Yeo, K.H., Kim, JJ. (2021). Controlling Industrial Robots with High-Level Verbal Commands. In: Li, H., et al. Social Robotics. ICSR 2021. Lecture Notes in Computer Science, vol 13086. Springer, Cham. https://doi.org/10.1007/978-3-030-90525-5_19
DOI: https://doi.org/10.1007/978-3-030-90525-5_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-90524-8
Online ISBN: 978-3-030-90525-5
eBook Packages: Computer Science, Computer Science (R0)