Anticipating Next Goal for Robot Plan Prediction

Alati, Edoardo; Mauro, Lorenzo; Ntouskos, Valsamis; Pirri, Fiora

doi:10.1007/978-3-030-29516-5_60


Conference paper
First Online: 24 August 2019


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1037)

Abstract

Goal reasoning is a central objective in robot task execution. Here we propose a deep model that learns to infer the next goal while the robot is performing an activity. Predicting the next goal state requires a robot language whose expressions are not comparable to natural-language sentences. We therefore introduce a specific optimization metric that is related to the robot's representation of the scene. The proposed method was evaluated in experiments in a warehouse, where a humanoid robot performed tasks assisting a maintenance technician working on a production line.




Acknowledgments

This research was funded by the H2020 project Second Hands under grant agreement No. 643950.

Author information

Authors and Affiliations

Alcor LAB, Dipartimento di Ingegneria Informatica Automatica e Gestionale, University of Rome “Sapienza”, Rome, Italy
Edoardo Alati, Lorenzo Mauro, Valsamis Ntouskos & Fiora Pirri


Corresponding authors

Correspondence to Edoardo Alati, Lorenzo Mauro, Valsamis Ntouskos or Fiora Pirri.

Editor information

Editors and Affiliations

School of Computing, Computer Science Research Institute, Ulster University, Newtownabbey, UK
Yaxin Bi
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Rahul Bhatia
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Supriya Kapoor


About this paper

Cite this paper

Alati, E., Mauro, L., Ntouskos, V., Pirri, F. (2020). Anticipating Next Goal for Robot Plan Prediction. In: Bi, Y., Bhatia, R., Kapoor, S. (eds) Intelligent Systems and Applications. IntelliSys 2019. Advances in Intelligent Systems and Computing, vol 1037. Springer, Cham. https://doi.org/10.1007/978-3-030-29516-5_60


DOI: https://doi.org/10.1007/978-3-030-29516-5_60
Published: 24 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29515-8
Online ISBN: 978-3-030-29516-5
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)
