Anticipating Next Goal for Robot Plan Prediction

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1037)

Abstract

Goal reasoning is a central objective in robot task execution. We propose a deep model that learns to infer the next goal while an activity is being performed. Because predicting the next goal state requires a robot language that is not comparable to sentences, we introduce a dedicated optimization metric tied to the robot's representation of the scene. The proposed method was tested in a warehouse, with a humanoid robot performing tasks to assist a maintenance technician working at a production line.

Acknowledgments

This research was funded by the H2020 Project Second Hands under grant agreement No. 643950.

Author information

Corresponding authors

Correspondence to Edoardo Alati, Lorenzo Mauro, Valsamis Ntouskos or Fiora Pirri.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Alati, E., Mauro, L., Ntouskos, V., Pirri, F. (2020). Anticipating Next Goal for Robot Plan Prediction. In: Bi, Y., Bhatia, R., Kapoor, S. (eds) Intelligent Systems and Applications. IntelliSys 2019. Advances in Intelligent Systems and Computing, vol 1037. Springer, Cham. https://doi.org/10.1007/978-3-030-29516-5_60
