Neurocomputing

Volume 421, 15 January 2021, Pages 140-150

MGRL: Graph neural network based inference in a Markov network with reinforcement learning for visual navigation

https://doi.org/10.1016/j.neucom.2020.07.091

Abstract

Visual navigation is an essential task for indoor robots and usually relies on a map to provide global information to the agent. Because traditional maps are tied to specific environments, map-based and map-building-based navigation methods are limited in new environments where a map is not available. Although deep reinforcement learning navigation methods, which are map-free, achieve satisfactory performance, they lack interpretability and a global view of the environment. Therefore, we propose a novel abstract map for deep reinforcement learning navigation that offers better global relative position information and more reasonable interpretability. The abstract map is modeled as a Markov network that explicitly represents the regularity of object arrangements induced by human activities in different environments. In addition, a knowledge graph is used to initialize the structure of the Markov network, providing a prior structure for the model and reducing the difficulty of learning. A graph neural network is then adopted for probability inference in the Markov network. Furthermore, the updates of the abstract map, including the knowledge graph structure and the parameters of the graph neural network, are combined into an end-to-end learning process trained by a reinforcement learning method. Finally, experiments in the AI2THOR framework and in a physical environment indicate that our algorithm greatly improves the navigation success rate in new environments, confirming its good generalization.

Introduction

Visual navigation has practical significance for various indoor robots [1], [2]: the agent learns to make a series of action decisions to find a given target based only on visual input. The visual input is the image observed by the agent's camera in the current state, which is usually only a partial observation of the environment. Making a sequence of decisions to find the target from such partial observations is a partially observable Markov decision process (POMDP). When the target is not in the observed image, the agent must perform many trials and errors to explore the environment and locate the target.

Because it contains the global location information of the environment, a map is regarded as a useful tool that helps the agent understand the entire environment and locate the target [3], [4], [5]. Many map-based navigation methods use known maps as a reference system to provide crucial location information for navigation [6], [7]. However, these methods fail in new environments where the map is unknown. To address this, map-building-based navigation methods, such as simultaneous localization and mapping (SLAM) or concurrent mapping and localization (CML) [8], [9], which build a map before navigating, have been proposed. Although these methods allow an agent to navigate in new environments, the complex map-building process, including exploration, localization, and map generation [9], limits their application scenarios, especially changeable indoor environments. In essence, the tight coupling between a traditional map and specific environment locations is the reason why map-based and map-building-based navigation methods fail or are limited in new environments.

In recent years, non-map-based navigation methods have emerged, especially deep reinforcement learning based methods, which do not rely on a map and instead learn a deep network to make navigation decisions [10], [11], [12]. These methods mainly exploit the excellent image representation ability of deep networks and achieve state-of-the-art performance in navigation [11], [12]. However, without the help of a map, these data-driven methods lack interpretability and a global view of the environment. We therefore propose to provide deep reinforcement learning navigation with an abstract map that breaks the limitations of traditional maps while still exploiting the powerful representation ability of deep networks.

In this paper, we propose to model the environment as an abstract map with a Markov network. In different indoor environments, object positions are arranged by human activities, which implies a certain regularity, such as chairs standing next to a desk and cups sitting on a desk. Since this regularity guides people's actions when searching a new environment, it inspires us to model the regularity to assist agent navigation in new environments. Considering the regularity of relative position relationships and the uncertainty of the environments, this regularity can be assumed to follow a probability distribution, with random vectors representing the object positions and the joint probability representing the regularity of their relative positions. This distribution over object positions is complex and involves multiple random vectors, which makes it hard to learn directly. A Markov network is a probabilistic graphical model that can compactly encode a complex probability distribution over a high-dimensional space through a graph-based representation [13]. Therefore, we build a Markov network that expresses the object position distribution of different environments as an abstract map.
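To make this idea concrete, the following is a minimal illustrative sketch, not the authors' implementation: a pairwise Markov network whose nodes are object variables over coarse relative-distance bins and whose edge potentials favour common indoor arrangements. The object names, bins, and potential values are assumptions chosen only for illustration.

```python
import itertools
import numpy as np

# Illustrative pairwise Markov network over discretized relative distances.
# Nodes: objects; each variable takes a coarse "distance-to-reference" bin.
# Edge potentials phi(x_i, x_j) favour arrangements common in indoor scenes
# (e.g. a cup tends to be close to the desk). All numbers are made up.
objects = ["desk", "chair", "cup"]
n_bins = 3                       # 0 = near, 1 = medium, 2 = far

edges = {
    ("desk", "chair"): np.array([[3.0, 1.0, 0.2],
                                 [1.0, 1.0, 0.5],
                                 [0.2, 0.5, 1.0]]),
    ("desk", "cup"):   np.array([[4.0, 0.8, 0.1],
                                 [0.8, 1.0, 0.5],
                                 [0.1, 0.5, 1.0]]),
}

def unnormalized_prob(assignment):
    """Product of edge potentials for one joint assignment {object: bin}."""
    p = 1.0
    for (i, j), phi in edges.items():
        p *= phi[assignment[i], assignment[j]]
    return p

# Brute-force partition function Z and the joint probability of one state.
states = list(itertools.product(range(n_bins), repeat=len(objects)))
Z = sum(unnormalized_prob(dict(zip(objects, s))) for s in states)
example = {"desk": 0, "chair": 0, "cup": 0}   # everything close together
print("P(example) =", unnormalized_prob(example) / Z)
```

In this toy example the joint distribution factorizes over the edges of the graph, which is exactly the compactness property of Markov networks that motivates using them as an abstract map; exact enumeration is only feasible here because the graph is tiny.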

Different from traditional maps containing specific locations and visual features, the Markov network consists of random vectors as nodes and the direct probabilistic interactions between the nodes as edges. In our model, the Markov network represents the relative distances between objects and describes positions as random distributions; therefore, the abstract map modeled by the Markov network is applicable to a wider variety of environments than traditional maps.

By modeling the environment as a Markov network, we propose an intelligent system (hereafter named MGRL) that makes intelligent navigation decisions for the agent. MGRL is named after the three critical components used in constructing the system: representation with a Markov network, inference with a graph neural network, and learning under a reinforcement learning framework.

For the representation, to make learning easier, the structure of a knowledge graph is introduced as the prior of the Markov network. For probability inference, a graph neural network (GNN) is selected because of its excellent ability to process graph-structured data. For learning, considering that deep reinforcement learning performs well in navigation, we propose a reinforcement learning method to determine the weights of the model and to update the graph structure based on interaction with the environment.
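As a rough picture of how a GNN can play the role of probability inference over a graph whose edges come from a knowledge-graph prior, the sketch below implements generic message passing over an adjacency matrix followed by a per-node read-out. The layer design, dimensions, and read-out head are assumptions for illustration, not the architecture used in the paper.

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One GNN layer: aggregate neighbour features along the prior graph
    structure, then combine them with the node's own feature.
    A generic sketch, not the authors' architecture."""
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(dim, dim)      # transform neighbour messages
        self.upd = nn.Linear(2 * dim, dim)  # combine self + aggregated message

    def forward(self, h, adj):
        # h: (num_nodes, dim) node features; adj: (num_nodes, num_nodes)
        # adjacency matrix initialized from the knowledge-graph prior.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        m = (adj @ self.msg(h)) / deg       # mean-aggregate neighbour messages
        return torch.relu(self.upd(torch.cat([h, m], dim=-1)))

class GraphInferenceNet(nn.Module):
    """Stacked message passing plus a read-out that scores each node,
    playing the role of approximate marginal inference on the graph."""
    def __init__(self, dim, num_layers=2):
        super().__init__()
        self.layers = nn.ModuleList(
            [MessagePassingLayer(dim) for _ in range(num_layers)])
        self.readout = nn.Linear(dim, 1)

    def forward(self, h, adj):
        for layer in self.layers:
            h = layer(h, adj)
        return torch.sigmoid(self.readout(h)).squeeze(-1)  # one score per node
```

The key point is that the learned message-passing weights stand in for hand-designed inference rules, so the same network can be reused on graphs whose structure is updated during training.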

With the environment modeled by a Markov network, inference performed by a GNN, and learning conducted under a reinforcement learning algorithm, the intelligent system MGRL improves the generalizability of the baseline method and outperforms a benchmark method, achieving a high success rate in new environments of the AI2THOR simulation environment [14].

In summary, the contributions are itemized as follows.

  • 1) We propose a reinforcement learning method combined with a Markov network (MGRL), which represents the environment by an uncertain model.

  • 2) The MGRL method combines probability inference and structure learning through reinforcement learning, further improving the ability of the baseline method to adapt to different environments.

  • 3) The interpretability of the proposed method is demonstrated through experimental data analysis and theoretical analysis.

This paper is structured as follows. Section 2 briefly describes previous research on navigation and probability inference by GNNs. Section 3 illustrates the baseline advantage actor-critic (A2C) method for visual navigation. Section 4 details the construction of the MGRL method. Section 5 presents the experiments and the theoretical analysis that reveal how the method improves generalization across environments. Finally, Section 6 presents the conclusion and discusses future work.

Section snippets

Related work

Visual navigation is a fundamental task related to autonomous movement of robots, and it has been extensively researched for several decades [10], [15], [16], [11]. We present a brief review of related work in the following aspects: deep reinforcement learning methods in navigation, the Markov network in navigation, and GNNs for probability inference in navigation.

Problem formulation and the baseline method

In navigation to a target with a semantic label, the agent is required to reach the target using visual observations. We propose a method that uses a deep reinforcement learning framework to help the agent make correct navigation decisions and reach variable targets through sequential decisions in different types of indoor scenes. Reinforcement learning uses a policy function for the decision on "where to go", p(a_t | s_t, θ), usually written as π_θ(a_t | s_t), where a_t is the action at step t and s_t is the state at step t.
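For readers unfamiliar with the baseline, a standard actor-critic head realizing π_θ(a_t | s_t) together with a state-value estimate can be sketched as follows; the feature dimension, hidden size, and action count are placeholders rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Generic A2C head: a shared trunk over the state feature, a policy
    (actor) head giving pi_theta(a_t | s_t), and a value (critic) head."""
    def __init__(self, feature_dim, num_actions):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(feature_dim, 256), nn.ReLU())
        self.policy = nn.Linear(256, num_actions)  # action logits
        self.value = nn.Linear(256, 1)             # state-value estimate

    def forward(self, state_feature):
        x = self.trunk(state_feature)
        dist = torch.distributions.Categorical(logits=self.policy(x))
        return dist, self.value(x)

# Sampling an action a_t from pi_theta(. | s_t):
#   dist, v = model(s_t_feature); a_t = dist.sample()
```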

MGRL for navigation

Within the A2C framework, the structure of MGRL is shown in Fig. 1. Different from the previous approach, besides the visual feature, MGRL considers a novel feature, named the graph relational feature, which is derived by the graph module based on the Markov network, shown in the dashed box of Fig. 1. The Markov network serves as an abstract map of the environment, with the probability inference process implemented by a GNN-based model.
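Since the exact wiring is given only in Fig. 1 (not reproduced here), the following sketch shows one plausible way of fusing the visual feature with the graph relational feature before the A2C head; the concatenation scheme and layer sizes are assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Illustrative fusion of the visual feature and the graph relational
    feature into a single state feature for the A2C head. Hypothetical
    design: simple concatenation followed by a linear projection."""
    def __init__(self, visual_dim, graph_dim, out_dim=512):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(visual_dim + graph_dim, out_dim),
            nn.ReLU(),
        )

    def forward(self, visual_feat, graph_feat):
        return self.fuse(torch.cat([visual_feat, graph_feat], dim=-1))
```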

As shown in Fig. 1, the inputs of the A2C

Experiments

First, to evaluate the effectiveness of the proposed MGRL method, it is compared with the baseline A2C method and a random algorithm. Then, to investigate the generalizability of the proposed method, comparative experiments are conducted with Yang's benchmark method [12]. To further explore the mechanism of MGRL, ablation experiments are conducted with only the Markov network. Furthermore, visualizations of the intermediate outputs are analyzed to explore the interpretability of MGRL.

Conclusion

We propose MGRL for visual navigation by introducing a Markov network to model the environment. The proposed MGRL uses reinforcement learning and adopts a GNN to predict the joint probability on the Markov network. The joint probability predicts the relative distance between two objects. In addition, the structure of the graph is initialized by a knowledge graph, and the observed relations are used to update the graph structure, which helps the graph module dynamically adapt to the various environments.

CRediT authorship contribution statement

Yi Lu: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing - original draft, Visualization. Yaran Chen: Methodology, Writing - review & editing, Project administration. Dongbin Zhao: Methodology, Resources, Writing - review & editing, Project administration, Funding acquisition. Dong Li: Conceptualization, Software.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The authors would like to thank Yanqiang Li for his support in providing the robot, and Prof. Jun Wang, Qichao Zhang, Junwen Chen and Shuxian Jiang for their helpful comments and discussions.

References (46)

  • S. Thrun, Learning metric-topological maps for indoor mobile robot navigation, Artif. Intell. (1998).
  • J.-A. Meyer et al., Map-based navigation in mobile robots, Cogn. Syst. Res. (2003).
  • Y. Zhou et al., Hybrid hierarchical reinforcement learning for online guidance and navigation with partial observability, Neurocomputing (2019).
  • F. Bonin-Font et al., Visual navigation for mobile robots: a survey, J. Intell. Rob. Syst. (2008).
  • J. Kim et al., Vision-based location positioning using augmented reality for indoor navigation, IEEE Trans. Consum. Electron. (2008).
  • O. Khatib, Real-time obstacle avoidance for manipulators and mobile robots, Auton. Robot Veh. (1986).
  • J. Borenstein et al., Real-time obstacle avoidance for fast mobile robots, IEEE Trans. Syst. Man Cybern. (1989).
  • K.O. Arras et al., Feature extraction and scene interpretation for map-based navigation and map building, Mobile Robots XII (1998).
  • B. Yamauchi, A. Schultz, W. Adams, Mobile robot exploration and map-building with continuous localization, in: IEEE...
  • N. Lama, B. Sen, K. Gautam, Survey: visual navigation for mobile robot, in: International Conference on Computing and...
  • Y. Zhu, R. Mottaghi, E. Kolve, J.J. Lim, A. Gupta, L. Fei-Fei, A. Farhadi, Target-driven visual navigation in indoor...
  • A. Mousavian, A. Toshev, M. Fišer, J. Košecká, A. Wahid, J. Davidson, Visual representations for semantic target driven...
  • W. Yang, X. Wang, A. Farhadi, A. Gupta, R. Mottaghi, Visual semantic navigation using scene priors, in: International...
  • D. Koller et al., Probabilistic Graphical Models: Principles and Techniques (2009).
  • E. Kolve, R. Mottaghi, W. Han, E. VanderBilt, L. Weihs, A. Herrasti, D. Gordon, Y. Zhu, A. Gupta, A. Farhadi, Ai2-thor:...
  • D. Zhao et al., Deep reinforcement learning with visual attention for vehicle classification, IEEE Trans. Cogn. Develop. Syst. (2017).
  • N. Zeng et al., Path planning for intelligent robot based on switching local evolutionary PSO algorithm, Assembly Autom. (2016).
  • H. Li, Q. Zhang, D. Zhao, Deep reinforcement learning-based automatic exploration for navigation in unknown...
  • K. Shao, D. Zhao, N. Li, Y. Zhu, Learning battles in ViZDoom via deep reinforcement learning, in: IEEE Conference on...
  • K. Shao, D. Zhao, Y. Zhu, Q. Zhang, Visual navigation with actor-critic deep reinforcement learning, in: 2018...
  • K. Shao et al., StarCraft micromanagement with reinforcement learning and curriculum transfer learning, IEEE Trans. Emerg. Top. Comput. Intell. (2019).
  • S. Gupta, J. Davidson, S. Levine, R. Sukthankar, J. Malik, Cognitive mapping and planning for visual navigation, in:...
  • D. Li, D. Zhao, Q. Zhang, Y. Zhuang, B. Wang, Graph attention memory for visual navigation, in: arXiv preprint, 2019....

    Yi Lu received the B.E. and M.E. degrees in Mathematics from Shandong University, China, in 2009 and 2012, respectively. She is currently pursuing the Ph.D. degree at the Institute of Automation, Chinese Academy of Sciences, China. Her research interests include graph neural networks, deep reinforcement learning and computer vision.

    Yaran Chen received the Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences, in 2018. She is currently an Assistant Professor at The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China. Her research interests include deep learning, neural architecture search, deep reinforcement learning and autonomous driving.

    Dongbin Zhao (M’06-SM’10-F’20) received the B.S., M.S., Ph.D. degrees from Harbin Institute of Technology, Harbin, China, in 1994, 1996, and 2000 respectively. He was a postdoctoral fellow at Tsinghua University, Beijing, China, from 2000 to 2002. He has been a professor at Institute of Automation, Chinese Academy of Sciences since 2002, and also a professor with the University of Chinese Academy of Sciences, China. From 2007 to 2008, he was also a visiting scholar at the University of Arizona. He has published 6 books, and over 90 international journal papers. His current research interests are in the area of deep reinforcement learning, computational intelligence, autonomous driving, game artificial intelligence, robotics, smart grids, etc.

    Dr. Zhao serves as an Associate Editor of IEEE Transactions on Neural Networks and Learning Systems, IEEE Transactions on Cybernetics, IEEE Transactions on Artificial Intelligence, IEEE Computational Intelligence Magazine, etc. He is the chair of the Distinguished Lecture Program, and was the Chair of the Technical Activities Strategic Planning Sub-Committee (2019), the Beijing Chapter (2017-2018), the Adaptive Dynamic Programming and Reinforcement Learning Technical Committee (2015-2016), and the Multimedia Subcommittee (2015-2016) of the IEEE Computational Intelligence Society (CIS). He has served as a guest editor for several renowned international journals and is involved in organizing many international conferences. He is an IEEE Fellow.

    Dong Li received the B.S. degree in automation from Central South University, Hunan, China, in 2014, and the Ph.D. degree from the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2019. His current research interests include reinforcement learning, autonomous driving, and deep learning.

    ☆This work is supported in part by the National Natural Science Foundation of China (NSFC) under Grant No. 61803371, the Beijing Science and Technology Plan under Grant No. Z191100007419002, and the International Partnership Program of the Chinese Academy of Sciences under Grant No. GJHZ1849.
